Comp Chapter 1

Compiler Chapter 1

Uploaded by

Addisalem Ganfure

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

25 views

Comp Chapter 1

Compiler Chapter 1

Uploaded by

Addisalem Ganfure

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

You are on page 1/ 31

Introduction to Compilers

What are Compilers?

Compiler is a program which translates a program written in one language (the source language) to an
equivalent program in other language (the target language). Usually the source language is a high level
language like Java, C, Fortran etc. whereas the target language is machine code or "code" that a computer's
processor understands. The source language is optimized for humans. It is more user-friendly, to some
extent platform-independent. They are easier to read, write, and maintain and hence it is easy to avoid errors.
Ultimately, programs written in a high-level language must be translated into machine language by a
compiler. The target machine language is efficient for hardware but lacks readability.
Compilers
. Translates from one representation of the program to another
. Typically from high level source code to low level machine code or object code
. Source code is normally optimized for human readability
- Expressive: matches our notion of languages (and application?!)
- Redundant to help avoid programming errors
. Machine code is optimized for hardware
- Redundancy is reduced
- Information about the intent is lost
How to translate?

• The high level languages and machine languages differ in level of abstraction. At machine
level we deal with memory locations, registers whereas these resources are never accessed
in high level languages. But the level of abstraction differs from language to language and
some languages are farther from machine code than others
• Goals of translation
• Good performance for the generated code
• Good performance for generated code : The metric for the quality of the generated code is
the ratio between the size of handwritten code and compiled machine code for same
program. A better compiler is one which generates smaller code. For optimizing compilers this
ratio will be lesser.
• Good compile time performance
• Good compile time performance : A handwritten machine code is more efficient than a
compiled code in terms of the performance it produces. In other words, the program
handwritten in machine code will run faster than compiled code. If a compiler produces a
code which is 20-30% slower than the handwritten code then it is considered to be
acceptable. In addition to this, the compiler itself must run fast (compilation time must be
proportional to program size).
• Maintainable code
• High level of abstraction
• Correctness is a very important issue.
• Correctness : A compiler's most important goal is correctness -
all valid programs must compile correctly. How do we check if a
compiler is correct i.e. whether a compiler for a programming
language generates correct machine code for programs in the
language. The complexity of writing a correct compiler is a
major limitation on the amount of optimization that can be done.
• Can compilers be proven to be correct? Very tedious!
• . However, the correctness has an implication on the
development cost
• Many modern compilers share a common 'two stage' design.
The "front end" translates the source language or the high level
program into an intermediate representation. The second stage
is the "back end", which works with the internal representation
to produce code in the output language which is a low level
code. The higher the abstraction a compiler can support, the
better it is.
The Big picture

• Compiler is part of program development environment

• The other typical components of this environment are editor,
assembler, linker, loader, debugger, profiler etc.
• The compiler (and all other tools) must support each other for
easy program development
• All development systems are essentially a combination of many
tools. For compiler, the other tools are debugger, assembler,
linker, loader, profiler, editor etc. If these tools have support for
each other than the program development becomes a lot easier.
• This is how the various tools work in coordination to make
programming easier and better. They all have a specific task to
accomplish in the process, from writing a code to compiling it
and running/debugging it. If debugged then do manual
correction in the code if needed, after getting debugging results.
It is the combined contribution of these tools that makes
programming a lot easier and efficient.
How to translate easily?
• In order to translate a high level code to a machine code one needs to go step by step,
with each step doing a particular task and passing out its output for the next step in the
form of another program representation. The steps can be parse tree generation, high
level intermediate code generation, low level intermediate code generation, and then the
machine language conversion. As the translation proceeds the representation becomes
more and more machine specific, increasingly dealing with registers, memory locations
etc.

• . Translate in steps. Each step handles a reasonably simple, logical, and well defined task
• . Design a series of program representations
• . Intermediate representations should be amenable to program manipulation of various
kinds (type checking, optimization, code generation etc.)
• . Representations become more machine specific and less language specific as the
translation proceeds
The first few steps

• The first few steps of compilation like lexical, syntax and semantic analysis can be
understood by drawing analogies to the human way of comprehending a natural
language. The first step in understanding a natural language will be to recognize
characters, i.e. the upper and lower case alphabets, punctuation marks, alphabets, digits,
white spaces etc. Similarly the compiler has to recognize the characters used in a
programming language. The next step will be to recognize the words which come from a
dictionary. Similarly the programming language have a dictionary as well as rules to
construct words (numbers, identifiers etc).
• . The first step is recognizing/knowing alphabets of a language. For example
• - English text consists of lower and upper case alphabets, digits, punctuations and white spaces
• - Written programs consist of characters from the ASCII characters set (normally 9-13, 32-126)
• . The next step to understand the sentence is recognizing words (lexical analysis)
• - English language words can be found in dictionaries
• - Programming languages have a dictionary (keywords etc.) and rules for constructing words
(identifiers, numbers etc.)
Lexical Analysis

• .Recognizing words is not completely trivial.

• For example: ist his ase nte nce?
• . Therefore, we must know what the word separators are
• . The language must define rules for breaking a sentence into a sequence of words.
• . Normally white spaces and punctuations are word separators in languages.
• . In programming languages a character from a different class may also be treated as word
separator.
• . The lexical analyzer breaks a sentence into a sequence of words or tokens: - If a == b then a =
1 ; else a = 2 ; - Sequence of words (total 14 words) if a == b then a = 1 ; else a = 2 ;
• In simple words, lexical analysis is the process of identifying the words from an input
string of characters, which may be handled more easily by a parser. These words must be
separated by some predefined delimiter or there may be some rules imposed by the
language for breaking the sentence into tokens or words which are then passed on to the
next phase of syntax analysis. In programming languages, a character from a different
class may also be considered as a word separator.
The next step
• . Once the words are understood, the next step is to understand
the structure of the sentence
• . The process is known as syntax checking or parsing
• Syntax analysis (also called as parsing) is a process of
imposing a hierarchical (tree like) structure on the token stream.
It is basically like generating sentences for the language using
language specific grammatical rules as we have in our natural
language
• Ex. sentence subject + object + subject The example drawn
above shows how a sentence in English (a natural language)
can be broken down into a tree form depending on the construct
of the sentence.
Parsing

• Just like a natural language, a programming language also has

a set of grammatical rules and hence can be broken down into
a parse tree by the parser. It is on this parse tree that the further
steps of semantic analysis are carried out.
• This is also used during generation of the intermediate
language code. Yacc (yet another compiler compiler) is a
program that generates parsers in the C programming
language.
Understanding the meaning

• . Once the sentence structure is understood we try to understand the meaning of the
sentence (semantic analysis)
• . Example: Prateek said Nitin left his assignment at home
• . What does his refer to? Prateek or Nitin ?
• . Even worse case
• Amit said Amit left his assignment at home
• . How many Amits are there? Which one left the assignment?
• Semantic analysis is the process of examining the statements and to make
sure that they make sense. During the semantic analysis, the types, values,
and other required information about statements are recorded, checked, and
transformed appropriately to make sure the program makes sense. Ideally
there should be no ambiguity in the grammar of the language. Each sentence
should have just one meaning.
Semantic Analysis

• . Too hard for compilers. They do not have capabilities similar to human understanding
• . However, compilers do perform analysis to understand the meaning and catch
inconsistencies
• . Programming languages define strict rules to avoid such ambiguities

{ int Amit = 3;
{ int Amit = 4;
cout << Amit;
}
}
• Since it is too hard for a compiler to do semantic analysis, the programming languages
define strict rules to avoid ambiguities and make the analysis easier. In the code written
above, there is a clear demarcation between the two instances of Amit. This has been
done by putting one outside the scope of other so that the compiler knows that these
two Amit are different by the virtue of their different scopes.
More on Semantic Analysis

• . Compilers perform many other checks besides variable bindings

• . Type checking Amit left her work at home
• . There is a type mismatch between her and Amit . Presumably Amit is a
male. And they are not the same person.
• From this we can draw an analogy with a programming statement. In
the statement:
• double y = "Hello World";
• The semantic analysis would reveal that "Hello World" is a string, and
y is of type double,
• which is a type mismatch and hence, is wrong.
Compiler structure once again

Till now we have conceptualized the front end of the compiler with its 3 phases, viz.
Lexical Analysis, Syntax Analysis and Semantic Analysis; and the work done in each
of the three phases. Next, we look into the backend in the forthcoming slides.
Front End Phases

Lexical analysis is based on the finite state automata and hence finds the lexicons from the
input on the basis of corresponding regular expressions. If there is some input which it
can't recognize then it generates error. In the above example, the delimiter is a blank space.
See for yourself that the lexical analyzer recognizes identifiers, numbers, brackets etc.
Syntax Analysis

• Error reporting and recovery

• Model using context free grammars
• Recognize using Push down automata/Table Driven Parsers
Syntax Analysis is modeled on the basis of context free grammars. Programming
languages can be written using context free grammars. Based on the rules of the
grammar, a syntax tree can be made from a correct code of the language. A code
written in a CFG is recognized using Push Down Automata. If there is any error in the
syntax of the code then an error is generated by the compiler. Some compilers also
tell that what exactly is the error, if possible.
Semantic Analysis
. Check semantics
. Error reporting
. Disambiguate overloaded operators
.Type coercion
. Static checking
- Type checking
- Control flow checking
- Unique ness checking
- Name checks

Semantic analysis should ensure that the code is unambiguous. Also it should do the type
checking wherever needed. Ex. int y = "Hi"; should generate an error. Type coercion can be
explained by the following example:- int y = 5.6 + 1; The actual value of y used will be 6 since it is
an integer. The compiler knows that since y is an instance of an integer it cannot have the value
of 6.6 so it down-casts its value to the greatest integer less than 6.6. This is called type coercion.
Code Optimization
• No strong counter part with English, but is similar to editing/précis writing
• . Automatically modify programs so that they
• - Run faster
• - Use less resources (memory, registers, space, fewer fetches etc.)
• . Some common optimizations
• - Common sub-expression elimination
• - Copy propagation
• - Dead code elimination
• - Code motion
• - Strength reduction
• - Constant folding
• . Example: x = 15 * 3 is transformed to x = 45
There is no strong counterpart in English, this is similar to precise writing where one cuts
down the redundant words. It basically cuts down the redundancy. We modify the compiled
code to make it more efficient such that it can - Run faster - Use less resources, such as
memory, register, space, fewer fetches etc.
Example of Optimizations
Code Generation
• . Usually a two step process
• Generate intermediate code from the semantic representation of the program
• Generate machine code from the intermediate code
• . The advantage is that each phase is simple
• . Requires design of intermediate language
• . Most compilers perform translation between successive intermediate
representations
• . Intermediate languages are generally ordered in decreasing level of abstraction
from highest (source) to lowest (machine)
• . However, typically the one after the intermediate code generation is the most
important
The final phase of the compiler is generation of the relocatable target code. First of all,
Intermediate code is generated from the semantic representation of the source program, and
this intermediate code is used to generate machine code.
Intermediate Code Generation
• Abstraction at the source level identifiers, operators, expressions, statements,
conditionals, iteration, functions (user defined, system defined or libraries)
• . Abstraction at the target level memory locations, registers, stack, opcodes,
addressing modes, system libraries, interface to the operating systems
• . Code generation is mapping from source level abstractions to target machine
abstractions
• . Map identifiers to locations (memory/storage allocation)
• . Explicate variable accesses (change identifier reference to relocatable/absolute
address
• . Map source operators to opcodes or a sequence of opcodes
• . Convert conditionals and iterations to a test/jump or compare instructions
• . Layout parameter passing protocols: locations for parameters, return values,
layout of activations frame etc.
• . Interface calls to library, runtime system, operating systems
By the very definition of an intermediate language it must be at a level of abstraction which is in
the middle of the high level source language and the low level target (machine) language. Design
of the intermediate language is important.

The IL should satisfy 2 main properties :

. easy to produce, and

. easy to translate into target language.

Thus it must not only relate to identifiers, expressions, functions & classes but also to opcodes,
registers, etc. Then it must also map one abstraction to the other.
These are some of the things to be taken care of in the intermediate code generation.
Post translation Optimizations

• Algebraic transformations and re-ordering

• - Remove/simplify operations like
• . Multiplication by 1
• . Multiplication by 0
• . Addition with 0
• - Reorder instructions based on
• . Commutative properties of operators
• . For example x+y is same as y+x (always?)

Instruction selection
• - Addressing mode selection
• - Opcode selection
• - Peephole optimization
Intermediate code generation

Code Generation
CMP Cx, 0 CMOVZ Dx,Cx
There is a clear intermediate code optimization - with 2 different sets of codes having 2 different parse trees. The optimized code
does away with the redundancy in the original code and produces the same result.

Swahili English Dictionary
100% (4)
Swahili English Dictionary
69 pages
Mastering IBM I Mcpress 2011 Ed1 PDF
100% (5)
Mastering IBM I Mcpress 2011 Ed1 PDF
953 pages
Beowulf PPT Presentation
100% (3)
Beowulf PPT Presentation
55 pages
2018 Com 414 (Compiler Construction)
100% (2)
2018 Com 414 (Compiler Construction)
79 pages
Properties of Verbs
No ratings yet
Properties of Verbs
2 pages
Unit-1 Introduction To Compilers: Goals of Translation
No ratings yet
Unit-1 Introduction To Compilers: Goals of Translation
22 pages
Introduction To Compiler
No ratings yet
Introduction To Compiler
24 pages
What Are Compilers?
No ratings yet
What Are Compilers?
20 pages
Compiler IITK
No ratings yet
Compiler IITK
512 pages
comp chapter 1
No ratings yet
comp chapter 1
31 pages
01 Introduction Annotated - PPTX 1
No ratings yet
01 Introduction Annotated - PPTX 1
63 pages
Introduction To Compilers1
No ratings yet
Introduction To Compilers1
47 pages
Expressive: Matches Our Notion of Languages (And Application?!) Redundant To Help Avoid Programming Errors
No ratings yet
Expressive: Matches Our Notion of Languages (And Application?!) Redundant To Help Avoid Programming Errors
45 pages
What Are Compilers?
No ratings yet
What Are Compilers?
52 pages
Compiler Design
100% (2)
Compiler Design
17 pages
Bedasa
No ratings yet
Bedasa
31 pages
Lecture 08 Language Translation PDF
No ratings yet
Lecture 08 Language Translation PDF
11 pages
Compiler Design Ch1
No ratings yet
Compiler Design Ch1
13 pages
Compiler Lecture 3 4 5
No ratings yet
Compiler Lecture 3 4 5
14 pages
CD Unit - 1 Lms Notes
No ratings yet
CD Unit - 1 Lms Notes
58 pages
Compiler Construction Week 1
No ratings yet
Compiler Construction Week 1
34 pages
Compiler Design
No ratings yet
Compiler Design
65 pages
Compli Er
No ratings yet
Compli Er
52 pages
Compiler Design.: Why To Learn About Compilers
No ratings yet
Compiler Design.: Why To Learn About Compilers
12 pages
Compiler Notes
No ratings yet
Compiler Notes
68 pages
Chapter 1 - Introduction
No ratings yet
Chapter 1 - Introduction
13 pages
Indian Institute of Information Technology, Bhagalpur: Assignment - 1
No ratings yet
Indian Institute of Information Technology, Bhagalpur: Assignment - 1
26 pages
CD All Units
No ratings yet
CD All Units
117 pages
Compiler Design Introduction
No ratings yet
Compiler Design Introduction
23 pages
Compilers
No ratings yet
Compilers
86 pages
Phases of Compiler
No ratings yet
Phases of Compiler
9 pages
Compiler Construction Notes
No ratings yet
Compiler Construction Notes
61 pages
Compiler Design
No ratings yet
Compiler Design
11 pages
CD Unit1 Notes
No ratings yet
CD Unit1 Notes
28 pages
compiler design unit I 2025
No ratings yet
compiler design unit I 2025
75 pages
CD Notes
No ratings yet
CD Notes
28 pages
Compiler Construction
No ratings yet
Compiler Construction
63 pages
compiler
No ratings yet
compiler
27 pages
Unit-1 Notes CD OU
No ratings yet
Unit-1 Notes CD OU
19 pages
Lecture 2
No ratings yet
Lecture 2
34 pages
Compiler Construction CSEC325 Token
No ratings yet
Compiler Construction CSEC325 Token
2 pages
Compiler Lecture-1
No ratings yet
Compiler Lecture-1
47 pages
CSC303 - Compiler Design - 060624
No ratings yet
CSC303 - Compiler Design - 060624
49 pages
Compiler Unit - 1 PDF
No ratings yet
Compiler Unit - 1 PDF
16 pages
CH1-1 and 1-2
No ratings yet
CH1-1 and 1-2
34 pages
Lecture 1 - Ch1. Introduction To Compiler
No ratings yet
Lecture 1 - Ch1. Introduction To Compiler
29 pages
CD Notes
No ratings yet
CD Notes
69 pages
Introduction To Compiler Development
No ratings yet
Introduction To Compiler Development
15 pages
2024 CSC305 Chapter 1
No ratings yet
2024 CSC305 Chapter 1
49 pages
CD Experiments 1,2
No ratings yet
CD Experiments 1,2
12 pages
Chapter1 Introduction
No ratings yet
Chapter1 Introduction
26 pages
Narayana Engineering College::Nellore: Department of Computer Science and Engineering
No ratings yet
Narayana Engineering College::Nellore: Department of Computer Science and Engineering
20 pages
Lec00 Outline
No ratings yet
Lec00 Outline
27 pages
Compiler Construction and Phases
No ratings yet
Compiler Construction and Phases
8 pages
Compiler Construction Chapter1
No ratings yet
Compiler Construction Chapter1
20 pages
Complier Design1
No ratings yet
Complier Design1
17 pages
Introduction To Compilation
No ratings yet
Introduction To Compilation
33 pages
1-Phases of Compiler
No ratings yet
1-Phases of Compiler
66 pages
Compiler 1
No ratings yet
Compiler 1
28 pages
Chapter-1[1]
No ratings yet
Chapter-1[1]
49 pages
Core Course Viii Compiler Design Unit I
No ratings yet
Core Course Viii Compiler Design Unit I
27 pages
INTRO TO COMPILERS
No ratings yet
INTRO TO COMPILERS
77 pages
COMPUTER PROGRAMMING FOR KIDS: An Easy Step-by-Step Guide For Young Programmers To Learn Coding Skills (2022 Crash Course for Newbies)
From Everand
COMPUTER PROGRAMMING FOR KIDS: An Easy Step-by-Step Guide For Young Programmers To Learn Coding Skills (2022 Crash Course for Newbies)
Dexter Rogers
No ratings yet
Compiler Design
From Everand
Compiler Design
Knowledge Flow
No ratings yet
Chapter 5 Intermidate Code Generation
No ratings yet
Chapter 5 Intermidate Code Generation
30 pages
Compl Chapter3
No ratings yet
Compl Chapter3
19 pages
Comp Chap2
No ratings yet
Comp Chap2
36 pages
Chapter 4 Semantic Analysis
No ratings yet
Chapter 4 Semantic Analysis
36 pages
Computer Graphics Comp565: Chapter - 2 Simple Drawing Algorithms
100% (1)
Computer Graphics Comp565: Chapter - 2 Simple Drawing Algorithms
42 pages
CH 1
No ratings yet
CH 1
15 pages
Chapter 3 AI of Emerging Technology
67% (3)
Chapter 3 AI of Emerging Technology
19 pages
Chapter 4 IOT of Emerging Technology
No ratings yet
Chapter 4 IOT of Emerging Technology
19 pages
Emerging Technology: Currently Developing, or That Are Expected To Be Available Within The Next Five To Ten Years
100% (2)
Emerging Technology: Currently Developing, or That Are Expected To Be Available Within The Next Five To Ten Years
42 pages
A Short History of Rohingya by U Ba Tha
No ratings yet
A Short History of Rohingya by U Ba Tha
32 pages
Hippo Wonderland Competition-April2024
No ratings yet
Hippo Wonderland Competition-April2024
14 pages
Modul Bing
No ratings yet
Modul Bing
56 pages
Reading Paragraph MC
No ratings yet
Reading Paragraph MC
6 pages
Đề kiểm tra cuối kỳ II
No ratings yet
Đề kiểm tra cuối kỳ II
8 pages
Richmond RLP
No ratings yet
Richmond RLP
1 page
Ed Sheeran Shape of You Marina Andrade
0% (1)
Ed Sheeran Shape of You Marina Andrade
3 pages
CV & Cover Letter Sample
No ratings yet
CV & Cover Letter Sample
3 pages
EXERCISES OF UNIT 10 - Sociolinguistic
No ratings yet
EXERCISES OF UNIT 10 - Sociolinguistic
2 pages
Syllabus: Board Position Test System Plan For Class 10 As Per PBCC 2022
No ratings yet
Syllabus: Board Position Test System Plan For Class 10 As Per PBCC 2022
7 pages
Instant Japanese How To Express 1 000 Different Ideas With Just 100 Key Words and Phrases00001 PDF
No ratings yet
Instant Japanese How To Express 1 000 Different Ideas With Just 100 Key Words and Phrases00001 PDF
10 pages
Grade 9 t3 Project t7
No ratings yet
Grade 9 t3 Project t7
13 pages
English Grade 5 Q2
33% (3)
English Grade 5 Q2
244 pages
Speech - The Power of Positive Thinking PDF
100% (1)
Speech - The Power of Positive Thinking PDF
2 pages
List of Verbs
No ratings yet
List of Verbs
10 pages
F2F FearofFailure WhatWouldyouDo
No ratings yet
F2F FearofFailure WhatWouldyouDo
7 pages
Nikolayev & Starostin - A North Caucasian Etymological Dictionary (1994)
No ratings yet
Nikolayev & Starostin - A North Caucasian Etymological Dictionary (1994)
1,408 pages
A1 PKC PKeyCompActivity Bank
No ratings yet
A1 PKC PKeyCompActivity Bank
20 pages
Antonim
No ratings yet
Antonim
4 pages
Five Day Lesson Plans PBS
No ratings yet
Five Day Lesson Plans PBS
20 pages
Motivate! 3 End-Of-Term Test Standard: Units 7 - 9
No ratings yet
Motivate! 3 End-Of-Term Test Standard: Units 7 - 9
6 pages
EHS Advisor - M. Tayyab
No ratings yet
EHS Advisor - M. Tayyab
4 pages
A Glossary of Indian Figures of Speech PDF
No ratings yet
A Glossary of Indian Figures of Speech PDF
128 pages
Mime PPT
No ratings yet
Mime PPT
20 pages
Passive Voice With Modals
No ratings yet
Passive Voice With Modals
5 pages
Honors Thesis - Natural Enmity in The Pancatantra
No ratings yet
Honors Thesis - Natural Enmity in The Pancatantra
53 pages