
Compiler RNP SP Unit 4

System Programming


Systems Programming Unit 4: Compilers

Prof. Reshma Pise


Computer Engg. Dept
Vishwakarma University
Unit 4 : Compiler Design

• Introduction to Compiling
• Phases in Compilation
• Lexical Analysis
• Syntax Analysis
– Context Free Grammars
– Top-Down Parsing,
– Bottom-Up Parsing,
– Ambiguity in Grammar

Translators

• Programs written in high-level languages need to be translated into
low-level (machine) code for processing and execution by the CPU.
This is done by a translator program.

• There are two types of translator program:
– interpreters
– compilers

COMPILERS
• A compiler is a program that takes a program written in a source language
and translates it into an equivalent program in a target language.

source program -> COMPILER -> target program
                     |
                     v
               error messages

source program: normally a program written in a high-level programming language
target program: normally the equivalent program in machine code (a relocatable object file)



Compilers
A compiler translates the whole program into a machine code
version that can be run without the compiler being present.

Advantage: the program runs fast because it is already in machine code;
the translator program is needed only at the time of compiling.

Disadvantage: compiling is slow because the whole program is translated.


Interpreters
An interpreter translates HLL (high-level language) code one line at a
time and executes it immediately.

Advantage: errors are easy to find, which makes it better for learners.

Disadvantage: the program runs slowly because it has to be interpreted
continually, and the interpreter must always be in memory to interpret
the program.


Major Parts of Compilers
• There are two major parts of a compiler: Analysis and Synthesis
Analysis -> Front End
Synthesis -> Back End
• In the analysis phase, an intermediate representation is created from the
given source program.

• In the synthesis phase, the equivalent target program is created from this
intermediate representation.


Phases of A Compiler

Source Program -> Lexical Analyzer -> Syntax Analyzer -> Semantic Analyzer
-> Intermediate Code Generator -> Code Optimizer -> Code Generator
-> Target Program

• Each phase transforms the source program from one representation
into another representation.

• They communicate with error handlers.

• They communicate with the symbol table.


1. Lexical Analyzer
• Lexical Analyzer reads the source program character by character and
returns the tokens of the source program.
• A token describes a pattern of characters having the same meaning in the
source program (such as identifiers, operators, keywords, numbers,
delimiters and so on).
Ex: newval := oldval + 12 => tokens:
    newval   identifier
    :=       assignment operator
    oldval   identifier
    +        add operator
    12       a number

• Puts information about identifiers into the symbol table.


• Regular expressions are used to describe tokens (lexical constructs).
• A (Deterministic) Finite State Automaton can be used in the
implementation of a lexical analyzer.
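
The points above can be illustrated with a minimal sketch of a regular-expression-based scanner for the example statement. It is written in Python for brevity; the token class names (NUMBER, IDENTIFIER, ASSIGN, ADD) and the dictionary used as a symbol table are assumptions made for illustration, not part of the slides.

```python
import re

# Token classes for the slide's example; names and patterns are illustrative assumptions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("ADD", r"\+"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return (token class, lexeme) pairs and a symbol table of the identifiers seen."""
    tokens, symbol_table = [], {}
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SKIP":
            continue  # white space separates tokens but is not a token itself
        if kind == "IDENTIFIER":
            symbol_table.setdefault(lexeme, {"kind": "identifier"})
        tokens.append((kind, lexeme))
    return tokens, symbol_table


tokens, table = tokenize("newval := oldval + 12")
print(tokens)
```

Python's `re` module stands in here for the finite automaton that a real scanner generator would build from the same regular expressions.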
2. Syntax Analyzer
• A Syntax Analyzer creates the syntactic structure (generally a parse
tree) of the given program.
• A syntax analyzer is also called a parser.
• A parse tree describes a syntactic structure.
• In a parse tree, all terminals are at leaves.
• All inner nodes are non-terminals in a context free grammar.

assgstmt
    identifier
        newval
    :=
    expression
        expression
            identifier
                oldval
        +
        expression
            number
                12


Syntax Analyzer
• The syntax of a language is specified by a context free grammar (CFG).
• The rules in a CFG are mostly recursive.
• A syntax analyzer checks whether a given program satisfies the rules
implied by a CFG or not.
– If it satisfies, the syntax analyzer creates a parse tree for the given program.

• Ex: We use BNF (Backus Naur Form) to specify a CFG


assgstmt -> identifier := expression
expression -> identifier
expression -> number
expression -> expression + expression
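
A rough sketch of how a hand-written parser could check this grammar and build the parse tree is given below, in Python. Two things are assumptions of the sketch rather than part of the slides: tokens arrive as (kind, text) pairs, and the left-recursive rule expression -> expression + expression is handled as a left-associative loop, which is one common way to make it parseable top-down.

```python
def parse_assignment(tokens):
    """Parse 'identifier := expression' and return a nested-tuple parse tree."""
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    def operand():
        # expression -> identifier | number
        if pos < len(tokens) and tokens[pos][0] == "number":
            return ("expression", ("number", expect("number")[1]))
        return ("expression", ("identifier", expect("identifier")[1]))

    def expression():
        # expression -> expression + expression, treated as a left-associative loop
        tree = operand()
        while pos < len(tokens) and tokens[pos][0] == "+":
            expect("+")
            tree = ("expression", tree, "+", operand())
        return tree

    target = expect("identifier")[1]
    expect(":=")
    tree = ("assgstmt", ("identifier", target), ":=", expression())
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return tree


example = [("identifier", "newval"), (":=", ":="), ("identifier", "oldval"),
           ("+", "+"), ("number", "12")]
print(parse_assignment(example))
```

If the token sequence does not satisfy the rules, the sketch raises SyntaxError instead of returning a tree.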



Syntax Analyzer versus Lexical Analyzer
• Which constructs of a program should be recognized by the lexical
analyzer, and which ones by the syntax analyzer?

– Both of them do similar things, but the lexical analyzer deals with the simple
non-recursive constructs of the language.

– The syntax analyzer deals with the recursive constructs of the language.

– The lexical analyzer simplifies the job of the syntax analyzer.

– The lexical analyzer recognizes the smallest meaningful units (tokens) in a source
program.

– The syntax analyzer works on the smallest meaningful units (tokens) in a source
program to recognize meaningful structures in our programming language.
Parsing Techniques
• Depending on how the parse tree is created, there are different
parsing techniques.
• These parsing techniques are categorized into two groups:
– Top-Down Parsing,
– Bottom-Up Parsing
• Top-Down Parsing:
– Construction of the parse tree starts at the root, and proceeds towards the
leaves.
– Efficient top-down parsers can be easily constructed by hand.
– Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL Parsing).
• Bottom-Up Parsing:
– Construction of the parse tree starts at the leaves, and proceeds towards the
root.
– Normally efficient bottom-up parsers are created with the help of some software
tools.
– Bottom-up parsing is also known as shift-reduce parsing.
– Operator-Precedence Parsing: simple, restrictive, easy to implement
3. Semantic Analyzer
• A semantic analyzer checks the source program for semantic errors
and collects the type information for the code generation.
• Type-checking is an important part of semantic analyzer.
• Context-free grammars used in the syntax analysis are integrated with
attributes (semantic rules)
– the result is a syntax-directed translation,
– Attribute grammars
• Ex:
newval := oldval + 12

• The type of the identifier newval must match the type of the expression
(oldval+12)
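
The type rule for this example can be sketched as a small checker in Python. The declared types in SYMBOL_TABLE (including the extra name ratio) and the tuple encoding of expressions are hypothetical, chosen only to make the sketch runnable.

```python
# Hypothetical declared types; the slides do not specify any.
SYMBOL_TABLE = {"newval": "int", "oldval": "int", "ratio": "float"}


def type_of(expr):
    """Type of an expression: an int literal, a declared name, or ('+', left, right)."""
    if isinstance(expr, int):
        return "int"
    if isinstance(expr, str):
        if expr not in SYMBOL_TABLE:
            raise TypeError(f"undeclared identifier {expr}")
        return SYMBOL_TABLE[expr]
    op, left, right = expr
    left_type, right_type = type_of(left), type_of(right)
    if left_type != right_type:
        raise TypeError(f"operands of {op} have types {left_type} and {right_type}")
    return left_type


def check_assignment(target, expr):
    """Semantic rule: the type of the target must equal the type of the expression."""
    target_type, expr_type = type_of(target), type_of(expr)
    if target_type != expr_type:
        raise TypeError(f"cannot assign {expr_type} to {target} of type {target_type}")
    return target_type


print(check_assignment("newval", ("+", "oldval", 12)))
```

A real semantic analyzer would usually also allow implicit conversions (for example int to float); this sketch rejects every mismatch.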



4. Intermediate Code Generation
• A compiler may produce explicit intermediate code representing
the source program.
• This intermediate code is generally machine (architecture)
independent, but its level is close to the level of machine code.
• Ex:
newval := oldval * fact + 1

id1 := id2 * id3 + 1

Intermediate code (three-address code):
temp1 := id2 * id3
temp2 := temp1 + 1
id1 := temp2
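
One possible way to produce such three-address statements is a post-order walk over the expression tree, creating a new temporary for each operator. The sketch below is in Python; the tuple encoding (op, left, right) and the function name to_three_address are assumptions of the sketch.

```python
import itertools


def to_three_address(target, expr):
    """Flatten an expression tree (op, left, right) into three-address statements."""
    code = []
    counter = itertools.count(1)

    def emit(node):
        if not isinstance(node, tuple):
            return str(node)  # an identifier or constant is already an address
        op, left, right = node
        left_addr, right_addr = emit(left), emit(right)
        temp = f"temp{next(counter)}"  # new temporary holds this operator's result
        code.append(f"{temp} := {left_addr} {op} {right_addr}")
        return temp

    result = emit(expr)
    code.append(f"{target} := {result}")
    return code


# id1 := id2 * id3 + 1
for line in to_three_address("id1", ("+", ("*", "id2", "id3"), 1)):
    print(line)
```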



Intermediate Code Generation
• Properties of IR-
Easy to produce
Easy to translate into the target program
• IR can be in the following forms-
Syntax trees
Postfix notation
Three address statements
• Properties of three address statements-
At most one operator in addition to an assignment operator
Must generate temporary names to hold the value computed at each instruction
Some instructions have fewer than three operands
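
As a small illustration of the postfix form listed above, the following Python sketch converts an expression tree into postfix notation. The tuple encoding (op, left, right) is an assumption of the sketch.

```python
def to_postfix(node):
    """Postfix form of an expression tree: operands first, then the operator."""
    if not isinstance(node, tuple):
        return [str(node)]
    op, left, right = node
    return to_postfix(left) + to_postfix(right) + [op]


# id1 := id2 * id3 + 1
print(" ".join(to_postfix((":=", "id1", ("+", ("*", "id2", "id3"), 1)))))
```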
5. Code Optimizer (for Intermediate
Code Generator)
• The code optimizer optimizes the code produced by the intermediate
code generator in terms of time and space.

• Ex:
temp1 := id2 * id3
id1 := temp1 + 1
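
The example above is what remains after the redundant copy through temp2 is removed from the three-address code temp1 := id2 * id3, temp2 := temp1 + 1, id1 := temp2. A minimal Python sketch of that single optimization follows; it assumes every temporary is used exactly once, directly after it is defined, which real optimizers have to verify.

```python
def remove_copy_of_temp(code):
    """Fold 'tempN := a op b' followed by 'x := tempN' into 'x := a op b'.

    Assumption: each temporary is used only by the statement right after it.
    """
    out = []
    for stmt in code:
        dest, rhs = [part.strip() for part in stmt.split(":=")]
        if out:
            prev_dest, prev_rhs = [part.strip() for part in out[-1].split(":=")]
            if rhs == prev_dest and prev_dest.startswith("temp"):
                out[-1] = f"{dest} := {prev_rhs}"  # write the result straight to dest
                continue
        out.append(f"{dest} := {rhs}")
    return out


unoptimized = ["temp1 := id2 * id3", "temp2 := temp1 + 1", "id1 := temp2"]
for line in remove_copy_of_temp(unoptimized):
    print(line)
```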



6. Code Generator
• Produces the target language for a specific architecture.
• The target program is normally a relocatable object file containing the
machine code or assembly code.
• Intermediate instructions are each translated into a sequence of
machine instructions that perform the same task.
• Ex:
(assume an architecture in which at least one operand of each instruction is
a machine register)

MOVE id2,R1
MULT id3,R1
ADD #1,R1
MOVE R1,id1
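
A rough Python sketch of how the four instructions above could be derived from the optimized three-address code is shown below. It is a strong simplification: it assumes a single register R1, binary statements only, and a straight chain in which each statement uses the previous result as its left operand. The opcode table is taken from the example, not from a real instruction set.

```python
OPCODES = {"*": "MULT", "+": "ADD"}  # only the operators used in the example


def operand(text):
    """Constants are written as immediate operands (#n); names are used as they are."""
    return f"#{text}" if text.isdigit() else text


def generate(code, reg="R1"):
    asm = []
    in_reg = None  # name whose value currently sits in the register
    for stmt in code:
        dest, rhs = [part.strip() for part in stmt.split(":=")]
        left, op, right = rhs.split()
        if left != in_reg:
            asm.append(f"MOVE {operand(left)},{reg}")  # load the left operand
        asm.append(f"{OPCODES[op]} {operand(right)},{reg}")
        in_reg = dest
        if not dest.startswith("temp"):
            asm.append(f"MOVE {reg},{dest}")  # store only named variables
    return asm


for line in generate(["temp1 := id2 * id3", "id1 := temp1 + 1"]):
    print(line)
```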



Phases of compiler

The Structure of a Compiler

Scanner [Lexical Analyzer]
  -> Tokens
Parser [Syntax Analyzer]
  -> Parse tree
Semantic Process [Semantic Analyzer]
  -> Abstract Syntax Tree w/ Attributes
Code Generator [Intermediate Code Generator]
  -> Non-optimized Intermediate Code
Code Optimizer
  -> Optimized Intermediate Code
Code Generator
  -> Target machine code


The input program as you see it.

#include <stdio.h>

int main(void)
{
    int i, sum;
    sum = 0;
    for (i = 1; i <= 10; i++)   /* no ';' here, so the next line is the loop body */
        sum = sum + i;
    printf("%d\n", sum);
    return 0;
}

Role of the Lexical Analyzer



Tasks of Lexical Analyzer
• Reads the source text and detects tokens

• Strips out comments, white space, tabs, and newline characters

• Correlates error messages from the compiler with the source program

Approaches to implementation
• Use assembly language: most efficient but most difficult to implement

• Use a high-level language like C: efficient but difficult to implement

• Use tools like lex, flex: easy to implement but not as efficient as the first
two cases
lex Programming Utility
General Information:
• Input is stored in a file with *.l extension
• File consists of three main sections
• lex generates C function stored in lex.yy.c

Using lex:
1) Specify words to be used as tokens (extension of regular
expressions)
2) Run the lex utility on the source file to generate yylex(), a C function
3) lex declares the global variables char* yytext and int yyleng
lex Programming Utility
Three sections of a lex input file:

%{
/* C declarations and #includes */
#include <stdio.h>
#include "header.c"
int i;
%}
/* lex definitions (INT is assumed here; the slide does not show it) */
INT     [0-9]+
%%
    /* lex patterns and actions */
{INT}   { sscanf(yytext, "%d", &i);
          printf("INTEGER\n"); }
%%
/* C functions called by the above actions */
int main(void) { yylex(); return 0; }
flex - fast lexical analyzer generator
• Flex is a tool for generating scanners.
• Flex source is a table of regular expressions and corresponding
program fragments.
• Generates lex.yy.c which defines a routine yylex()
Format of the Input File
• The flex input file consists of three sections, separated by a line with
just %% in it:
definitions
%%
rules
%%
user code
Definitions Section
• The definitions section contains declarations of simple name
definitions to simplify the scanner specification.
• Name definitions have the form:
name definition
• Example:
DIGIT [0-9]
ID [a-z][a-z0-9]*
Rules Section

• The rules section of the flex input contains a series of rules
of the form:
pattern action
• Example:
{ID} printf( "An identifier: %s\n", yytext );

• In an action, yytext holds the matched text and yyleng holds its length.


• If action is empty, the matched token is discarded.
Action
• If the action contains a '{', the action spans till the balancing '}' is
found, as in C.
• An action consisting only of a vertical bar ('|') means "same as the
action for the next rule."
• An action may contain a return statement, as in C; it returns a value
(typically a token code) to the caller of yylex().
• In case no rule matches: simply copy the input to the standard output
(A default rule).
Precedence Problem

• For example: the input "<=" can be matched by the rule "<" and by the rule "<=".


• The one matching most text has higher precedence.
• If two or more have the same length, the rule listed first in
the flex input has higher precedence.
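
The two precedence rules (longest match first, then the earlier rule on a tie) can be sketched in Python as follows. The rule list and the token names are illustrative assumptions, and unlike flex's default rule the sketch raises an error when nothing matches.

```python
import re

# Rules in file order as (pattern, token name); a name of None means "discard".
RULES = [
    (r"<", "LT"),
    (r"<=", "LE"),
    (r"if", "KW_IF"),
    (r"[a-z]+", "ID"),
    (r"[ \t]+", None),
]


def next_token(text, pos):
    """Pick the rule matching the most text; on a tie the earlier rule wins."""
    best_len, best_name = 0, None
    for pattern, name in RULES:
        m = re.compile(pattern).match(text, pos)
        if m and len(m.group()) > best_len:  # strict '>' keeps the earlier rule on ties
            best_len, best_name = len(m.group()), name
    if best_len == 0:
        raise SyntaxError(f"no rule matches at position {pos}")
    return best_name, text[pos:pos + best_len]


def scan(text):
    pos, out = 0, []
    while pos < len(text):
        name, lexeme = next_token(text, pos)
        pos += len(lexeme)
        if name is not None:
            out.append((name, lexeme))
    return out


print(scan("if a <= b"))
print(scan("iffy < x"))
```

In the first input "if" is matched by both KW_IF and ID with the same length, so the rule listed first wins; in the second input ID matches the longer text "iffy" and therefore takes precedence.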
User Code Section
• The user code section is simply copied to lex.yy.c verbatim.
• The presence of this section is optional; if it is missing, the second %%
in the input file may be skipped.
• In the definitions and rules sections, any indented text or text enclosed
in %{ and %} is copied verbatim to the output (with the %{}'s
removed).
A Simple Example

%{
int num_lines = 0, num_chars = 0;
%}

%%
\n ++num_lines; ++num_chars;
. ++num_chars;

%%
int main(void) {
yylex();
printf( "# of lines = %d, # of chars = %d\n",
num_lines, num_chars );
return 0;
}
Syntax Analyzer

Introduction to the parser

• Context-free grammars
• Writing a grammar
• Using ambiguous grammars


Syntax Analyzer
• Syntax Analyzer creates the syntactic structure of the given source program.
• This syntactic structure is mostly a parse tree.
• Syntax Analyzer is also known as parser.
• The syntax of a programming language is described by a context-free grammar (CFG). We will
use BNF (Backus-Naur Form) notation in the description of CFGs.
• The syntax analyzer (parser) checks whether a given source program satisfies the
rules implied by a context-free grammar or not.
– If it satisfies, the parser creates the parse tree of that program.
– Otherwise the parser reports error messages.
• A context-free grammar
– gives a precise syntactic specification of a programming language.
– the design of the grammar is an initial phase of the design of a compiler.
– a grammar can be directly converted into a parser by some tools.

Parsers (cont.)
• We categorize the parsers into two groups:

1. Top-Down Parser
– the parse tree is created top to bottom, starting from the root.
2. Bottom-Up Parser
– the parse tree is created bottom to top, starting from the leaves.

• Both top-down and bottom-up parsers scan the input from left to
right (one symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented only
for sub-classes of context-free grammars.
– LL for top-down parsing
– LR for bottom-up parsing



Ambiguity (cont.)
• For most parsers, the grammar must be unambiguous.

• unambiguous grammar
➔ unique selection of the parse tree for a sentence

• We should eliminate the ambiguity in the grammar during the design
phase of the compiler.
• An unambiguous grammar should be written to eliminate the
ambiguity.
• To disambiguate a grammar, we choose one preferred parse tree for each
ambiguous sentence and rewrite the grammar so that it allows only
this choice.
Ambiguity (cont.)

stmt → if expr then stmt |
       if expr then stmt else stmt |
       otherstmts

if E1 then if E2 then S1 else S2

Parse tree 1 (else attached to the outer if):

stmt
    if  expr(E1)  then  stmt  else  stmt(S2)
                         |
                         if  expr(E2)  then  stmt(S1)

Parse tree 2 (else attached to the inner if):

stmt
    if  expr(E1)  then  stmt
                         |
                         if  expr(E2)  then  stmt(S1)  else  stmt(S2)
Ambiguity (cont.)

• We prefer the second parse tree (else matches with closest if).
• So, we have to disambiguate our grammar to reflect this choice.

• The unambiguous grammar will be:


stmt → matchedstmt | unmatchedstmt

matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts

unmatchedstmt → if expr then stmt |
                if expr then matchedstmt else unmatchedstmt
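
The same choice (else matches the closest if) is often implemented directly in a hand-written parser by attaching an else greedily to the innermost open if. The Python sketch below shows this technique, which is not the grammar rewrite itself; it assumes that conditions and other statements are single tokens such as E1 or S1.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement and return (tree, next position).

    Assumption: conditions and simple statements are single tokens.
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "if":
        if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
            raise SyntaxError("expected 'then'")
        cond = tokens[pos + 1]
        then_part, pos = parse_stmt(tokens, pos + 3)
        # An 'else' seen here belongs to this, the innermost still-open 'if'.
        if pos < len(tokens) and tokens[pos] == "else":
            else_part, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_part, else_part), pos
        return ("if", cond, then_part), pos
    return tokens[pos], pos + 1


tree, end = parse_stmt("if E1 then if E2 then S1 else S2".split())
print(tree)
```

The resulting tree has the else branch S2 inside the inner if, which corresponds to the second parse tree.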



Thank You
