1-Structure and Phases of a Compiler-19!07!2024 (1)

The document outlines the objectives, outcomes, and syllabus for a Compiler Design course (BCSE307L) at Vellore Institute of Technology. It covers topics such as lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and code generation, along with the roles of compilers, interpreters, and assemblers. The course aims to equip students with the skills necessary for compiler construction and optimization techniques.

Uploaded by

karthik

BCSE307L – COMPILER DESIGN

Dr. M. Bhuvaneswari
Assistant Professor Sr. Grade 2
School of Computer Science and
Engineering
Vellore Institute of Technology Vellore
Objectives
• To provide fundamental knowledge of various language
translators.
• To make students familiar with lexical analysis and
parsing techniques.
• To understand the various actions carried out in
semantic analysis.
• To make the students get familiar with how the
intermediate code is generated.
• To understand the principles of code optimization
techniques and code generation.
• To provide foundation for study of high-performance
compiler design.
Outcomes
• Apply the skills on devising, selecting, and using tools
and techniques towards compiler design
• Develop language specifications using context free
grammars (CFG).
• Apply the ideas, the techniques, and the knowledge
acquired for the purpose of developing software
systems.
• Constructing symbol tables and generating
intermediate code.
• Obtain insights on compiler optimization and code
generation
Syllabus
• Module: 1 Introduction to Compilation and Lexical Analysis 7 hours
• Introduction to LLVM - Structure and Phases of a Compiler-Design Issues-
Patterns Lexemes-Tokens-Attributes-Specification of Tokens-Extended
Regular Expression- Regular expression to Deterministic Finite Automata
(Direct method) - Lex - A Lexical Analyzer Generator
• Module: 2 Syntax Analysis 8 hours
• Role of Parser- Parse Tree - Elimination of Ambiguity – Top Down Parsing
– Recursive Descent Parsing - LL (1) Grammars – Shift Reduce Parsers-
Operator Precedence Parsing - LR Parsers, Construction of SLR Parser
Tables and Parsing- CLR Parsing- LALR Parsing
• Module: 3 Semantic Analysis 5 hours
• Syntax Directed Definition – Evaluation Order - Applications of Syntax
Directed Translation - Syntax Directed Translation Schemes -
Implementation of L-attributed Syntax Directed Definition
• Module: 4 Intermediate Code Generation 5 hours
• Variants of Syntax trees - Three Address Code- Types – Declarations -
Procedures - Assignment Statements - Translation of Expressions -
Control Flow - Back Patching- Switch Case Statements.
Cont..
• Module: 5 Code Optimization 6 hours
• Loop optimizations- Principal Sources of Optimization -Introduction to
Data Flow Analysis - Basic Blocks - Optimization of Basic Blocks -
Peephole Optimization- The DAG Representation of Basic Blocks -Loops in
Flow Graphs - Machine Independent Optimization Implementation of a
naïve code generator for a virtual Machine- Security checking of virtual
machine code
• Module: 6 Code Generation 5 hours
• Issues in the design of a code generator- Target Machine- Next-Use
Information – Register Allocation and Assignment- Runtime Organization-
Activation Records.
• Module: 7 Parallelism 7 hours
• Parallelization- Automatic Parallelization- Optimizations for Cache
Locality and Vectorization- Domain Specific Languages-Compilation-
Instruction Scheduling and Software Pipelining- Impact of Language
Design and Architecture Evolution on Compilers Static Single Assignment
• Module: 8 Contemporary Issues 2 hours
Text Books & References
• Text Book
• A. V. Aho, Monica S. Lam, Ravi Sethi and Jeffrey D. Ullman, Compilers:
Principles, techniques, & tools, 2007, Second Edition, Pearson Education,
Boston.

• Reference Books
• Watson, Des. A Practical Approach to Compiler Construction. Germany,
Springer International Publishing, 2017
Content - Module -1
• Introduction to Compilation And Lexical
Analysis
• Introduction to LLVM
• Structure and Phases of a Compiler
• Design Issues
• Patterns Lexemes
• Tokens-Attributes
• Specification of Tokens
• Extended Regular Expression
• Regular expression to Deterministic Finite Automata (Direct
method)
• Lex - A Lexical Analyzer Generator
Translator
• A translator is a program that takes a program in one form as input and converts it into another form.
• Types of translators are:
1. Compiler
2. Interpreter
3. Assembler
[Figure: Source Program → Translator → Target Program, with Error Messages as a side output]
Compiler
• A compiler is a program that reads a program written in a source language and translates it into an equivalent program in a target language.
Example source program:
void main()
{
int a=1,b=2,c;
c=a+b;
printf("%d",c);
}
[Figure: Source Program → Compiler → Target Program (machine code, drawn as binary words such as 0000 1100 0010), with Error Messages as a side output]
Interpreter
• An interpreter is also a program that reads a program written in a source language, but it translates and executes the program one statement at a time instead of first producing a complete target program.
[Figure: Source Program (the same void main() example) → Interpreter → machine code produced line by line, with Error Messages as a side output]
Difference between compiler & interpreter
Compiler | Interpreter
Scans the entire program and translates it as a whole into machine code. | Translates the program one statement at a time.
It generates intermediate code. | It does not generate intermediate code.
Errors are displayed after the entire program is checked. | An error, if any, is displayed for each instruction as it is interpreted.
Memory requirement is more. | Memory requirement is less.
Example: C compiler | Example: Basic, Python, Ruby
Assembler
• An assembler is a translator that takes assembly code as input and generates machine code as output.
Assembly code:
MOV id3, R1
MUL #2.0, R1
MOV id2, R2
MUL R2, R1
MOV id1, R2
ADD R2, R1
MOV R1, id1
[Figure: Assembly Code → Assembler → Machine Code, with Error Messages as a side output]
Context of Compiler (Cousins of compiler)
• In addition to the compiler, many other system programs are required to generate absolute machine code.
• These system programs are:
• Preprocessor
• Assembler
• Linker
• Loader
[Figure: Source Program → Preprocessor → Modified Source Program → Compiler → Target Assembly Program → Assembler → Relocatable Object Code → Linker / Loader (together with Libraries & Object Files) → Absolute Machine Code]
Context of compiler (Cousins of compiler)
Preprocessor
• Some of the tasks performed by the preprocessor:
1. Macro processing: Allows the user to define macros. Ex: #define PI 3.14159265358979323846
2. File inclusion: A preprocessor may include a header file in the program. Ex: #include<stdio.h>
Context of compiler (Cousins of compiler)
Compiler
• A compiler is a program that reads a program written in a source language and translates it into an equivalent program in a target language.
Context of compiler (Cousins of compiler)
Assembler
• An assembler is a translator that takes an assembly program (mnemonics) as input and generates machine code as output.
Context of compiler (Cousins of compiler)
Linker
• The linker makes a single program from several files of relocatable machine code.
• These files may have been the result of several different compilations, and one or more may be library files.
Loader
• The process of loading consists of:
• Taking relocatable machine code
• Altering the relocatable addresses
• Placing the altered instructions and data in memory at the proper location.
Analysis Synthesis model of compilation
• There are two parts of compilation.
1. Analysis Phase
2. Synthesis Phase
[Figure: Source Code (the void main() example) → Analysis Phase → Intermediate Representation → Synthesis Phase → Target Code (binary)]
Analysis phase & Synthesis phase
Analysis Phase
• The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.
• The analysis phase consists of three sub phases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
Synthesis Phase
• The synthesis part constructs the desired target program from the intermediate representation.
• The synthesis phase consists of the following sub phases:
1. Code optimization
2. Code generation
Phases of compiler
Compiler
• Analysis phase: Lexical analysis, Syntax analysis, Semantic analysis
• Synthesis phase: Intermediate code generation, Code optimization, Code generation
Lexical analysis
• Lexical analysis is also called linear analysis or scanning.
• The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes.
• For each lexeme, the lexical analyzer produces as output a token of the form <token-name, attribute-value>.
• In short, the lexical analyzer divides the given source statement into tokens.
• Ex: Position = initial + rate * 60 would be grouped into the following tokens:
<id,1> <=> <id,2> <+> <id,3> <*> <60>
Position (identifier): <id,1>, where 1 points to the entry in the symbol table for this token
= (assignment symbol): <=>
initial (identifier): <id,2>
+ (plus symbol): <+>
rate (identifier): <id,3>
* (multiplication symbol): <*>
60 (number): <60>
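The grouping of characters into tokens can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patterns in TOKEN_SPEC, the function name tokenize and the use of a plain list as the symbol table are hypothetical choices, not part of the slides.

```python
import re

# Hypothetical token patterns, just enough for the slide's example statement.
TOKEN_SPEC = [
    ("id", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("num", r"\d+"),
    ("op", r"[=+*]"),
    ("ws", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for one source statement."""
    symbol_table = []   # position in this list + 1 is the attribute in <id, n>
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "ws":
            continue                       # white space is stripped, not returned
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind == "num":
            tokens.append(("num", int(lexeme)))
        else:
            tokens.append((lexeme, None))  # operators carry no attribute
    return tokens, symbol_table


tokens, table = tokenize("Position = initial + rate * 60")
```

Here tokens holds one pair per lexeme, and table lists the identifiers in the order in which they were first seen.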
Syntax analysis
• Syntax analysis is also called parsing or hierarchical analysis.
• It takes the tokens produced by the lexical analyzer as input and generates the parse tree.
• The syntax analyzer checks that the token sequence follows the grammar of the language and reports syntax errors, e.g., unmatched parentheses.
• If the code is error free, the syntax analyzer generates the tree.
Position = initial + rate*60 → (lexical analysis) → id1 = id2 + id3 * 60 → (syntax analysis) →
=
├─ id1
└─ +
   ├─ id2
   └─ *
      ├─ id3
      └─ 60
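How a parser turns the token sequence into this tree can be sketched with a small recursive-descent parser. The grammar used here (an assignment whose right-hand side uses only + and *) and the name parse_assignment are assumptions made for illustration; the tree is returned as nested tuples.

```python
def parse_assignment(tokens):
    """Parse e.g. ['id1', '=', 'id2', '+', 'id3', '*', '60'] into a nested tuple tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = peek()
        if tok is None or tok in ("=", "+", "*"):
            raise SyntaxError(f"operand expected, found {tok!r}")
        return advance()

    def term():                      # term -> factor ('*' factor)*
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():                      # expr -> term ('+' term)*
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = factor()
    if peek() != "=":
        raise SyntaxError("'=' expected")
    advance()
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


tree = parse_assignment(["id1", "=", "id2", "+", "id3", "*", "60"])
```

Because term is called from expr, the multiplication ends up deeper in the tree than the addition, which is how the higher precedence of * is expressed.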
Semantic analysis
• The semantic analyzer determines the meaning of a source string.
• It performs the following operations:
1. Type checking and coercions.
2. Checking that an array index is an int.
3. Checking that arithmetic operations are performed on type-compatible operands.
4. Checking the scope of operation.
*Note: Consider id1, id2 and id3 are real, so the integer 60 is converted by typecasting (int to float).
Tree after semantic analysis:
=
├─ id1
└─ +
   ├─ id2
   └─ *
      ├─ id3
      └─ inttofloat
         └─ 60
Intermediate code generator
• Two important properties of intermediate code:
1. It should be easy to produce.
2. It should be easy to translate into the target program.
• The intermediate form can be represented using "three address code".
• Three address code consists of a sequence of instructions, each of which has at most three operands.
In the annotated tree, t1 names the inttofloat node, t2 the * node and t3 the + node.
Intermediate code:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
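The translation from the annotated tree to three address code can be sketched as a post-order walk that creates one temporary per interior node. The tuple encoding of the tree and the name gen_three_address are assumptions made for this example.

```python
def gen_three_address(tree):
    """Return three-address instructions (as strings) for an assignment tree."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(node):
        if isinstance(node, str):        # leaf: identifier or constant
            return node
        op = node[0]
        if op == "inttofloat":           # unary conversion node
            arg = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        left = walk(node[1])             # children first, then the node itself
        right = walk(node[2])
        temp = new_temp()
        code.append(f"{temp} = {left} {op} {right}")
        return temp

    if tree[0] != "=":
        raise ValueError("expected an assignment tree")
    value = walk(tree[2])
    code.append(f"{tree[1]} = {value}")
    return code


annotated = ("=", "id1", ("+", "id2", ("*", "id3", ("inttofloat", "60"))))
tac = gen_three_address(annotated)
```

Each instruction has at most one operator and at most three addresses (two operands and a result).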
Code optimization
• It improves the intermediate code.
• This is necessary to obtain faster execution of the code or lower consumption of memory.
Intermediate code:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
After code optimization:
t1 = id3 * 60.0
id1 = id2 + t1
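The two improvements visible in this example (converting the constant 60 at compile time, and removing the final copy) can be sketched as one pass over the instructions. The tuple format (dest, op, arg1, arg2) and the name optimize are hypothetical, and the copy removal is only safe under the assumption that each temporary is used exactly once, as is the case here.

```python
def optimize(code):
    """code: list of (dest, op, arg1, arg2); op None means a plain copy dest = arg1."""
    constants = {}                 # temporary name -> folded constant text
    out = []
    for dest, op, a, b in code:
        a = constants.get(a, a)    # substitute constants that are already known
        b = constants.get(b, b)
        if op == "inttofloat" and a.isdigit():
            constants[dest] = str(float(int(a)))   # do the conversion at compile time
            continue
        if op is None and out and out[-1][0] == a:
            # copy of the value just computed: store the result directly in dest
            _, prev_op, prev_a, prev_b = out.pop()
            out.append((dest, prev_op, prev_a, prev_b))
            continue
        out.append((dest, op, a, b))
    return out


intermediate = [
    ("t1", "inttofloat", "60", None),
    ("t2", "*", "id3", "t1"),
    ("t3", "+", "id2", "t2"),
    ("id1", None, "t3", None),
]
optimized = optimize(intermediate)
```

The result has the same two instructions as the slide; the only difference is that this sketch keeps the name t2 instead of renumbering the temporary to t1.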
Code generation
• The intermediate code instructions are translated into a sequence of machine instructions.
Optimized code:
t1 = id3 * 60.0
id1 = id2 + t1
Generated code:
MOV id3, R2
MUL #60.0, R2
MOV id2, R1
ADD R2, R1
MOV R1, id1
Register assignment: id3 → R2, id2 → R1
Phases of compiler
[Figure: Source program → Lexical analysis → Syntax analysis → Semantic analysis → Intermediate code → Code optimization → Code generation → Target Program. The phases are bracketed as Analysis Phase and Synthesis Phase; the Symbol table and Error detection and recovery are connected to all phases.]
Symbol table
• Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
• The symbol table is created and maintained by the compiler.
• It is used to store information about the occurrences of various entities such as variable names, functions, objects, classes, etc.
• All this information is collected incrementally by the analysis phase and used by the synthesis phase to generate target code.
• The symbol table is used for the following purposes:
• To store the names of all the entities in a structured form in one place.
• To verify whether a variable has been declared.
• To determine the scope of a name.
• To implement type checking by verifying that assignments and expressions in the source code are semantically correct.
Cont.,
• A symbol table can be implemented as a linear structure (linked list) or as a hash table.
• Role of each phase of the compiler with respect to the symbol table:
• Lexical analysis - Creates a new entry for each new identifier.
• Syntax analysis - Adds attribute information such as type, dimension, scope, etc.
• Semantic analysis - Checks semantics and updates the information, if needed.
• ICG - Based on the available information in the symbol table, adds temporary variable information.
• Code optimization - Uses the address and alias information available in the symbol table.
• TCG - Generates target code using the identifiers' address information present in the symbol table.
Cont.,
• Example
int coursecode; - Line 1
char name[]="Compiler"; - Line 2
printf("%d",coursecode); - Line 3
Name | Type | Size | Dimension | LOD | LOU | Address
coursecode | int | 4 | 0 | 1 | 3 | 2024
name | char | 8 | 1 | 2 | 10 | 3056
(LOD: line of declaration, LOU: line of usage)
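A hash-table symbol table of this kind can be sketched with a Python dictionary. The class name SymbolTable, its method names and the attribute keys are hypothetical; only a few of the columns from the table above are modelled.

```python
class SymbolTable:
    """Minimal hash-table symbol table: one attribute record per declared name."""

    def __init__(self):
        self.entries = {}          # name -> dictionary of attributes

    def declare(self, name, type_, size, line):
        if name in self.entries:
            raise NameError(f"{name} redeclared at line {line}")
        self.entries[name] = {"type": type_, "size": size, "declared": line, "used": []}

    def use(self, name, line):
        """Record a use of name; fails if the name was never declared."""
        if name not in self.entries:
            raise NameError(f"{name} used at line {line} but never declared")
        self.entries[name]["used"].append(line)
        return self.entries[name]


symbols = SymbolTable()
symbols.declare("coursecode", "int", 4, line=1)
symbols.declare("name", "char", 8, line=2)
entry = symbols.use("coursecode", line=3)
```

The use method covers two of the purposes listed earlier: it verifies that a variable has been declared, and it returns the stored type so that a later phase can perform type checking.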
Cont.,
Exercise 1
• Write output of all the phases of compiler for following
statements:
1. x = b-c*2
2. I=p*n*r/100
Grouping of Phases
Front end & back end (Grouping of phases)
Front end
• Depends primarily on the source language and is largely independent of the target machine.
• It includes the following phases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Intermediate code generation
5. Creation of symbol table

Back end
• Depends on the target machine and does not depend on the source language.
• It includes the following phases:
1. Code optimization
2. Code generation phase
3. Error handling and symbol table operation
Pass structure
• Several phases are grouped into a pass that reads an input file and writes an output file.
• One complete scan of a source program is called a pass.
• In a single pass compiler, analysis of a source statement is immediately followed by synthesis of the equivalent target statement.
• In a two pass compiler, intermediate code is generated between the analysis and synthesis phases.
• Some compiler collections have been created around carefully designed intermediate representations that allow the front end for a particular language to interface with the back end for a certain target machine.
• With these collections, we can produce compilers for different target machines.
Pass structure
• It is difficult to compile a source program in a single pass because of forward references.
• Forward reference: a reference to a program entity that precedes the definition of that entity in the program.
• This problem can be solved by postponing the generation of target code until more information concerning the entity becomes available.
• It leads to the multi pass model of compilation (Pass I, Pass II).
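The way a second pass resolves forward references can be illustrated with a toy instruction list in which a jump names a label that is defined later. The instruction format and the function name assemble are invented for this sketch.

```python
def assemble(lines):
    """Resolve label operands in two passes over a toy instruction list."""
    # Pass I: record the address of every label definition ("name:").
    labels, instructions = {}, []
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = len(instructions)
        else:
            instructions.append(line)
    # Pass II: every label is known now, so forward references can be replaced.
    resolved = []
    for instr in instructions:
        op, *args = instr.split()
        args = [str(labels.get(arg, arg)) for arg in args]
        resolved.append(" ".join([op, *args]))
    return resolved


program = ["JMP end", "ADD R1", "end:", "HALT"]
output = assemble(program)
```

A single pass could not translate JMP end when it is read, because the address of end is not yet known at that point.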
Types of compiler
1. One pass compiler (e.g., Turbo Pascal)
• A compiler that completes the whole compilation in one pass.
2. Two pass compiler
• A compiler that completes the whole compilation in two passes.
• It generates intermediate code.
3. Incremental compiler
• A compiler that compiles only the changed lines of the source code and updates the object code.
4. Native code compiler
• A compiler that generates code for the same type of platform on which it runs.
5. Cross compiler
• A compiler that generates code for a platform different from the one on which it runs.
Token, Pattern & Lexemes
Interaction of scanner & parser
[Figure: Source Program → Lexical Analyzer → Token → Parser; the Parser sends "Get next token" to the Lexical Analyzer; both use the Symbol Table]
• Upon receiving a "Get next token" command from the parser, the lexical analyzer reads input characters until it can identify the next token.
• The lexical analyzer also strips out comments and white space (blanks, tabs, and newline characters) from the source program.
Why to separate lexical analysis &
parsing?
1. Simplicity in design.
2. Improves compiler efficiency.
3. Enhance compiler portability.
Token, Pattern & Lexemes
Token
• A sequence of characters having a collective meaning is known as a token.
• Token – <Rate, int, identifier>
• Categories of tokens:
1. Identifier
2. Keyword
3. Operator
4. Special symbol
5. Constant
Pattern
• The set of rules associated with a token is called its pattern.
• Example: "non-empty sequence of digits", "letter followed by letters and digits"
Lexemes
• The sequence of characters in a source program matched with the pattern for a token is called a lexeme.
• Example: Rate, DIET, count, Flag
Example: Token, Pattern & Lexemes
Example: total = sum + 45
Tokens:
Lexeme | Token
total | Identifier1
= | Operator1
sum | Identifier2
+ | Operator2
45 | Constant1
Lexemes:
Lexemes of identifier: total, sum
Lexemes of operator: =, +
Lexemes of constant: 45
Example: Token, Pattern & Lexemes
C code:
printf("Total = %d\n", score);
• printf and score are lexemes matching the pattern for token id,
• "Total = %d\n" is a lexeme matching literal
In many programming languages, the following classes
cover most or all of the tokens:
Attributes of Tokens
• When more than one lexeme can match a pattern, the lexical analyzer must provide the subsequent compiler phases additional information about the particular lexeme that matched.
• For example, the pattern for token number matches both 0 and 1.
• Thus, in many cases the lexical analyzer returns to the parser not only a token name, but also an attribute value that describes the lexeme represented by the token.
• Normally, information about an identifier (e.g., its lexeme, its type, and the location at which it is first found) is kept in the symbol table. Thus, the appropriate attribute value for an identifier is a pointer to the symbol-table entry for that identifier.
Attributes of Tokens
The token names and associated attribute values for the Fortran statement
E = M * C ** 2
are written below as a sequence of pairs.
<id, pointer to symbol-table entry for E>
<assign op>
<id, pointer to symbol-table entry for M>
<mult op>
<id, pointer to symbol-table entry for C>
<exp op>
<number, integer value 2>

Another example (token stream for a C function limitedSquare):
<float> <id, limitedSquare> <(> <id, x> <)> <{>
<float> <id, x>
<return> <(> <id, x> <op, "<="> <num, -10.0> <op, "||"> <id, x> <op, ">="> <num, 10.0>
<)> <op, "?"> <num, 100> <op, ":"> <id, x> <op, "*"> <id, x> <}>
Specification of tokens
• Regular expressions are an important notation for specifying lexeme patterns.
Strings and Languages
• Symbols – a,…,z, A,…,Z, 0,…,9, *, &, @, #
• E.g.: letters, digits, punctuation
• Alphabet (Σ)
• A finite, non-empty set of symbols
Example
a) Σ = {0, 1}, the binary alphabet
b) Σ = {a, b, … , z}, the set of all lower-case letters
c) The set of all ASCII characters
d) Unicode – approximately 100,000 characters
Strings and Languages
• Strings
• A string (or word) is a finite sequence of symbols chosen from some alphabet
• Example
• 1011 is a string from the binary alphabet Σ = {0, 1}
• Empty string, ε: a string with zero occurrences of symbols
• Length |w| of string w: the number of positions for symbols in w
• Examples: |0111| = 4, |ε| = 0, …
Strings and Languages - Parts of strings
Term | Definition
Prefix of s | A string obtained by removing zero or more trailing symbols of string s. e.g., ban is a prefix of banana.
Suffix of s | A string obtained by removing zero or more leading symbols of string s. e.g., nana is a suffix of banana.
Substring of s | A string obtained by removing a prefix and a suffix from s. e.g., nan is a substring of banana.
Proper prefix, suffix and substring of s | Any nonempty string x that is respectively a prefix, suffix or substring of s, such that s ≠ x.
Subsequence of s | A string obtained by removing zero or more, not necessarily contiguous, symbols from s. e.g., baaa is a subsequence of banana.
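These definitions translate directly into short Python helpers. The function names below are chosen for this illustration only.

```python
def prefixes(s):
    """All prefixes of s, from the empty string up to s itself."""
    return [s[:i] for i in range(len(s) + 1)]


def suffixes(s):
    """All suffixes of s, from s itself down to the empty string."""
    return [s[i:] for i in range(len(s) + 1)]


def is_substring(x, s):
    """True if x is obtained by removing a prefix and a suffix from s."""
    return x in s


def is_proper(x, s):
    """Proper parts are nonempty and different from s."""
    return x != "" and x != s


def is_subsequence(x, s):
    """True if x is obtained by deleting symbols of s, not necessarily contiguous ones."""
    remaining = iter(s)
    return all(symbol in remaining for symbol in x)
```

For the string banana, baaa is a subsequence but not a substring, because the deleted symbols (the two n's) are not at the ends of the string.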
Strings and Languages
• Power of an alphabet, Σ^k
• The set of all strings of length k over Σ
• Examples
• Given Σ = {0, 1}, we have Σ^0 = {ε}, Σ^2 = {00, 01, 10, 11}
• Set of all strings over Σ: denoted as Σ*
Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ …
• Set of nonempty strings from Σ: Σ+
Σ+ = Σ* − {ε}
Σ+ = Σ^1 ∪ Σ^2 ∪ Σ^3 ∪ …
• Concatenation of two strings x and y is xy
• Example
• If x = 01101 and y = 110, then xy = 01101110, and xx = x^2 = 0110101101, …
• ε is the identity for concatenation, since εw = wε = w
• Notations of Languages
• A language is a set of strings all chosen from some Σ*
• If Σ is an alphabet, and L ⊆ Σ*, then L is a language over Σ.
• Examples
• The set of all legal English words is a language over {a, b, c, …, z}
• The set of all strings of n 0's followed by n 1's for n ≥ 0: {ε, 01, 0011, 000111, …}
• The set of strings with an equal number of 0's and 1's: {ε, 01, 10, 0011, 0101, …}
• Σ* is an infinite language for any alphabet Σ
• ∅ = the empty language (not the empty string ε) is a language over any alphabet
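Operations on languages
Operation | Definition
Union of L and M, written L ∪ M | L ∪ M = { s | s is in L or s is in M }
Concatenation of L and M, written LM | LM = { st | s is in L and t is in M }
Kleene closure of L, written L* | L* = L^0 ∪ L^1 ∪ L^2 ∪ … (zero or more concatenations of L)
Positive closure of L, written L+ | L+ = L^1 ∪ L^2 ∪ L^3 ∪ … (one or more concatenations of L)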
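For finite languages, these operations can be tried out with Python sets. Since L* and L+ are infinite whenever L contains a nonempty string, the sketch below only builds the closures up to a chosen power; the parameter max_power and all function names are assumptions of this example.

```python
def concat(L, M):
    """LM = { st | s in L and t in M }"""
    return {s + t for s in L for t in M}


def power(L, k):
    """L^k: k-fold concatenation of L with itself; L^0 = {empty string}."""
    result = {""}
    for _ in range(k):
        result = concat(result, L)
    return result


def kleene(L, max_power):
    """Finite approximation of L*: L^0 ∪ L^1 ∪ ... ∪ L^max_power."""
    return set().union(*(power(L, i) for i in range(max_power + 1)))


def positive(L, max_power):
    """Finite approximation of L+: L^1 ∪ ... ∪ L^max_power."""
    return set().union(*(power(L, i) for i in range(1, max_power + 1)))


L = {"a", "b"}
M = {"0", "1"}
union = L | M
```

With the alphabet {0, 1} in place of L, power(L, k) is the same construction as the power of an alphabet.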
Regular Expression &
Regular Definition
Regular expression
• A regular expression is a sequence of characters that define
a pattern.
Notational shorthands
1. One or more instances: +
2. Zero or more instances: *
3. Zero or one instances: ?
4. Alphabets: Σ
Rules to define regular expression
1. ε is a regular expression that denotes {ε}, the set containing the empty string.
2. If a is a symbol in Σ, then a is a regular expression that denotes {a}.
3. Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then,
a. (r)|(s) is a regular expression denoting L(r) ∪ L(s)
b. (r)(s) is a regular expression denoting L(r)L(s)
c. (r)* is a regular expression denoting (L(r))*
d. (r) is a regular expression denoting L(r)

The language denoted by a regular expression is said to be a regular set.
Regular expression
• L = zero or more occurrences of a = a*
• a* denotes the infinite set {ε, a, aa, aaa, aaaa, aaaaa, …}
Regular expression
• L = one or more occurrences of a = a+
• a+ denotes the infinite set {a, aa, aaa, aaaa, aaaaa, …}
Precedence and associativity of operators
Operator Precedence Associative
Kleene * 1 left
Concatenation 2 left
Union | 3 left
Regular expression examples
1. 0 or 1
Strings: 0, 1    R.E. = 0|1
2. 0 or 11 or 111
Strings: 0, 11, 111    R.E. = 0|11|111
3. String having zero or more a.
Strings: ε, a, aa, aaa, aaaa, …    R.E. = a*
4. String having one or more a.
Strings: a, aa, aaa, aaaa, …    R.E. = a+
5. Regular expression over Σ = {a, b, c} that represents all strings of length 3.
Strings: abc, bca, bbb, cab, aba, …    R.E. = (a|b|c)(a|b|c)(a|b|c)
6. All binary strings
Strings: 0, 11, 101, 10101, 1111, …    R.E. = (0|1)+
Regular expression examples
7. 0 or more occurrences of either a or b or both
Strings: ε, a, aa, abab, bab, …    R.E. = (a|b)*
8. 1 or more occurrences of either a or b or both
Strings: a, aa, abab, bab, bbbaaa, …    R.E. = (a|b)+
9. Binary no. ends with 0
Strings: 0, 10, 100, 1010, 11110, …    R.E. = (0|1)*0
10. Binary no. ends with 1
Strings: 1, 101, 1001, 10101, …    R.E. = (0|1)*1
11. Binary no. starts and ends with 1
Strings: 11, 101, 1001, 10101, …    R.E. = 1(0|1)*1
12. String starts and ends with same character
Strings: 00, 101, aba, baab, …
Regular expression examples
13. All strings of a and b starting with a
R.E. = a(a|b)*
14. String of 0 and 1 ends with 00
R.E. = (0|1)*00
15. String ends with abb
R.E. = (a|b)*abb
16. String starts with 1 and ends with 0
R.E. = 1(0|1)*0
17. All binary strings with at least 3 characters in which the 3rd character is zero
R.E. = (0|1)(0|1)0(0|1)*
18. Language which consists of exactly two b's over the set {a, b}
R.E. = a*ba*ba*
Regular expression examples
24. The language over {a, b, c} in which the number of a's is a multiple of 3
Strings: aaa, baaa, bacaba, aaaaaa, …    R.E. = (b|c)*(a(b|c)*a(b|c)*a(b|c)*)*
25. Even no. of 0
R.E. = 1*(01*01*)*
26. String should have odd length
R.E. = (0|1)((0|1)(0|1))*
27. String should have even length
R.E. = ((0|1)(0|1))*
28. String starts with 0 and has odd length
R.E. = 0((0|1)(0|1))*
30. String starts with 1 and has even length
R.E. = 1(0|1)((0|1)(0|1))*
31. All strings that begin or end with 00 or 11
Strings: 00101, 10100, 110, 01011, …    R.E. = (00|11)(0|1)* | (0|1)*(00|11)
Regular expression examples
31. Language of all strings containing both 11 and 00 as substrings
Strings: 0011, 1100, 100110, 010011, …
32. String ending with 1 and not containing 00
Strings: 011, 1101, 1011, …    R.E. = (1|01)+
33. Language of C identifiers
Strings: area, i, redious, grade1, …    R.E. = (_|L)(_|L|D)*
where L is a letter and D is a digit
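Several of the expressions above can be checked mechanically with Python's re module, whose |, * and + operators have the same meaning as in the slides. The selection of cases and the helper name check are choices made for this illustration; re.fullmatch is used so that the whole string must match.

```python
import re

# (description, pattern, strings that should match, strings that should not)
CASES = [
    ("binary number ending with 0", r"(0|1)*0", ["0", "10", "1010"], ["1", "101"]),
    ("string ending with abb", r"(a|b)*abb", ["abb", "babb"], ["ab", "abba"]),
    ("exactly two b's", r"a*ba*ba*", ["bb", "abab"], ["b", "bbb"]),
    ("odd length", r"(0|1)((0|1)(0|1))*", ["0", "101"], ["", "10"]),
    ("ends with 1, no 00", r"(1|01)+", ["011", "1101", "1011"], ["1001", "10"]),
]


def check(cases):
    """Map each description to True if the pattern behaves as listed."""
    results = {}
    for name, pattern, accept, reject in cases:
        accepts_all = all(re.fullmatch(pattern, s) for s in accept)
        rejects_all = not any(re.fullmatch(pattern, s) for s in reject)
        results[name] = accepts_all and rejects_all
    return results


results = check(CASES)
```

Trying a few strings outside the listed examples in the same way is a quick method for catching an expression that is too permissive or too strict.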
Regular definition
• A regular definition gives names to certain regular expressions and uses those names in other regular expressions.
• A regular definition is a sequence of definitions of the form:
d1 → r1
d2 → r2
……
dn → rn
where di is a distinct name & ri is a regular expression.
• Example: Regular definition for identifier
letter → A|B|C|………..|Z|a|b|………..|z
digit → 0|1|…….|9
id → letter (letter | digit)*
Regular definition example
• Example: Unsigned Pascal numbers
3
5280
39.37
6.336E4
1.894E-4
2.56E+7
Regular Definition
digit → 0|1|…..|9
digits → digit digit*
optional_fraction → .digits | ε
optional_exponent → (E(+|-|ε)digits) | ε
num → digits optional_fraction optional_exponent
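The regular definition can be mirrored almost line by line with Python's re module, building each named pattern from the ones before it. The variable names follow the definition; is_unsigned_number is a helper introduced for this sketch.

```python
import re

digit = r"[0-9]"
digits = rf"{digit}{digit}*"
optional_fraction = rf"(\.{digits})?"          # .digits | ε
optional_exponent = rf"(E[+-]?{digits})?"      # (E(+|-|ε)digits) | ε
num = re.compile(rf"{digits}{optional_fraction}{optional_exponent}")


def is_unsigned_number(text):
    """True if the whole text matches the num definition."""
    return num.fullmatch(text) is not None


examples = ["3", "5280", "39.37", "6.336E4", "1.894E-4", "2.56E+7"]
all_match = all(is_unsigned_number(e) for e in examples)
```

Strings such as 1. or .5 are rejected, because the definition requires digits on both sides of the decimal point.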
Transition Diagram
• A stylized flowchart is called a transition diagram.
[Legend: a circle is a state; an arrow between circles is a transition; an incoming arrow with no source marks the start state; a double circle is a final state]
Transition diagram : Unsigned number

3
5280
39.37
1.894 E - 4
2.56 E + 7
45 E + 6
96 E 2
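Transition Diagram : Relational operator
State | Input | Next state | Action
0 | < | 1 |
1 | = | 2 | return (relop, LE)
1 | > | 3 | return (relop, NE)
1 | other | 4 | return (relop, LT)
0 | = | 5 | return (relop, EQ)
0 | > | 6 |
6 | = | 7 | return (relop, GE)
6 | other | 8 | return (relop, GT)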
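A transition diagram like this one maps naturally onto nested conditionals, one branch per state. The function name relop and its return format (token pair plus the position after the operator) are assumptions of this sketch; in the "other" cases the extra character that was looked at is not consumed.

```python
def relop(text, start=0):
    """Return ((token, attribute), next_position), or None if no relop starts here."""

    def char_at(i):
        return text[i] if i < len(text) else ""

    first = char_at(start)
    if first == "<":                                   # state 1
        following = char_at(start + 1)
        if following == "=":
            return ("relop", "LE"), start + 2          # state 2
        if following == ">":
            return ("relop", "NE"), start + 2          # state 3
        return ("relop", "LT"), start + 1              # state 4: 'other' stays in the input
    if first == "=":
        return ("relop", "EQ"), start + 1              # state 5
    if first == ">":                                   # state 6
        if char_at(start + 1) == "=":
            return ("relop", "GE"), start + 2          # state 7
        return ("relop", "GT"), start + 1              # state 8: 'other' stays in the input
    return None
```

For the input <a the function returns the token LT with next position 1, so that the a is still available for the next token.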
Finite Automata
• Finite automata are recognizers.
• An FA simply says "Yes" or "No" about each possible input string.
• A finite automaton is a mathematical model consisting of:
1. A set of states
2. A set of input symbols
3. A transition function move
4. An initial state
5. Final states or accepting states
Types of finite automata
• Types of finite automata are:
• Deterministic finite automata (DFA): each state has exactly one outgoing edge for each symbol.
• Nondeterministic finite automata (NFA): there are no restrictions on the edges leaving a state. There can be several edges with the same symbol as label, and some edges can be labeled with ε.
[Figure: a DFA and an NFA, each with states 1, 2, 3, 4 and edges labeled a and b]
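The difference between the two kinds of automata shows up clearly when they are simulated. The transition tables below are an assumption of this sketch: they describe a DFA and an NFA for (a|b)*abb with states 1 to 4, start state 1 and final state 4, and the NFA has no ε edges. Input strings are assumed to contain only a and b.

```python
# DFA: exactly one target state for every (state, symbol) pair.
DFA = {
    (1, "a"): 2, (1, "b"): 1,
    (2, "a"): 2, (2, "b"): 3,
    (3, "a"): 2, (3, "b"): 4,
    (4, "a"): 2, (4, "b"): 1,
}

# NFA: a set of target states; missing pairs mean "no edge".
NFA = {
    (1, "a"): {1, 2}, (1, "b"): {1},
    (2, "b"): {3},
    (3, "b"): {4},
}


def dfa_accepts(string):
    state = 1
    for symbol in string:
        state = DFA[(state, symbol)]      # a single current state
    return state == 4


def nfa_accepts(string):
    states = {1}
    for symbol in string:
        # follow every possible edge from every current state
        states = set().union(*(NFA.get((q, symbol), set()) for q in states))
    return 4 in states
```

Both functions say "Yes" exactly for the strings over {a, b} that end in abb; the DFA tracks one state at a time, while the NFA simulation tracks a set of states.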
Conversion from regular expression to DFA
Rules to compute nullable, firstpos, lastpos
• nullable(n)
• True if the subtree at node n generates a language that includes the empty string.
• firstpos(n)
• The set of positions that can match the first symbol of a string generated by the subtree at node n.
• lastpos(n)
• The set of positions that can match the last symbol of a string generated by the subtree at node n.
• followpos(i)
• The set of positions that can follow position i in the tree.
Rules to compute nullable, firstpos, lastpos
Node n | nullable(n) | firstpos(n) | lastpos(n)
A leaf labeled by ε | true | ∅ | ∅
A leaf with position i | false | {i} | {i}
Or node n = c1 | c2 | nullable(c1) or nullable(c2) | firstpos(c1) ∪ firstpos(c2) | lastpos(c1) ∪ lastpos(c2)
Concatenation node n = c1 . c2 | nullable(c1) and nullable(c2) | if (nullable(c1)) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1) | if (nullable(c2)) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
Star node n = c1* | true | firstpos(c1) | lastpos(c1)
Rules to compute followpos
1. If n is a concatenation node with left child c1 and right child c2, and i is a position in lastpos(c1), then all positions in firstpos(c2) are in followpos(i).
2. If n is a * node and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).
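The rules for nullable, firstpos, lastpos and followpos can be sketched as one recursive function over the syntax tree. The tuple encoding of nodes ("leaf", "or", "cat", "star") and the function names are assumptions of this example, and ε leaves are left out for brevity.

```python
def analyze(node, followpos):
    """Return (nullable, firstpos, lastpos) of node and fill followpos in place."""
    kind = node[0]
    if kind == "leaf":                        # ("leaf", symbol, position)
        position = node[2]
        followpos.setdefault(position, set())
        return False, {position}, {position}
    if kind == "or":                          # ("or", c1, c2)
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":                         # ("cat", c1, c2)
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        for i in l1:                          # followpos rule 1
            followpos[i] |= f2
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return n1 and n2, first, last
    if kind == "star":                        # ("star", c1)
        _, f1, l1 = analyze(node[1], followpos)
        for i in l1:                          # followpos rule 2
            followpos[i] |= f1
        return True, f1, l1
    raise ValueError(f"unknown node kind {kind!r}")


def leaf(symbol, position):
    return ("leaf", symbol, position)


# Syntax tree of the augmented expression (a|b)*abb#
tree = ("cat", ("cat", ("cat", ("cat", ("star", ("or", leaf("a", 1), leaf("b", 2))),
        leaf("a", 3)), leaf("b", 4)), leaf("b", 5)), leaf("#", 6))
followpos = {}
nullable, first, last = analyze(tree, followpos)
```

For this tree the root is not nullable, its firstpos is {1, 2, 3} and its lastpos is {6}.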
Conversion from regular expression to DFA
(a|b)* abb (Given RE)
Augmented RE: (a|b)* abb #
Step 1: Convert RE into augmented RE
Step 2: Construct syntax tree
Step 3: Identify nullable nodes (here * is the only nullable node)
Step 4: Find firstpos and lastpos
Step 5: Find followpos(i)
Step 6: Construction of transition table
Step 7: Construction of DFA
Syntax tree (leaf positions in brackets):
.
├─ .
│  ├─ .
│  │  ├─ .
│  │  │  ├─ *
│  │  │  │  └─ |
│  │  │  │     ├─ a [1]
│  │  │  │     └─ b [2]
│  │  │  └─ a [3]
│  │  └─ b [4]
│  └─ b [5]
└─ # [6]
Conversion from regular expression to DFA
Step 4: Calculate firstpos
Rules used: firstpos(leaf with position i) = {i}; or node: firstpos(c1) ∪ firstpos(c2); * node: firstpos(c1); concatenation node: if (nullable(c1)) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1)
Node | firstpos
a [1] | {1}
b [2] | {2}
a|b | {1,2}
(a|b)* | {1,2}
a [3] | {3}
(a|b)* a | {1,2,3}
b [4] | {4}
(a|b)* a b | {1,2,3}
b [5] | {5}
(a|b)* a b b | {1,2,3}
# [6] | {6}
(a|b)* a b b # (root) | {1,2,3}
Conversion from regular expression to DFA
Step 4: Calculate lastpos
Rules used: lastpos(leaf with position i) = {i}; or node: lastpos(c1) ∪ lastpos(c2); * node: lastpos(c1); concatenation node: if (nullable(c2)) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
Node | firstpos | lastpos
a [1] | {1} | {1}
b [2] | {2} | {2}
a|b | {1,2} | {1,2}
(a|b)* | {1,2} | {1,2}
a [3] | {3} | {3}
(a|b)* a | {1,2,3} | {3}
b [4] | {4} | {4}
(a|b)* a b | {1,2,3} | {4}
b [5] | {5} | {5}
(a|b)* a b b | {1,2,3} | {5}
# [6] | {6} | {6}
(a|b)* a b b # (root) | {1,2,3} | {6}
Conversion from regular expression to DFA
Step 5: Calculate followpos
Rule 1 (concatenation node with children c1 and c2): for each position i in lastpos(c1), all positions in firstpos(c2) are in followpos(i).
• Root node: lastpos(c1) = {5}, firstpos(c2) = {6}, so 6 is in followpos(5)
• Next concatenation node: lastpos(c1) = {4}, firstpos(c2) = {5}, so 5 is in followpos(4)
• Next concatenation node: lastpos(c1) = {3}, firstpos(c2) = {4}, so 4 is in followpos(3)
• Lowest concatenation node: lastpos(c1) = {1,2}, firstpos(c2) = {3}, so 3 is in followpos(1) and in followpos(2)
Rule 2 (* node n): for each position i in lastpos(n), all positions in firstpos(n) are in followpos(i).
• * node: lastpos(n) = {1,2}, firstpos(n) = {1,2}, so 1 and 2 are in followpos(1) and in followpos(2)
Position | followpos
5 | 6
4 | 5
3 | 4
2 | 1,2,3
1 | 1,2,3
Conversion from regular expression to DFA
Step 6: Construction of transition table
Symbol | Position | followpos
b | 5 | 6
b | 4 | 5
a | 3 | 4
b | 2 | 1,2,3
a | 1 | 1,2,3
Initial state = firstpos of root = {1,2,3} (state A)
State A
δ({1,2,3}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} (state B)
δ({1,2,3}, b) = followpos(2) = {1,2,3} (state A)
State B
δ({1,2,3,4}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} (state B)
δ({1,2,3,4}, b) = followpos(2) ∪ followpos(4) = {1,2,3} ∪ {5} = {1,2,3,5} (state C)
State C
δ({1,2,3,5}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} (state B)
δ({1,2,3,5}, b) = followpos(2) ∪ followpos(5) = {1,2,3} ∪ {6} = {1,2,3,6} (state D)
State D
δ({1,2,3,6}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} (state B)
δ({1,2,3,6}, b) = followpos(2) = {1,2,3} (state A)
States | a | b
A={1,2,3} | B | A
B={1,2,3,4} | B | C
C={1,2,3,5} | B | D
D={1,2,3,6} | B | A
Step 7: Construction of DFA
[Figure: DFA with start state A and final state D (D contains position 6, the end marker #); edges A –a→ B, A –b→ A, B –a→ B, B –b→ C, C –a→ B, C –b→ D, D –a→ B, D –b→ A]
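Steps 6 and 7 can be sketched as a worklist loop that starts from firstpos of the root and keeps adding new sets of positions until no new state appears. The function names build_dfa and accepts, and the dictionaries passed to them, are assumptions of this example; the followpos values are the ones computed above, with followpos(6) taken as the empty set.

```python
def build_dfa(start, followpos, symbol_at, alphabet):
    """Return (states, table); each state is a frozenset of positions."""
    start = frozenset(start)
    states, pending, table = [start], [start], {}
    while pending:
        state = pending.pop()
        for symbol in alphabet:
            # union of followpos(p) for all positions p in the state labeled with symbol
            target = frozenset().union(
                *(followpos[p] for p in state if symbol_at[p] == symbol))
            if not target:
                continue
            table[(state, symbol)] = target
            if target not in states:
                states.append(target)
                pending.append(target)
    return states, table


followpos = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
symbol_at = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
states, table = build_dfa({1, 2, 3}, followpos, symbol_at, "ab")


def accepts(string, end_position=6):
    """Run the constructed DFA; a state is final if it contains the # position."""
    state = states[0]
    for symbol in string:
        state = table[(state, symbol)]
    return end_position in state
```

The loop finds the same four states A, B, C and D as the table above, and accepts returns True exactly for strings over {a, b} that end in abb.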
Conversion from regular expression to
DFA
Construct DFA for following regular expression:
1. (c | d)*c
2. (a+b)*+(a.c)*
Exercise
Convert following regular expression to DFA:
1. abba
2. bb(a)*
3. (a|b)*
4. a* | b*
5. a(a)*ab
6. aa*+ bb*
7. (a+b)*abb
8. 10(0+1)*1
9. (a+b)*a(a+b)
10.(0+1)*010(0+1)*
11.(010+00)*(10)*
12. 100(1)*00(0+1)*
