1-Structure and Phases of a Compiler-19!07!2024 (1)
COMPILER DESIGN
Dr. M. Bhuvaneswari
Assistant Professor Sr. Grade 2
School of Computer Science and Engineering
Vellore Institute of Technology Vellore
Objectives
• To provide fundamental knowledge of various language translators.
• To make students familiar with lexical analysis and parsing techniques.
• To understand the various actions carried out in semantic analysis.
• To make students familiar with how intermediate code is generated.
• To understand the principles of code optimization techniques and code generation.
• To provide a foundation for the study of high-performance compiler design.
Outcomes
• Apply the skills of devising, selecting, and using tools and techniques towards compiler design.
• Develop language specifications using context-free grammars (CFG).
• Apply the ideas, the techniques, and the knowledge acquired for the purpose of developing software systems.
• Construct symbol tables and generate intermediate code.
• Obtain insights into compiler optimization and code generation.
Syllabus
• Module: 1 Introduction to Compilation and Lexical Analysis 7 hours
• Introduction to LLVM - Structure and Phases of a Compiler-Design Issues-
Patterns Lexemes-Tokens-Attributes-Specification of Tokens-Extended
Regular Expression- Regular expression to Deterministic Finite Automata
(Direct method) - Lex - A Lexical Analyzer Generator
• Module: 2 Syntax Analysis 8 hours
• Role of Parser- Parse Tree - Elimination of Ambiguity – Top Down Parsing
– Recursive Descent Parsing - LL (1) Grammars – Shift Reduce Parsers-
Operator Precedence Parsing - LR Parsers, Construction of SLR Parser
Tables and Parsing- CLR Parsing- LALR Parsing
• Module: 3 Semantic Analysis 5 hours
• Syntax Directed Definition – Evaluation Order - Applications of Syntax
Directed Translation - Syntax Directed Translation Schemes -
Implementation of L-attributed Syntax Directed Definition
• Module: 4 Intermediate Code Generation 5 hours
• Variants of Syntax trees - Three Address Code- Types – Declarations -
Procedures - Assignment Statements - Translation of Expressions -
Control Flow - Back Patching- Switch Case Statements.
Cont..
• Module: 5 Code Optimization 6 hours
• Loop optimizations- Principal Sources of Optimization -Introduction to
Data Flow Analysis - Basic Blocks - Optimization of Basic Blocks -
Peephole Optimization- The DAG Representation of Basic Blocks -Loops in
Flow Graphs - Machine Independent Optimization - Implementation of a
naïve code generator for a virtual machine - Security checking of virtual
machine code
• Module: 6 Code Generation 5 hours
• Issues in the design of a code generator- Target Machine- Next-Use
Information – Register Allocation and Assignment- Runtime Organization-
Activation Records.
• Module: 7 Parallelism 7 hours
• Parallelization- Automatic Parallelization- Optimizations for Cache
Locality and Vectorization- Domain Specific Languages-Compilation-
Instruction Scheduling and Software Pipelining- Impact of Language
Design and Architecture Evolution on Compilers - Static Single Assignment
• Module: 8 Contemporary Issues 2 hours
Text Books & References
• Text Book
• A. V. Aho, Monica S. Lam, Ravi Sethi and Jeffrey D. Ullman, Compilers: Principles, Techniques, and Tools, Second Edition, Pearson Education, Boston, 2007.
• Reference Books
• Des Watson, A Practical Approach to Compiler Construction, Springer International Publishing, Germany, 2017.
Content - Module -1
• Introduction to Compilation and Lexical Analysis
• Introduction to LLVM
• Structure and Phases of a Compiler
• Design Issues
• Patterns, Lexemes
• Tokens, Attributes
• Specification of Tokens
• Extended Regular Expression
• Regular expression to Deterministic Finite Automata (Direct method)
• Lex - A Lexical Analyzer Generator
Translator
• A translator is a program that takes a program in one form as input and converts it into another form.
• Source Program → Translator → Target Program (the translator also reports Error Messages)
• Types of translators are:
1. Compiler
2. Interpreter
3. Assembler
Compiler
• A compiler is a program that reads a program written in a source language and translates it into an equivalent program in a target language.
• The cousins of the compiler are the preprocessor, assembler, linker and loader.
Context of compiler (Cousins of compiler)
Skeletal Source Program → Preprocessor → Source Program → Compiler → Target Assembly Program → Assembler → Relocatable Object Code → Linker / Loader (together with Libraries & Object Files) → Absolute Machine Code
• Preprocessor: processes the skeletal source program and produces the source program that is given to the compiler.
• Compiler: translates the source program into a target assembly program.
• Assembler: a translator that takes the assembly program as input and produces relocatable object code.
• Linker: makes a single program from several files of relocatable object code and libraries.
• Loader: loads the resulting absolute machine code into memory for execution.
Phases of compiler
Lexical analysis → Syntax analysis → Semantic analysis → Intermediate code generation → Code optimization → Code generation
Lexical analysis
• Lexical analysis is also called linear analysis or scanning.
• The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes.
• For each lexeme, the lexical analyzer produces as output a token of the form <token-name, attribute-value>.
• Ex: Position = initial + rate * 60 would be grouped into the following tokens:
<id,1> <=> <id,2> <+> <id,3> <*> <60>
Position (identifier)         <id,1>   (1 points to the symbol-table entry for this token)
= (assignment symbol)         <=>
initial (identifier)          <id,2>
+ (plus symbol)               <+>
rate (identifier)             <id,3>
* (multiplication symbol)     <*>
60 (number)                   <60>
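A minimal Python sketch of this grouping step, assuming a tiny token set (identifiers, unsigned numbers and single-character operators) that only needs to cover the example statement. The names `tokenize`, `show` and `TOKEN_SPEC` are hypothetical. The symbol table is a plain list, so the attribute of `<id,i>` is the 1-based index of the entry, mirroring `<id,1>` on the slide.

```python
import re

# Token classes tried in order; the named group that matches tells us the class.
TOKEN_SPEC = [
    ("number", r"\d+(?:\.\d+)?"),   # integer or decimal constant
    ("id", r"[A-Za-z_]\w*"),        # identifier
    ("op", r"[=+\-*/]"),            # single-character operator
    ("skip", r"\s+"),               # white space is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for one statement."""
    symbol_table = []               # <id,i> refers to symbol_table[i - 1]
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)       # new entry for a new identifier
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind in ("number", "op"):
            tokens.append((lexeme,))               # written <60>, <=>, <+> on the slide
        pos = m.end()
    return tokens, symbol_table


def show(tokens):
    return "".join("<" + ",".join(map(str, t)) + ">" for t in tokens)


tokens, table = tokenize("position = initial + rate * 60")
print(show(tokens))   # <id,1><=><id,2><+><id,3><*><60>
print(table)          # ['position', 'initial', 'rate']
```

A repeated identifier reuses its existing entry, so `x = x + 1` yields `<id,1>` twice.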
Syntax analysis
• Syntax analysis is also called parsing or hierarchical analysis.
• It takes the tokens produced by the lexical analyzer as input and generates a parse tree.
• It checks the grammatical structure of the code, for example the matching of parentheses, and reports any syntax errors it finds.
• If the code is error free, the syntax analyzer generates the tree.
Example: Position = initial + rate*60 becomes id1 = id2 + id3 * 60 after lexical analysis, and the syntax analyzer builds the tree:
      =
     / \
  id1   +
       / \
    id2   *
         / \
      id3   60
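The hierarchical structure can be recovered with a small recursive-descent parser. This is a sketch under stated assumptions: the grammar (assign → id = expr, expr → term {+ term}, term → factor {* factor}, factor → ( expr ) | id | number) is chosen here for illustration, tokens are plain strings such as `"id1"`, and the tree is a nested tuple `(operator, left, right)`. `parse` and `PUNCTUATION` are hypothetical names.

```python
PUNCTUATION = {"=", "+", "*", "(", ")"}


def parse(tokens):
    # assign -> id = expr ; expr -> term {+ term} ; term -> factor {* factor}
    # factor -> ( expr ) | id | number
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def factor():
        tok = advance()
        if tok == "(":                       # parentheses must be matched
            node = expr()
            if advance() != ")":
                raise SyntaxError("')' expected")
            return node
        if tok in PUNCTUATION:
            raise SyntaxError(f"operand expected, got {tok!r}")
        return tok

    def term():                              # '*' binds tighter than '+'
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = factor()
    if advance() != "=":
        raise SyntaxError("'=' expected")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse(["id1", "=", "id2", "+", "id3", "*", "60"]))
# ('=', 'id1', ('+', 'id2', ('*', 'id3', '60')))
```

Because `term` handles `*` below `expr`, the multiplication ends up deeper in the tree, which is exactly the shape drawn on the slide.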
Semantic analysis
• The semantic analyzer determines the meaning of a source string.
• It performs the following operations:
1. Type checking and coercions (typecasting), e.g., converting int to float.
2. Checking that an array index is an integer.
3. Checking that arithmetic operations are performed on type-compatible operands.
4. Checking the scope of names.
*Note: Consider id1, id2 and id3 to be real. Because id3 is real and 60 is an integer, the semantic analyzer inserts an inttofloat conversion:
      =
     / \
  id1   +
       / \
    id2   *
         / \
      id3   inttofloat
                |
                60
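A sketch of the coercion step on the tree. It assumes only two types (`"int"` and `"float"`), a dictionary as the symbol table, and the nested-tuple tree format `(operator, left, right)`. `check` is a hypothetical helper, not a standard API.

```python
def check(node, symtab):
    """Return (annotated_node, type); wrap int operands in inttofloat when mixed with float."""
    if isinstance(node, str):
        if node in symtab:
            return node, symtab[node]
        if node.isdigit():
            return node, "int"
        raise NameError(f"{node!r} is not declared")   # declaration/scope check
    op, left, right = node
    left, lt = check(left, symtab)
    right, rt = check(right, symtab)
    if lt == rt:
        return (op, left, right), lt
    if op == "=" and lt == "int":
        raise TypeError("cannot assign a float value to an int variable")
    # Coercion: the int side is converted so both operands are float.
    if lt == "int":
        left = ("inttofloat", left)
    else:
        right = ("inttofloat", right)
    return (op, left, right), "float"


symtab = {"id1": "float", "id2": "float", "id3": "float"}   # id1, id2, id3 are real
tree = ("=", "id1", ("+", "id2", ("*", "id3", "60")))
annotated, _ = check(tree, symtab)
print(annotated)
# ('=', 'id1', ('+', 'id2', ('*', 'id3', ('inttofloat', '60'))))
```

Only the integer operand is converted. The assignment target is never coerced: a float value assigned to an int variable is rejected instead.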
Intermediate code generator
• Two important properties of intermediate code:
1. It should be easy to produce.
2. It should be easy to translate into the target program.
• The intermediate form can be represented using "three-address code".
• Three-address code consists of a sequence of instructions, each of which has at most three operands.
Intermediate code:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = t2 + id2
id1 = t3
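Three-address code can be produced by a post-order walk of the annotated tree that allocates a new temporary for every interior node. This sketch assumes the tree is a nested tuple `(operator, left, right)` with a unary `("inttofloat", operand)` node. `gen_tac` is a hypothetical name. Because the walk visits the left operand first, the addition comes out as `t3 = id2 + t2`. The slide's `t3 = t2 + id2` computes the same value, since + is commutative.

```python
def gen_tac(tree):
    """Translate an annotated assignment tree into three-address instructions."""
    code = []
    count = 0

    def new_temp():
        nonlocal count
        count += 1
        return f"t{count}"

    def emit(node):
        if isinstance(node, str):            # identifiers and constants are used directly
            return node
        if node[0] == "inttofloat":          # unary conversion node
            arg = emit(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        op, left, right = node
        l_addr, r_addr = emit(left), emit(right)   # left operand first
        temp = new_temp()
        code.append(f"{temp} = {l_addr} {op} {r_addr}")
        return temp

    _, target, value = tree                  # the root is the "=" node
    result = emit(value)
    code.append(f"{target} = {result}")
    return code


tree = ("=", "id1", ("+", "id2", ("*", "id3", ("inttofloat", "60"))))
for line in gen_tac(tree):
    print(line)
# t1 = inttofloat(60)
# t2 = id3 * t1
# t3 = id2 + t2
# id1 = t3
```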
Code optimization
• It improves the intermediate code.
• This is done to obtain faster execution of the code or lower memory consumption.
Intermediate code:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = t2 + id2
id1 = t3
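Two improvements turn these four instructions into the two-instruction sequence `t1 = id3 * 60.0`, `id1 = id2 + t1`. First, the conversion of the constant 60 is done once at compile time. Second, the copy `id1 = t3` is removed by letting the preceding instruction write `id1` directly. A sketch with several assumptions: instructions are `(dest, op, arg1, arg2)` tuples, `"copy"` is a hypothetical op name for `x = y`, temporaries are named `t<digits>`, and each temporary is used exactly once (true for this straight-line code). The input is written in the left-to-right order `t3 = id2 + t2`.

```python
def is_temp(name):
    return isinstance(name, str) and name[:1] == "t" and name[1:].isdigit()


def optimize(code):
    """code: list of (dest, op, arg1, arg2); op is '+', '*', 'inttofloat' or 'copy'."""
    folded = {}                              # temporary -> compile-time constant
    out = []
    for dest, op, a, b in code:
        a, b = folded.get(a, a), folded.get(b, b)
        if op == "inttofloat" and a.isdigit():
            folded[dest] = str(float(a))     # 60 -> "60.0": conversion done at compile time
            continue                         # the instruction is no longer needed
        out.append([dest, op, a, b])
    # "x = tN" directly after "tN = ...": let the previous instruction write x itself
    # (safe here only because tN is used once).
    if len(out) >= 2 and out[-1][1] == "copy" and out[-1][2] == out[-2][0]:
        out[-2][0] = out[-1][0]
        out.pop()
    # Renumber the surviving temporaries t1, t2, ... in order of definition.
    rename = {}
    for ins in out:
        if is_temp(ins[0]):
            rename[ins[0]] = f"t{len(rename) + 1}"
    return [[rename.get(x, x) for x in ins] for ins in out]


def fmt(ins):
    dest, op, a, b = ins
    if op == "copy":
        return f"{dest} = {a}"
    if op == "inttofloat":
        return f"{dest} = inttofloat({a})"
    return f"{dest} = {a} {op} {b}"


code = [("t1", "inttofloat", "60", None),
        ("t2", "*", "id3", "t1"),
        ("t3", "+", "id2", "t2"),
        ("id1", "copy", "t3", None)]
for ins in optimize(code):
    print(fmt(ins))
# t1 = id3 * 60.0
# id1 = id2 + t1
```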
Code generation
• The intermediate code instructions are translated into a sequence of machine instructions.
Code after optimization:
t1 = id3 * 60.0
id1 = id2 + t1
Code generation:
MOV id3, R2
MUL #60.0, R2
MOV id2, R1
ADD R2, R1
MOV R1, id1
(id3 is loaded into R2, id2 into R1)
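A naive code generator for this two-address target. It assumes an unlimited supply of registers R1, R2, … and that each temporary is used exactly once. Mnemonics follow the slide (`MOV`, `MUL`, `ADD`), with `OP source, destination` meaning destination = destination OP source. The `OPCODES` mapping and the function names are hypothetical. This sketch hands out R1 first, so its register numbers are swapped relative to the slide while the instruction sequence is the same.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def is_temp(name):
    return name[:1] == "t" and name[1:].isdigit()


def is_const(x):
    try:
        float(x)
        return True
    except ValueError:
        return False


def operand(x, reg_of):
    """Register if x is a temporary held in a register, #literal for constants, else memory."""
    if x in reg_of:
        return reg_of.pop(x)                 # each temporary is used once, so free it
    return f"#{x}" if is_const(x) else x


def codegen(tac):
    """tac: list of (dest, op, arg1, arg2). Returns two-address target instructions."""
    asm, reg_of, next_reg = [], {}, 1
    for dest, op, a, b in tac:
        if a in reg_of:
            reg = reg_of.pop(a)              # left operand is already in a register
        else:
            reg = f"R{next_reg}"             # naive: a fresh register every time
            next_reg += 1
            asm.append(f"MOV {operand(a, reg_of)}, {reg}")
        asm.append(f"{OPCODES[op]} {operand(b, reg_of)}, {reg}")   # reg = reg op b
        if is_temp(dest):
            reg_of[dest] = reg               # keep the temporary in its register
        else:
            asm.append(f"MOV {reg}, {dest}") # store the result back to memory
    return asm


for line in codegen([("t1", "*", "id3", "60.0"), ("id1", "+", "id2", "t1")]):
    print(line)
# MOV id3, R1
# MUL #60.0, R1
# MOV id2, R2
# ADD R1, R2
# MOV R2, id1
```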
Phases of compiler
Source program
↓
Analysis phase: Lexical analysis → Syntax analysis → Semantic analysis
↓
Synthesis phase: Intermediate code generation → Code optimization → Code generation
↓
Target program
• The symbol table and the error detection and recovery routines interact with all the phases.
Symbol table
• Symbol tables are data structures used by compilers to hold information about source-program constructs.
• A symbol table is created and maintained by the compiler.
• It stores information about the occurrences of various entities such as variable names, functions, objects, classes, etc.
• This information is collected incrementally by the analysis phase and used by the synthesis phase to generate the target code.
• The symbol table is used for the following purposes:
• To store the names of all the entities in a structured form at one place.
• To verify whether a variable has been declared.
• To determine the scope of a name.
• To implement type checking by verifying that assignments and expressions in the source code are semantically correct.
Cont.,
• A symbol table can be implemented as a linear list (linked list) or as a hash table.
• Role of each phase of the compiler with respect to the symbol table:
• Lexical analysis - creates a new entry for each new identifier.
• Syntax analysis - adds attribute information such as type, dimension, scope, etc.
• Semantic analysis - checks semantics and updates the information, if needed.
• Intermediate code generation (ICG) - adds information about temporary variables to the symbol table.
• Code optimization - uses the address and aliasing information available in the symbol table.
• Target code generation (TCG) - generates target code using the address information of the identifiers stored in the symbol table.
Cont.,
• Example
int coursecode;                 - Line 1
char name[] = "Compiler";       - Line 2
printf("%d", coursecode);       - Line 3
Name         Type   Size   Dimension   LOD   LOU   Address
coursecode   int    4      0           1     3     2024
name         char   9      1           2     10    3056
(LOD: line of declaration; LOU: line of usage)
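A symbol table built on hash tables can be sketched with Python dictionaries, one per nested scope, searched from the innermost scope outwards. The field names (`decl_line`, `use_lines`, …) are hypothetical stand-ins for the line-of-declaration and line-of-usage columns. Addresses are left out because their assignment depends on the target machine. `name` is given 9 bytes here: the eight characters of "Compiler" plus the terminating `'\0'`.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    name: str
    type: str
    size: int                      # storage in bytes
    dimension: int                 # 0 for scalars, 1 for one-dimensional arrays, ...
    decl_line: int                 # line of declaration
    use_lines: list = field(default_factory=list)   # lines where the name is used


class SymbolTable:
    """A stack of hash tables (Python dicts), one per nested scope."""

    def __init__(self):
        self.scopes = [{}]

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_, size, dimension, line):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"line {line}: {name!r} is already declared in this scope")
        scope[name] = Entry(name, type_, size, dimension, line)
        return scope[name]

    def lookup(self, name):
        # Innermost scope first, so an inner declaration hides an outer one.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

    def use(self, name, line):
        entry = self.lookup(name)
        if entry is None:
            raise NameError(f"line {line}: {name!r} is used but not declared")
        entry.use_lines.append(line)
        return entry


st = SymbolTable()
st.declare("coursecode", "int", 4, 0, line=1)      # int coursecode;
st.declare("name", "char", 9, 1, line=2)           # char name[] = "Compiler";
st.use("coursecode", line=3)                       # printf("%d", coursecode);
print(st.lookup("coursecode"))
```

`use` is where an undeclared variable is caught, and `lookup` walking the scope stack is what determines which declaration a name refers to.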
Cont.,
Exercise 1
• Write the output of all the phases of the compiler for the following statements:
1. x = b-c*2
2. I=p*n*r/100
Grouping of Phases
Front end & back end (Grouping of phases)
Front end
• Depends primarily on the source language and is largely independent of the target machine.
• It includes the following phases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Intermediate code generation
5. Creation of symbol table
Back end
• Depends on the target machine and does not depend on the source language.
• It includes the following phases:
1. Code optimization
2. Code generation
3. Error handling and symbol-table operations
Pass structure
• Several phases are grouped into a pass that reads an input file and writes an output file.
• One complete scan of the source program is called a pass.
• In a single-pass compiler, analysis of a source statement is immediately followed by synthesis of the equivalent target statement.
• In a two-pass compiler, intermediate code is generated between the analysis and synthesis phases.
• Some compiler collections have been created around carefully designed intermediate representations that allow the front end for a particular language to interface with the back end for a certain target machine.
• With these collections, we can produce compilers for different target machines by combining a front end with back ends for different target machines.
Pass structure
• It is difficult to compile a source program in a single pass because of forward references.
• Forward reference: a forward reference of a program entity is a reference to the entity that precedes its definition in the program.
• This problem can be solved by postponing the generation of target code until more information concerning the entity becomes available.
• This leads to the multi-pass model of compilation:
Pass I: analysis of the source program (collects information about each entity and generates intermediate code).
Pass II: synthesis of the target program using the information collected in Pass I.
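The two-pass idea can be illustrated with a toy assembler in which a jump may name a label defined later. The instruction names (`JMP`, `LOAD`, `HALT`) and the `label:` syntax are hypothetical. Pass I only records label addresses, so Pass II can resolve every reference, including forward ones.

```python
def assemble(lines):
    """Two-pass translation of a toy assembly program with labels."""
    # Pass I (analysis): record the address of every label definition.
    labels, body = {}, []
    for addr, line in enumerate(lines):
        if ":" in line:
            label, line = line.split(":", 1)
            labels[label.strip()] = addr
        body.append(line.strip())
    # Pass II (synthesis): every label is now known, so forward references resolve.
    out = []
    for instr in body:
        parts = instr.split()
        if len(parts) == 2 and parts[1] in labels:
            parts[1] = str(labels[parts[1]])
        out.append(" ".join(parts))
    return out


program = ["JMP end",     # forward reference: 'end' is defined two lines later
           "LOAD x",
           "end: HALT"]
print(assemble(program))  # ['JMP 2', 'LOAD x', 'HALT']
```

A single-pass version would reach `JMP end` before the address of `end` is known. It would have to leave a hole and patch it later, which is the "postponing" described above.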
Types of compiler
1. One-pass compiler (e.g., Turbo Pascal)
• Compiles the whole program in a single pass.
2. Two-pass compiler
• Compiles the whole program in two passes.
• It generates intermediate code.
3. Incremental compiler
• Compiles only the changed lines of the source code and updates the object code accordingly.
4. Native code compiler
• Compiles source code for the same type of platform on which the compiler runs.
5. Cross compiler
• Compiles source code for a different kind of platform.
Token, Pattern & Lexemes
Interaction of scanner & parser
Source Program → Lexical Analyzer → token → Parser
The parser asks the lexical analyzer for the next token ("get next token"), and both use the Symbol Table.
Example: total = sum + 45
Lexeme   Token
total    Identifier1
=        Operator1
sum      Identifier2
+        Operator2
45       Constant1
Lexemes of identifier: total, sum
Lexemes of operator: =, +
Lexemes of constant: 45
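The "get next token" interaction can be modelled with a Python generator: the lexical analyzer produces a token only when the parser requests one. The numbering scheme (Identifier1, Operator1, …) follows the example above. The character classes are an assumption and, as a simplification, `re.findall` silently skips characters that match none of them. `lexer` and `parser` are hypothetical names; the "parser" here only collects what it receives.

```python
import re
from itertools import count


def lexer(source):
    """Yield (lexeme, token) pairs one at a time, only when the parser asks."""
    counters = {"Identifier": count(1), "Operator": count(1), "Constant": count(1)}
    seen = {}                                     # lexeme -> token, like a tiny symbol table
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|[=+\-*/]", source):
        if lexeme not in seen:
            if lexeme[0].isdigit():
                kind = "Constant"
            elif lexeme[0].isalpha() or lexeme[0] == "_":
                kind = "Identifier"
            else:
                kind = "Operator"
            seen[lexeme] = f"{kind}{next(counters[kind])}"
        yield lexeme, seen[lexeme]


def parser(source):
    """Stand-in parser: repeatedly requests the next token until input is exhausted."""
    scanner = lexer(source)
    received = []
    while True:
        token = next(scanner, None)               # "get next token"
        if token is None:
            return received
        received.append(token)


for lexeme, token in parser("total = sum + 45"):
    print(f"{lexeme:6} {token}")
```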
Example: Token, Pattern & Lexemes
C code:
printf("Total = %d\n", score);
• printf and score are lexemes matching the pattern for token id.
• "Total = %d\n" is a lexeme matching the pattern for token literal.
In many programming languages, the following classes cover most or all of the tokens:
Attributes of Tokens
• When more than one lexeme can match a pattern, the lexical analyzer must provide the subsequent compiler phases with additional information about the particular lexeme that matched.
• For example, the pattern for token number matches both 0 and 1.
• Thus, in many cases the lexical analyzer returns to the parser not only a token name but also an attribute value that describes the lexeme represented by the token.
• Normally, information about an identifier (e.g., its lexeme, its type, and the location at which it is first found) is kept in the symbol table. Thus, the appropriate attribute value for an identifier is a pointer to the symbol-table entry for that identifier.
Attributes of Tokens
The token names and associated attribute values for the Fortran statement
E = M * C ** 2
are written below as a sequence of pairs:
<id, pointer to symbol-table entry for E>
<assign op>
<id, pointer to symbol-table entry for M>
<mult op>
<id, pointer to symbol-table entry for C>
<exp op>
<number, integer value 2>
Example: token stream for the function limitedSquare
<float> <id, limitedSquare> <(> <id, x> <)> <{>
<float> <id, x>
<return> <(> <id, x> <op, "<="> <num, -10.0> <op, "||"> <id, x> <op, ">="> <num, 10.0>
<)> <op, "?"> <num, 100> <op, ":"> <id, x> <op, "*"> <id, x> <}>
Specification of tokens
Regular Expression & Regular Definition
Regular expression
• A regular expression is a sequence of characters that defines a pattern.
Notational shorthands
1. One or more instances: +
2. Zero or more instances: *
3. Zero or one instance: ?
4. Alphabet: Σ
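Rules to define regular expression
1. $\epsilon$ is a regular expression that denotes $\{\epsilon\}$, the set containing the empty string.
2. If $a$ is a symbol in $\Sigma$, then $a$ is a regular expression that denotes $\{a\}$, the set containing the string $a$.
3. Suppose $r$ and $s$ are regular expressions denoting the languages $L(r)$ and $L(s)$. Then,
a. $(r)|(s)$ is a regular expression denoting $L(r) \cup L(s)$.
b. $(r)(s)$ is a regular expression denoting $L(r)L(s)$.
c. $(r)^*$ is a regular expression denoting $(L(r))^*$.
d. $(r)$ is a regular expression denoting $L(r)$.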
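The inductive definition translates directly into a recursive function. Since the languages can be infinite, this sketch (a hypothetical helper `lang`) only lists the strings of length at most `n`. Regular expressions are represented as tuples `("eps",)`, `("sym", a)`, `("alt", r, s)`, `("cat", r, s)` and `("star", r)`. This encoding is an assumption of the sketch, and it makes rule d (parentheses) need no node of its own.

```python
def lang(r, n):
    """All strings of length <= n in the language denoted by regular expression r."""
    kind = r[0]
    if kind == "eps":                                  # rule 1: {ε}
        return {""}
    if kind == "sym":                                  # rule 2: {a}
        return {r[1]} if n >= 1 else set()
    if kind == "alt":                                  # rule 3a: L(r) ∪ L(s)
        return lang(r[1], n) | lang(r[2], n)
    if kind == "cat":                                  # rule 3b: L(r)L(s)
        return {x + y for x in lang(r[1], n) for y in lang(r[2], n) if len(x + y) <= n}
    if kind == "star":                                 # rule 3c: ε ∪ L ∪ LL ∪ ...
        base = lang(r[1], n)
        result, frontier = {""}, {""}
        while frontier:                                # add one more factor each round
            frontier = {x + y for x in frontier for y in base if y and len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")


a, b = ("sym", "a"), ("sym", "b")
print(sorted(lang(("star", ("alt", a, b)), 2)))   # ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']
```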
Regular expression a*
• L = zero or more occurrences of a = {𝜖, a, aa, aaa, aaaa, aaaaa, …} (infinite)
Regular expression a+
• L = one or more occurrences of a = {a, aa, aaa, aaaa, aaaaa, …} (infinite)
Precedence and associativity of operators
Operator              Precedence     Associativity
Kleene closure (*)    1 (highest)    left
Concatenation         2              left
Union (|)             3 (lowest)     left
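The precedence rules decide how an unparenthesized expression is grouped. A small recursive-descent parser with one function per precedence level makes this concrete. It assumes single-character symbols, uses `|` for union, and returns a fully parenthesized string so the grouping is visible. `parse_regex` is a hypothetical name.

```python
def parse_regex(text):
    """Recursive-descent parser; returns a fully parenthesized form of `text`."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def union():                    # lowest precedence: r | s, left-associative
        nonlocal pos
        node = concat()
        while peek() == "|":
            pos += 1
            node = f"({node}|{concat()})"
        return node

    def concat():                   # middle precedence: rs, left-associative
        node = star()
        while peek() is not None and peek() not in "|)":
            node = f"({node}{star()})"
        return node

    def star():                     # highest precedence: r*
        nonlocal pos
        node = atom()
        while peek() == "*":
            pos += 1
            node = f"({node}*)"
        return node

    def atom():
        nonlocal pos
        ch = peek()
        if ch == "(":
            pos += 1
            node = union()
            if peek() != ")":
                raise SyntaxError("missing ')'")
            pos += 1
            return node
        if ch is None or ch in "|)*":
            raise SyntaxError(f"unexpected {ch!r} at position {pos}")
        pos += 1
        return ch

    result = union()
    if pos != len(text):
        raise SyntaxError(f"unexpected {text[pos]!r} at position {pos}")
    return result


for r in ["a|bc*", "abc", "a|b|c", "(a|b)*abb"]:
    print(r, "=>", parse_regex(r))
```

For example, `a|bc*` is grouped as `(a|(b(c*)))`: the star applies only to `c`, concatenation joins `b` with `c*`, and union is applied last.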
Regular expression examples
1. 0 or 1
Strings: 0, 1
R.E. = 0 | 1
2. 0 or 11 or 111
Strings: 0, 11, 111
R.E. = 0 | 11 | 111
3. String having zero or more a.
Strings: 𝜖, a, aa, aaa, aaaa, …
R.E. = a*
4. String having one or more a.
Strings: a, aa, aaa, aaaa, …
R.E. = a+
5. Regular expression over {a, b, c} that represents all strings of length 3.
Strings: abc, bca, bbb, cab, aba, …
R.E. = (a|b|c)(a|b|c)(a|b|c)
6. All binary strings
Strings: 0, 11, 101, 10101, 1111, …
R.E. = (0|1)+
7. 0 or more occurrences of either a or b or both
Strings: 𝜖, a, aa, abab, bab, …
R.E. = (a|b)*
8. 1 or more occurrences of either a or b or both
Strings: a, aa, abab, bab, bbbaaa, …
R.E. = (a|b)+
25. Even number of 0s
R.E. = 1*(01*01*)*
26. String should have odd length
R.E. = (0|1)((0|1)(0|1))*
27. String should have even length
R.E. = ((0|1)(0|1))*
28. String starts with 0 and has odd length
R.E. = 0((0|1)(0|1))*
30. String starts with 1 and has even length
R.E. = 1(0|1)((0|1)(0|1))*
31. All strings that begin or end with 00 or 11
Strings: 00101, 10100, 110, 01011, …
R.E. = (00|11)(0|1)* | (0|1)*(00|11)
32. Language of all strings containing both 11 and 00 as substrings
Strings: 0011, 1100, 100110, 010011, …
R.E. = (0|1)*00(0|1)*11(0|1)* | (0|1)*11(0|1)*00(0|1)*
33. Strings ending with 1 and not containing 00
Strings: 011, 1101, 1011, …
R.E. = (1|01)+
34. Language of C identifiers
Strings: area, i, redious, grade1, …
R.E. = (_|L)(_|L|D)*, where L is a letter and D is a digit
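The R.E.s for the binary-string examples can be checked mechanically. Each one is written in Python `re` syntax and compared with a direct description of the property on every binary string up to a given length. For the even-number-of-0s case this uses `1*(01*01*)*`, which also accepts strings with no 0 at all, such as `111`. The dictionary keys and the `mismatches` helper are hypothetical.

```python
import re
from itertools import product

# R.E.s from the examples, in Python `re` syntax ('|' is union).
EXAMPLES = {
    "even number of 0s": r"1*(01*01*)*",
    "odd length": r"(0|1)((0|1)(0|1))*",
    "even length": r"((0|1)(0|1))*",
    "starts with 0, odd length": r"0((0|1)(0|1))*",
    "starts with 1, even length": r"1(0|1)((0|1)(0|1))*",
    "begins or ends with 00 or 11": r"(00|11)(0|1)*|(0|1)*(00|11)",
    "contains both 00 and 11": r"(0|1)*00(0|1)*11(0|1)*|(0|1)*11(0|1)*00(0|1)*",
    "ends with 1, no 00": r"(1|01)+",
}

# The same languages described directly, used as the reference.
PROPERTIES = {
    "even number of 0s": lambda s: s.count("0") % 2 == 0,
    "odd length": lambda s: len(s) % 2 == 1,
    "even length": lambda s: len(s) % 2 == 0,
    "starts with 0, odd length": lambda s: s.startswith("0") and len(s) % 2 == 1,
    "starts with 1, even length": lambda s: s.startswith("1") and len(s) % 2 == 0,
    "begins or ends with 00 or 11": lambda s: s[:2] in ("00", "11") or s[-2:] in ("00", "11"),
    "contains both 00 and 11": lambda s: "00" in s and "11" in s,
    "ends with 1, no 00": lambda s: s.endswith("1") and "00" not in s,
}


def mismatches(max_len):
    """All (example, string) pairs up to max_len where the R.E. and the property disagree."""
    bad = []
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            s = "".join(bits)
            for name, pattern in EXAMPLES.items():
                if (re.fullmatch(pattern, s) is not None) != PROPERTIES[name](s):
                    bad.append((name, s))
    return bad


C_IDENTIFIER = r"[_A-Za-z][_A-Za-z0-9]*"          # (_|L)(_|L|D)*

print(mismatches(8))                              # []
print([w for w in ["area", "i", "grade1", "_tmp", "1grade"]
       if re.fullmatch(C_IDENTIFIER, w)])         # ['area', 'i', 'grade1', '_tmp']
```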
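Regular definition
• A regular definition gives names to certain regular expressions and uses those names in other regular expressions.
• A regular definition is a sequence of definitions of the form:
d1 → r1
d2 → r2
……
dn → rn
where each di is a distinct name and each ri is a regular expression over the symbols in Σ ∪ {d1, d2, …, di-1}.
• Example: unsigned numbers
digit → 0 | 1 | … | 9
digits → digit digit*
optional_fraction → . digits | 𝜖
optional_exponent → (E (+ | - | 𝜖) digits) | 𝜖
num → digits optional_fraction optional_exponent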
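The regular definition maps onto Python `re` by building each pattern from the previously defined names. The form `(?:...)?` plays the role of "| 𝜖". This is a sketch: the variable names are taken from the definition, and the helper `is_num` is hypothetical.

```python
import re

# Each name uses only names defined before it, as a regular definition requires.
digit = r"[0-9]"
digits = rf"{digit}{digit}*"                      # digits -> digit digit*
optional_fraction = rf"(?:\.{digits})?"           # . digits | ε
optional_exponent = rf"(?:E[+-]?{digits})?"       # (E (+ | - | ε) digits) | ε
num = rf"{digits}{optional_fraction}{optional_exponent}"


def is_num(lexeme):
    return re.fullmatch(num, lexeme) is not None


for s in ["3", "5280", "39.37", "1.894E-4", "2.56E+7", "45E+6", "96E2", "1.", ".5", "1E"]:
    print(s, is_num(s))
```

The last three inputs are rejected: a fraction needs digits after the point, the number must start with a digit, and an exponent needs digits after `E`.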
Transition Diagram
• A transition diagram is a stylized flowchart.
• A circle is a state.
• An arrow (edge) is a transition.
• An edge labeled "start" entering a state marks the start state.
• A double circle is a final state.
Transition diagram: Unsigned number
Examples of unsigned numbers: 3, 5280, 39.37, 1.894E-4, 2.56E+7, 45E+6, 96E2
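Transition Diagram: Relational operator
• State 0 (start): on < go to state 1; on = go to state 5; on > go to state 6.
• State 1: on = go to state 2 and return (relop, LE); on > go to state 3 and return (relop, NE); on any other character go to state 4 and return (relop, LT).
• State 5: return (relop, EQ).
• State 6: on = go to state 7 and return (relop, GE); on any other character go to state 8 and return (relop, GT).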
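The relop transition diagram can be simulated state by state. In this sketch (`relop` is a hypothetical helper), the "other" edges into states 4 and 8 do not consume the extra character. This corresponds to retracting the input by one position. The function returns the token together with the position where scanning should resume.

```python
def relop(text, pos=0):
    """Simulate the relop transition diagram from text[pos].

    Returns ((token, attribute), next_pos), or None if no relational operator starts at pos.
    """
    def char(i):
        return text[i] if i < len(text) else ""     # end of input counts as 'other'

    state = 0
    while True:
        c = char(pos)
        if state == 0:
            if c == "<":
                state, pos = 1, pos + 1
            elif c == "=":
                state, pos = 5, pos + 1
            elif c == ">":
                state, pos = 6, pos + 1
            else:
                return None
        elif state == 1:
            if c == "=":
                return ("relop", "LE"), pos + 1      # state 2
            if c == ">":
                return ("relop", "NE"), pos + 1      # state 3
            return ("relop", "LT"), pos              # state 4: 'other' not consumed (retract)
        elif state == 5:
            return ("relop", "EQ"), pos
        else:                                        # state 6
            if c == "=":
                return ("relop", "GE"), pos + 1      # state 7
            return ("relop", "GT"), pos              # state 8: retract


print(relop("x <= y", 2))   # (('relop', 'LE'), 4)
```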
Finite Automata
• Finite automata are recognizers.
• A finite automaton simply says "yes" or "no" about each possible input string.
• A finite automaton is a mathematical model consisting of:
1. A set of states
2. A set of input symbols
3. A transition function move
4. An initial state
5. A set of final (accepting) states
Types of finite automata
• Types of finite automata are:
1. Deterministic finite automata (DFA): for each state, there is exactly one edge leaving it for each symbol.
2. Nondeterministic finite automata (NFA): there are no restrictions on the edges leaving a state. There can be several edges with the same symbol as label, and some edges can be labeled with 𝜖.
(Figure: a DFA and an NFA over {a, b}, each with states 1, 2, 3, 4 and the path 1 -a→ 2 -b→ 3 -b→ 4.)
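To contrast the two kinds of automata, here is a sketch that simulates both. The transition tables are an assumption: they are a reading of the figures as automata over {a, b} that accept strings ending in abb. The DFA follows exactly one edge per symbol, while the NFA simulation tracks the set of states it could currently be in.

```python
# Assumed DFA: exactly one next state for every (state, symbol) pair.
DFA = {1: {"a": 2, "b": 1},
       2: {"a": 2, "b": 3},
       3: {"a": 2, "b": 4},
       4: {"a": 2, "b": 1}}
# Assumed NFA: a pair may lead to several states (1 on a) or to none (2 on a).
NFA = {1: {"a": {1, 2}, "b": {1}},
       2: {"b": {3}},
       3: {"b": {4}},
       4: {}}
START, FINAL = 1, {4}


def dfa_accepts(w):
    state = START
    for c in w:
        state = DFA[state][c]                       # a single, determined move
    return state in FINAL


def nfa_accepts(w):
    states = {START}                                # every state the NFA could be in
    for c in w:
        states = set().union(*(NFA[s].get(c, set()) for s in states))
    return bool(states & FINAL)


for w in ["abb", "aabb", "babb", "ab", "abba"]:
    print(w, dfa_accepts(w), nfa_accepts(w))
```

Both automata accept the same strings. The DFA needs the extra edges back to states 1 and 2 that the NFA avoids by "guessing" where the final abb begins.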
Conversion from regular expression to DFA
Rules to compute nullable, firstpos, lastpos
• nullable(n): true if the subtree at node n generates a language that includes the empty string.
• firstpos(n): the set of positions that can match the first symbol of a string generated by the subtree at node n.
• lastpos(n): the set of positions that can match the last symbol of a string generated by the subtree at node n.
• followpos(i): the set of positions that can follow position i in the tree.
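Rules to compute nullable, firstpos, lastpos
• A leaf labeled $\epsilon$: $\text{nullable}(n) = \text{true}$; $\text{firstpos}(n) = \emptyset$; $\text{lastpos}(n) = \emptyset$
• A leaf with position $i$: $\text{nullable}(n) = \text{false}$; $\text{firstpos}(n) = \{i\}$; $\text{lastpos}(n) = \{i\}$
• An or-node $n = c_1 \mid c_2$: $\text{nullable}(n) = \text{nullable}(c_1) \lor \text{nullable}(c_2)$; $\text{firstpos}(n) = \text{firstpos}(c_1) \cup \text{firstpos}(c_2)$; $\text{lastpos}(n) = \text{lastpos}(c_1) \cup \text{lastpos}(c_2)$
• A cat-node $n = c_1 \cdot c_2$: $\text{nullable}(n) = \text{nullable}(c_1) \land \text{nullable}(c_2)$; $\text{firstpos}(n) = \text{firstpos}(c_1) \cup \text{firstpos}(c_2)$ if $\text{nullable}(c_1)$, else $\text{firstpos}(c_1)$; $\text{lastpos}(n) = \text{lastpos}(c_1) \cup \text{lastpos}(c_2)$ if $\text{nullable}(c_2)$, else $\text{lastpos}(c_2)$
• A star-node $n = c_1^*$: $\text{nullable}(n) = \text{true}$; $\text{firstpos}(n) = \text{firstpos}(c_1)$; $\text{lastpos}(n) = \text{lastpos}(c_1)$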
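The rules above translate into three recursive functions over the syntax tree. In this sketch, the tuple encoding of nodes (`("eps",)`, `("leaf", symbol, position)`, `("or", c1, c2)`, `("cat", c1, c2)`, `("star", c1)`) is an assumption. The example tree is the one used in the worked example, read as (a|b)*abb# with positions 1 to 6.

```python
def nullable(n):
    kind = n[0]
    if kind in ("eps", "star"):
        return True
    if kind == "leaf":
        return False
    if kind == "or":
        return nullable(n[1]) or nullable(n[2])
    return nullable(n[1]) and nullable(n[2])           # cat-node


def firstpos(n):
    kind = n[0]
    if kind == "eps":
        return set()
    if kind == "leaf":
        return {n[2]}
    if kind == "star":
        return firstpos(n[1])
    if kind == "or" or nullable(n[1]):                 # or-node, or cat-node with nullable c1
        return firstpos(n[1]) | firstpos(n[2])
    return firstpos(n[1])                              # cat-node, c1 not nullable


def lastpos(n):
    kind = n[0]
    if kind == "eps":
        return set()
    if kind == "leaf":
        return {n[2]}
    if kind == "star":
        return lastpos(n[1])
    if kind == "or" or nullable(n[2]):                 # or-node, or cat-node with nullable c2
        return lastpos(n[1]) | lastpos(n[2])
    return lastpos(n[2])                               # cat-node, c2 not nullable


a1, b2, a3, b4, b5, end6 = (("leaf", s, i) for i, s in enumerate("ababb#", start=1))
star = ("star", ("or", a1, b2))                                    # (a|b)*
root = ("cat", ("cat", ("cat", ("cat", star, a3), b4), b5), end6)  # (a|b)*abb#
print(nullable(root), firstpos(root), lastpos(root))   # False {1, 2, 3} {6}
```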
Rules to compute followpos
1. If n is a concatenation node with left child c1 and right child c2, and i is a position in lastpos(c1), then all positions in firstpos(c2) are in followpos(i).
2. If n is a star node and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).
Conversion from regular expression to DFA
Example: (a|b)*abb, augmented to (a|b)*abb# with positions a = 1, b = 2, a = 3, b = 4, b = 5, # = 6.
Step 4: Calculate lastpos
Node                  firstpos    lastpos
a (position 1)        {1}         {1}
b (position 2)        {2}         {2}
a | b                 {1,2}       {1,2}
(a|b)*                {1,2}       {1,2}
a (position 3)        {3}         {3}
(a|b)*a               {1,2,3}     {3}
b (position 4)        {4}         {4}
(a|b)*ab              {1,2,3}     {4}
b (position 5)        {5}         {5}
(a|b)*abb             {1,2,3}     {5}
# (position 6)        {6}         {6}
(a|b)*abb# (root)     {1,2,3}     {6}
Conversion from regular expression to DFA
Step 5: Calculate followpos
Rule 1 (concatenation nodes, from the root downwards):
• Root: lastpos(c1) = {5}, firstpos(c2) = {6}, so 6 ∈ followpos(5).
• (a|b)*abb: lastpos(c1) = {4}, firstpos(c2) = {5}, so 5 ∈ followpos(4).
• (a|b)*ab: lastpos(c1) = {3}, firstpos(c2) = {4}, so 4 ∈ followpos(3).
• (a|b)*a: lastpos(c1) = {1,2}, firstpos(c2) = {3}, so 3 ∈ followpos(1) and 3 ∈ followpos(2).
Rule 2 (star node): for n = (a|b)*, lastpos(n) = {1,2} and firstpos(n) = {1,2}, so 1, 2 ∈ followpos(1) and 1, 2 ∈ followpos(2).
Position    followpos
5           {6}
4           {5}
3           {4}
2           {1,2,3}
1           {1,2,3}
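Conversion from regular expression to DFA
Initial state = firstpos of root = {1,2,3} ----- A
State A
δ({1,2,3}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} ----- B
δ({1,2,3}, b) = followpos(2) = {1,2,3} ----- A
State B
δ({1,2,3,4}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} ----- B
δ({1,2,3,4}, b) = followpos(2) ∪ followpos(4) = {1,2,3} ∪ {5} = {1,2,3,5} ----- C
State C
δ({1,2,3,5}, a) = followpos(1) ∪ followpos(3) = {1,2,3,4} ----- B
δ({1,2,3,5}, b) = followpos(2) ∪ followpos(5) = {1,2,3} ∪ {6} = {1,2,3,6} ----- D
State D
δ({1,2,3,6}, a) = followpos(1) ∪ followpos(3) = {1,2,3,4} ----- B
δ({1,2,3,6}, b) = followpos(2) = {1,2,3} ----- A
States            a    b
A = {1,2,3}       B    A
B = {1,2,3,4}     B    C
C = {1,2,3,5}     B    D
D = {1,2,3,6}     B    A
D is the final state because it contains position 6, the position of #.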
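Putting the steps together, here is a sketch of the direct method. One bottom-up pass computes nullable, firstpos and lastpos and applies the two followpos rules. DFA states are then built as sets of positions, starting from firstpos of the root, as in the hand calculation. The tuple tree encoding, the `#` end marker at position 6 and the helper names (`analyse`, `build_dfa`) are assumptions of this sketch.

```python
from collections import defaultdict


def analyse(n, follow):
    """Return (nullable, firstpos, lastpos) of node n and add followpos entries to `follow`."""
    kind = n[0]
    if kind == "leaf":
        return False, {n[2]}, {n[2]}
    if kind == "star":
        _, first, last = analyse(n[1], follow)
        for i in last:                       # rule 2: star node
            follow[i] |= first
        return True, first, last
    null1, first1, last1 = analyse(n[1], follow)
    null2, first2, last2 = analyse(n[2], follow)
    if kind == "or":
        return null1 or null2, first1 | first2, last1 | last2
    for i in last1:                          # rule 1: cat node
        follow[i] |= first2
    return (null1 and null2,
            first1 | first2 if null1 else first1,
            last1 | last2 if null2 else last2)


def build_dfa(root, symbol_at, end_pos):
    follow = defaultdict(set)
    _, first, _ = analyse(root, follow)
    start = frozenset(first)                 # initial state = firstpos(root)
    states, trans, todo = [start], {}, [start]
    while todo:
        S = todo.pop(0)
        for sym in sorted({symbol_at[p] for p in S} - {"#"}):
            # union of followpos(p) over the positions p in S labelled sym
            U = frozenset().union(*(follow[p] for p in S if symbol_at[p] == sym))
            if U not in states:
                states.append(U)
                todo.append(U)
            trans[(S, sym)] = U
    accepting = [S for S in states if end_pos in S]
    return states, trans, accepting, follow


symbol_at = dict(enumerate("ababb#", start=1))          # positions 1..6
a1, b2, a3, b4, b5, end6 = (("leaf", s, i) for i, s in symbol_at.items())
root = ("cat", ("cat", ("cat", ("cat", ("star", ("or", a1, b2)), a3), b4), b5), end6)

states, trans, accepting, follow = build_dfa(root, symbol_at, end_pos=6)
names = {S: chr(ord("A") + k) for k, S in enumerate(states)}
for S in states:
    row = {sym: names[trans[(S, sym)]] for sym in "ab" if (S, sym) in trans}
    print(names[S], sorted(S), row, "final" if S in accepting else "")
```

States are numbered in the order they are discovered, so the names A to D line up with the hand calculation.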
Conversion from regular expression to DFA
Construct a DFA for each of the following regular expressions:
1. (c | d)*c
2. (a+b)*+(a.c)*
Exercise
Convert the following regular expressions to DFAs:
1. abba
2. bb(a)*
3. (a|b)*
4. a* | b*
5. a(a)*ab
6. aa*+ bb*
7. (a+b)*abb
8. 10(0+1)*1
9. (a+b)*a(a+b)
10. (0+1)*010(0+1)*
11. (010+00)*(10)*
12. 100(1)*00(0+1)*