CC 2

This document discusses lexical analysis and tokens in compiler construction. It covers: 1. The goal of lexical analysis is to partition the input string into substrings and classify them according to their roles. 2. A token is a syntactic category that can represent things like identifiers, keywords, integers, floats, symbols, and strings in a programming language. 3. Regular expressions are used to define patterns to identify valid tokens in a language and represent the language's grammar. Operations like union, concatenation, and Kleene closure can manipulate regular expressions.

Uploaded by

Kami


Compiler Construction

Lexical Analysis
Lexical Analyzer
Tokens

Example:
if( i == j )
z = 0;
else
z = 1;

Tokens
• Input is just a sequence of characters (here \b stands for a blank, \n for a newline and \t for a tab):

i f ( \b i \b = = \b j \n \t ....

Tokens

Goal:
• partition the input string into substrings
• classify them according to their role

Tokens

• A token is a syntactic category
• Natural language: “He wrote the program”
• Words: “He”, “wrote”, “the”, “program”

Tokens

• Programming language: “if(b == 0) a = b”
• Words: “if”, “(”, “b”, “==”, “0”, “)”, “a”, “=”, “b”

Tokens

• Identifiers: x y11 maxsize

• Keywords: if else while for
• Integers: 2 1000 -44 5L
• Floats: 2.0 0.0034 1e5
• Symbols: ( ) + * / { } < > ==
• Strings: “enter x” “error”
Tokens
• A lexeme is the actual sequence of characters in the source program that forms an instance of a token (e.g., value, 100 or ==).
• There are predefined rules that a lexeme must satisfy to be identified as a valid token.
• These rules are given by the grammar of the language, by means of patterns.
• A pattern describes what can be a token, and these patterns are defined by means of regular expressions.
• In programming languages, keywords, constants, identifiers, strings, numbers, operators and punctuation symbols can be considered as tokens.
• For example, the C variable declaration int value = 100; contains the following tokens:

int (keyword), value (identifier), = (operator), 100 (constant) and ; (symbol)
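As a rough illustration of partitioning and classifying, here is a minimal hand-written scanner in C for the declaration above. It is a sketch under simplifying assumptions, not how lex builds a scanner: the keyword table, the token-kind names and the helper next_token() are hypothetical, lexemes longer than 31 characters are truncated, and only a handful of operators and symbols are recognized.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token classes from the example; the names are illustrative. */
typedef enum {
    TK_KEYWORD, TK_IDENTIFIER, TK_OPERATOR, TK_CONSTANT, TK_SYMBOL, TK_EOF, TK_ERROR
} TokenKind;

typedef struct {
    TokenKind kind;
    char lexeme[32];          /* longer lexemes are truncated in this sketch */
} Token;

/* Hypothetical keyword table; a real language has more keywords. */
static const char *const KEYWORDS[] = { "int", "if", "else", "while", "for" };

static int is_keyword(const char *s) {
    for (size_t i = 0; i < sizeof KEYWORDS / sizeof KEYWORDS[0]; i++)
        if (strcmp(s, KEYWORDS[i]) == 0) return 1;
    return 0;
}

/* Scans one lexeme starting at *p, classifies it, and advances *p past it. */
Token next_token(const char **p) {
    Token t = { TK_EOF, "" };
    const char *s = *p;
    size_t n;

    while (isspace((unsigned char)*s)) s++;           /* skip blanks, tabs, newlines */
    if (*s == '\0') { *p = s; return t; }

    if (isalpha((unsigned char)*s) || *s == '_') {    /* letter (letter | digit)* */
        for (n = 1; isalnum((unsigned char)s[n]) || s[n] == '_'; n++) {}
        t.kind = TK_IDENTIFIER;
    } else if (isdigit((unsigned char)*s)) {          /* digit+ */
        for (n = 1; isdigit((unsigned char)s[n]); n++) {}
        t.kind = TK_CONSTANT;
    } else if (strchr("=+-*/<>", *s) != NULL) {       /* "==" is a single lexeme */
        n = (s[0] == '=' && s[1] == '=') ? 2 : 1;
        t.kind = TK_OPERATOR;
    } else if (strchr(";,(){}", *s) != NULL) {
        n = 1;
        t.kind = TK_SYMBOL;
    } else {                                          /* character fits no pattern */
        n = 1;
        t.kind = TK_ERROR;
    }

    size_t len = n < sizeof t.lexeme ? n : sizeof t.lexeme - 1;
    memcpy(t.lexeme, s, len);
    t.lexeme[len] = '\0';
    if (t.kind == TK_IDENTIFIER && is_keyword(t.lexeme))
        t.kind = TK_KEYWORD;                          /* keywords look like identifiers */
    *p = s + n;
    return t;
}
```

Calling next_token() repeatedly on "int value = 100;" yields int, value, =, 100 and ; with the classes listed above, then TK_EOF. Scanning keywords as identifiers and then looking them up in a table, instead of giving each keyword its own pattern, is a common design choice.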
Specification of Tokens
• Let us see how language theory defines the following terms:

Alphabets
• An alphabet is any finite set of symbols: {0,1} is the binary alphabet, {0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is the hexadecimal alphabet, and {a-z, A-Z} is the alphabet of English letters.

Strings
• Any finite sequence of symbols (characters) from an alphabet is called a string. The length of a string is the total number of occurrences of symbols in it; e.g., the length of the string Pakistan is 8, denoted by |Pakistan| = 8. A string with no symbols, i.e. a string of length zero, is known as the empty string and is denoted by ε (epsilon).
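When every symbol is a single character (an assumption that does not hold for, say, multi-byte UTF-8 input), these definitions map directly onto C strings. In this minimal sketch the hypothetical helper is_string_over() uses the standard strspn() to check that every symbol of s comes from the alphabet, and |s| is simply strlen(s).

```c
#include <stdbool.h>
#include <string.h>

/* Example alphabets from the text, written as the characters they contain. */
static const char BINARY[] = "01";
static const char HEXADECIMAL[] = "0123456789ABCDEF";

/* True if every symbol of s belongs to alphabet. The empty string (epsilon)
   trivially qualifies. Assumes one symbol = one char. */
bool is_string_over(const char *s, const char *alphabet) {
    return strspn(s, alphabet) == strlen(s);
}
```

For example, is_string_over("0110", BINARY) is true, is_string_over("0120", BINARY) is false, and strlen("Pakistan") is 8.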
Specification of Tokens
Special symbols
A typical high-level language contains the following symbols:-
Regular Expression
Language
• A language is a set of strings (finite or infinite) over some finite alphabet.
• Computer languages are treated as such sets, so set operations can be performed on them mathematically.
• Regular languages, which include every finite language, can be described by means of regular expressions.
Regular Expression
Regular Expressions
• The lexical analyzer needs to scan and identify only the valid strings (tokens/lexemes) that belong to the language in hand.
• It searches for the patterns defined by the language rules.
• Regular expressions can express regular languages, including infinite ones, by defining a pattern that every (finite) string of the language matches.
• The grammar defined by regular expressions is known as a regular grammar.
• The language defined by a regular grammar is known as a regular language.
• Regular expressions are an important notation for specifying patterns.
• Each pattern matches a set of strings, so regular expressions serve as names for sets of strings.
• Programming language tokens can be described by regular languages.
• The specification of regular expressions is an example of a recursive definition.
• Regular languages are easy to understand and have efficient implementations.
• There are a number of algebraic laws obeyed by regular expressions, which can be used to manipulate regular expressions into equivalent forms.
Regular Expression
Operations
The various operations on languages are:
• Union of two languages L and M is written as
• L ∪ M = {s | s is in L or s is in M}
• Concatenation of two languages L and M is written as
• LM = {st | s is in L and t is in M}
• The Kleene closure of a language L is written as
• L* = zero or more concatenations of strings from L, i.e. L* = L⁰ ∪ L¹ ∪ L² ∪ …, where L⁰ = {ε}.
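For small finite languages these operations can be computed directly. The following C sketch represents a language as a fixed-capacity set of strings; the Lang type, its capacity limits and the function names are hypothetical choices for illustration. Because L* is infinite whenever L contains a non-empty string, lang_star_upto() only builds the finite slice L⁰ ∪ L¹ ∪ … ∪ Lᵏ.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_STRS 64   /* capacity limits chosen for illustration */
#define MAX_LEN  32

/* A finite language: a set of strings (no duplicates). */
typedef struct {
    size_t n;
    char s[MAX_STRS][MAX_LEN];
} Lang;

int lang_contains(const Lang *L, const char *w) {
    for (size_t i = 0; i < L->n; i++)
        if (strcmp(L->s[i], w) == 0) return 1;
    return 0;
}

/* Inserts w unless it is already present; silently drops it if L is full. */
void lang_add(Lang *L, const char *w) {
    if (!lang_contains(L, w) && L->n < MAX_STRS) {
        snprintf(L->s[L->n], MAX_LEN, "%s", w);
        L->n++;
    }
}

/* L U M = { s | s is in L or s is in M } */
Lang lang_union(const Lang *L, const Lang *M) {
    Lang R = { 0 };
    for (size_t i = 0; i < L->n; i++) lang_add(&R, L->s[i]);
    for (size_t i = 0; i < M->n; i++) lang_add(&R, M->s[i]);
    return R;
}

/* LM = { st | s is in L and t is in M } */
Lang lang_concat(const Lang *L, const Lang *M) {
    Lang R = { 0 };
    char buf[MAX_LEN];
    for (size_t i = 0; i < L->n; i++)
        for (size_t j = 0; j < M->n; j++) {
            snprintf(buf, sizeof buf, "%s%s", L->s[i], M->s[j]);
            lang_add(&R, buf);
        }
    return R;
}

/* L^0 U L^1 U ... U L^k, a finite slice of L*; L^0 = { epsilon } = { "" }. */
Lang lang_star_upto(const Lang *L, int k) {
    Lang R = { 0 }, power = { 0 };
    lang_add(&power, "");
    for (int i = 0; i <= k; i++) {
        for (size_t j = 0; j < power.n; j++) lang_add(&R, power.s[j]);
        power = lang_concat(&power, L);     /* L^(i+1) = L^i L */
    }
    return R;
}
```

With L = {a, b} and M = {0, 1}, lang_concat() gives {a0, a1, b0, b1}, and lang_star_upto(&L, 2) gives the seven strings ε, a, b, aa, ab, ba, bb.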

Notations
If r and s are regular expressions denoting the languages L(r) and L(s), then
•Union : (r)|(s) is a regular expression denoting L(r) ∪ L(s)
•Concatenation : (r)(s) is a regular expression denoting L(r)L(s)
•Kleene closure : (r)* is a regular expression denoting (L(r))*
•(r) is a regular expression denoting L(r)
Regular Expression
Precedence and Associativity
• *, concatenation (.) and | (pipe sign) are all left associative.
• * has the highest precedence.
• Concatenation (.) has the second highest precedence.
• | (pipe sign) has the lowest precedence of all.
• For example, a|bc* is read as a|(b(c*)).

Representing valid tokens of a language in regular expression

If x is a regular expression, then:
• x* means zero or more occurrences of x,
• i.e., it can generate { ε, x, xx, xxx, xxxx, … }.
• x+ means one or more occurrences of x,
• i.e., it can generate { x, xx, xxx, xxxx, … }; x+ is equivalent to x.x*.
• x? means at most one occurrence of x,
• i.e., it can generate either x or ε, the set { ε, x }.
• [a-z] denotes all lower-case letters of the English alphabet.
• [A-Z] denotes all upper-case letters of the English alphabet.
• [0-9] denotes all decimal digits.
Regular Expression – Examples

UNIX style Regular Expression examples with outcomes
Regular Expression – Matching simple expressions
Most characters match themselves. The only exceptions are called special characters:
• asterisk (*),
• plus sign (+),
• question mark (?),
• backslash (\),
• period (.),
• caret (^),
• square brackets ([ and ]),
• dollar sign ($),
• ampersand (&),
• or sign (|).

This expression...   matches this...   but not this...
a                    a                 b
\.\*                 .*                dog
100                  100               ABCDEFG

To match a special character, precede it with a backslash, like this: \*.

Regular Expression – Matching any character
A period (.) matches any character except a newline character.

This expression...   matches this...   but not this...
.art                 dart              art
                     cart              hurt
                     tart              dark
Regular Expression – Repeating expressions
You can repeat expressions with an asterisk or plus sign.
• A regular expression followed by an asterisk (*) matches zero or more occurrences of the regular expression.
• A one-character regular expression followed by a plus sign (+) matches one or more occurrences of it.
• A one-character regular expression followed by a question mark (?) matches zero or one occurrence of it.

This expression...   matches this...   but not this...
a+b                  ab                b
                     aaab              baa
a*b                  b                 daa
                     ab
                     aaab
.*cat                cat               dog
                     9393cat
                     the old cat
                     c7sb@#puiercat
So to match any series of zero or more characters, use ".*"
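The match/no-match columns in the tables above can be checked with the POSIX <regex.h> API in C (available on Unix-like systems, not part of ISO C). This sketch assumes that "matches" means the pattern is found somewhere in the text, as with grep, and uses POSIX extended syntax (REG_EXTENDED) so that + and ? act as operators; the helper name re_search() is hypothetical, and the editor these slides describe may differ in details.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if pattern (POSIX extended syntax) matches anywhere in text.
   A pattern that fails to compile is treated as "no match" in this sketch. */
bool re_search(const char *pattern, const char *text) {
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    int rc = regexec(&re, text, 0, NULL, 0);   /* 0 means a match was found */
    regfree(&re);
    return rc == 0;
}
```

Note that in a C string literal the pattern \.\* must be written "\\.\\*", since the backslashes themselves need escaping.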
Regular Expression – Grouping expressions
If an expression is enclosed in parentheses (( and )), the editor treats it as one expression and applies any asterisk (*) or plus sign (+) to the whole expression.

This expression...   matches this...   but not this...
(ab)*c               abc               ababab
                     ababababc         ababd
(.a)+b               xab               b
                     ra5afab           aagb
Regular Expression – Choosing one character from many
• A string of characters enclosed in square brackets ([]) matches any one character in that string.
• If the first character in the brackets is a caret (^), it matches any character except those in the
string.
• For example, [abc] matches a, b, or c, but not x, y, or z.
• However, [^abc] matches x, y, or z, but not a, b, or c.
• A minus sign (-) within square brackets indicates a range of consecutive ASCII characters.
• For example, [0-9] is the same as [0123456789].
• If a right square bracket is immediately after a left square bracket, it does not terminate the string but is
considered to be one of the characters to match.
• If any special character, such as backslash (\), asterisk (*), or plus sign (+), is immediately after the left
square bracket, it doesn't have its special meaning and is considered to be one of the characters to
match.
Regular Expression – Choosing one character from many

This expression...   matches this...   but not this...
[aeiou][0-9]         a6                ex
                     i3                9a
                     u2                $6
[^cfl]og             dog               cog
                     bog               fog
END[.]               END.              END;
                                       END DO
                                       ENDIAN
Regular Expression – Matching the beginning or end of a line
• You can specify that a regular expression match only the beginning or end of the line. These are called
anchor characters:
• If a caret (^) is at the beginning of the entire regular expression, it matches the beginning of a line.
• If a dollar sign ($) is at the end of the entire regular expression, it matches the end of a line.
• If an entire regular expression is enclosed by a caret and dollar sign (^like this$), it matches an entire
line.

This expression...   matches this...   but not this...
^(the cat).+         the cat runs      see the cat run
.+(the cat)$         watch the cat     the cat eats
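Anchors refer to line boundaries, so they are easiest to see on multi-line text. This C sketch again assumes POSIX <regex.h>; the helper count_matching_lines() is hypothetical. The REG_NEWLINE flag makes ^ and $ match at the start and end of each line and stops . and [^...] from matching a newline, and the function counts how many lines contain a match. The same helper can also check the grouping and bracket tables line by line.

```c
#include <regex.h>
#include <string.h>

/* Counts the lines of text ('\n'-separated) that contain a match for pattern,
   or returns -1 if the pattern does not compile. */
int count_matching_lines(const char *pattern, const char *text) {
    regex_t re;
    /* REG_NEWLINE: ^ and $ match at line boundaries, and neither . nor a
       [^...] list matches '\n', so no match can span two lines. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE) != 0)
        return -1;

    int count = 0;
    const char *p = text;              /* always points at the start of a line */
    regmatch_t m;
    while (regexec(&re, p, 1, &m, 0) == 0) {
        count++;                       /* count the line containing this match */
        const char *nl = strchr(p + m.rm_eo, '\n');
        if (nl == NULL) break;         /* the match was on the last line */
        p = nl + 1;                    /* resume at the next line */
    }
    regfree(&re);
    return count;
}
```

On the four lines "the cat runs", "see the cat run", "watch the cat" and "the cat eats", the pattern ^(the cat).+ matches two lines, while .+(the cat)$ matches only "watch the cat".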
The Lex and Flex Scanner Generators

• lex and its newer cousin flex are scanner generators

• Input is a set of regular expressions and associated actions (written in C).
• Output is a table-driven scanner (lex.yy.c).
• flex: an open source implementation of the original UNIX lex utility

The Lex and Flex Scanner Generators
Creating a Lexical Analyzer with Lex and Flex

lex source program (lex.l) → lex or flex compiler → lex.yy.c
lex.yy.c → C compiler → a.out
input sequence → a.out → stream of tokens
The Lex and Flex Scanner Generators
Lex Specification

A LEX program consists of three sections, separated by %% lines: declarations, rules and auxiliary functions.
DECLARATIONS
%%
RULES
%%
AUXILIARY FUNCTIONS
The Lex and Flex Scanner Generators
Lex Specification
Declarations
• The declarations section consists of two parts, auxiliary
declarations and regular definitions.

• The auxiliary declarations are copied as they are by LEX to the output lex.yy.c file. This C code consists of instructions to the C compiler and is not processed by the LEX tool.

• The auxiliary declarations (which are optional) are written in C and enclosed within %{ and %}. They are generally used to declare functions, include header files, or define global variables and constants.
The Lex and Flex Scanner Generators
Lex Specification
Rules
Each rule in a LEX program consists of two parts:
1. The pattern to be matched
2. The corresponding action to be executed

• LEX obtains the regular expressions of symbols such as 'number' and 'op' (regular definitions) from the declarations section and generates code into a function yylex() in the lex.yy.c file.

• This function scans the input stream for the longest prefix that matches one of the specified patterns (if several patterns match equally long prefixes, the rule listed first wins) and executes the code in the action part corresponding to that pattern.
The Lex and Flex Scanner Generators
Regular Expressions in Lex
x         match the character x
\.        match the character .
"string"  match the contents of the string of characters
.         match any character except newline
^         match the beginning of a line
$         match the end of a line
[xyz]     match one character: x, y, or z (use \ to escape -)
[^xyz]    match any character except x, y, and z
[a-z]     match one of a to z
r*        closure (match zero or more occurrences)
r+        positive closure (match one or more occurrences)
r?        optional (match zero or one occurrence)
r1r2      match r1 then r2 (concatenation)
r1|r2     match r1 or r2 (union)
(r)       grouping
{d}       match the regular expression defined by d
The Lex and Flex Scanner Generators
Lex Specification
Auxiliary functions
• LEX generates C code for the rules specified in the Rules
section and places this code into a single function
called yylex().

• In addition to this LEX-generated code, the programmer may wish to add their own code to the lex.yy.c file.

• The auxiliary functions section allows the programmer to achieve this.

• The auxiliary declarations and auxiliary functions are copied as they are to the lex.yy.c file.

• Once the code is written, lex.yy.c may be generated using the command lex filename.l (or flex filename.l) and compiled with gcc lex.yy.c. If the specification does not define its own main() and yywrap(), link the lex library as well (gcc lex.yy.c -ll, or -lfl for flex).
The Lex and Flex Scanner Generators
Lab Assignment # 2
Write a lex file that checks whether the user entered a VALID or an INVALID operator. The output should look like the following:

Deadline: ???

For Assignment:
1. https://www.youtube.com/watch?v=54bo1qaHAfk
2. https://www.youtube.com/watch?v=ilwXAchl4uw
3. https://codedost.com/flex/flex-programs/
Regular Expression
Representation occurrence of symbols using regular expressions
• letter = [a-z] | [A-Z], i.e. [a-zA-Z]
• digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 or [0-9]
• sign = + | - or [+-]

Representation of language tokens using regular expressions

• Decimal = (sign)?(digit)+
• Identifier = (letter)(letter | digit)*
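The two token definitions above can be hand-coded as C recognizers that accept a whole string. This is a minimal sketch, assuming the default "C" locale, where isalpha() and isdigit() accept exactly [A-Za-z] and [0-9]; the function names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

/* Decimal = (sign)?(digit)+, with sign = + | - */
bool is_decimal(const char *s) {
    if (*s == '+' || *s == '-') s++;                  /* (sign)? */
    if (!isdigit((unsigned char)*s)) return false;    /* need at least one digit */
    while (isdigit((unsigned char)*s)) s++;           /* (digit)+ */
    return *s == '\0';                                /* whole string consumed */
}

/* Identifier = (letter)(letter | digit)*, with letter = [a-zA-Z] */
bool is_identifier(const char *s) {
    if (!isalpha((unsigned char)*s)) return false;    /* (letter) */
    while (isalnum((unsigned char)*s)) s++;           /* (letter | digit)* */
    return *s == '\0';
}
```

Each loop mirrors one part of the regular expression: an optional test for ?, a mandatory first test followed by a loop for +, and a plain loop for *.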

• The remaining problem for the lexical analyzer is how to check whether an input string matches the regular expressions used to specify the patterns of the language's tokens (keywords, identifiers, and so on).
• A well-accepted solution is to use finite automata for this check.
Finite Automata
• A finite automaton is a state machine that takes a string of symbols as input and changes its state accordingly.
• A finite automaton is a recognizer for the language denoted by a regular expression.
• When an input string is fed into the finite automaton, it changes its state for each symbol.
• If the whole input string is processed and the automaton ends in a final state, the string is accepted, i.e., it is a valid token of the language in hand.

The mathematical model of finite automata consists of:

• Finite set of states (Q)
• Finite set of input symbols (Σ)
• One start state (q0)
• Set of final states (F ⊆ Q)
• Transition function (δ)

The transition function (δ) maps a state in Q and an input symbol in Σ to a next state:
δ : Q × Σ → Q
Finite Automata Construction
Let L(r) be a regular language recognized by some finite automaton (FA).

• States: States of an FA are represented by circles. State names are written inside the circles.
• Start state: The state in which the automaton starts is known as the start state. It has an arrow pointing towards it.
• Intermediate states: All intermediate states have at least two arrows, one pointing to them and another pointing out from them.
• Final state: If the input string is successfully parsed, the automaton is expected to be in this state. A final state is represented by a double circle.
• Transition: A transition from one state to another happens when a desired symbol in the input is found. Upon a transition, the automaton can either move to the next state or stay in the same state. Movement from one state to another is shown as a directed arrow pointing to the destination state. If the automaton stays in the same state, an arrow from the state to itself is drawn.
Finite Automata Construction - Example
We assume an FA that accepts any binary value ending in digit 1.
FA = (Q = {q0, qf}, Σ = {0, 1}, q0, F = {qf}, δ)
Finite Automata

State Graphs
A state
The start state
An accepting state
Finite Automata

State Graphs
A transition (an arrow labelled with an input symbol, e.g. a)
Finite Automata

• A finite automaton accepts a string if we can follow transitions labelled with the characters in the string from the start state to some accepting state.
Finite Automata - Example

A FA that accepts only “1”

Finite Automata - Example

• A FA that accepts any number of 1’s followed by a single 0
Finite Automata - Example

• A FA that accepts ab*a
• Alphabet: {a,b}
Finite Automata – Transition Table
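A DFA is usually implemented with a transition table, which is what the model (Q, Σ, q0, F, δ) amounts to in code. The following C sketch encodes the automaton for ab*a over Σ = {a, b} from the example above. The explicit dead (trap) state, which collects every input that can no longer be accepted, is our own addition so that δ is defined for every state and symbol; the function and state names are hypothetical.

```c
#include <stdbool.h>

enum { Q0, Q1, Q2, Q3_DEAD, NUM_STATES };

/* delta[q][c]: column 0 is input 'a', column 1 is input 'b'. */
static const int delta[NUM_STATES][2] = {
    [Q0]      = { Q1,      Q3_DEAD },  /* start: the string must begin with a */
    [Q1]      = { Q2,      Q1      },  /* b* loops here; a second a finishes */
    [Q2]      = { Q3_DEAD, Q3_DEAD },  /* accepting: nothing may follow */
    [Q3_DEAD] = { Q3_DEAD, Q3_DEAD },  /* trap state: input can no longer be accepted */
};
static const bool accepting[NUM_STATES] = { [Q2] = true };

/* Runs the DFA: exactly one table lookup per input symbol. */
bool dfa_accepts(const char *input) {
    int q = Q0;                                   /* start state */
    for (; *input != '\0'; input++) {
        if (*input != 'a' && *input != 'b') return false;   /* not in Sigma */
        q = delta[q][*input == 'a' ? 0 : 1];
    }
    return accepting[q];                          /* accept iff we end in F */
}
```

For instance, dfa_accepts("abba") returns true, while "ab" and "abab" are rejected.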

Nondeterministic Finite Automaton (NFA)

• NFA stands for non-deterministic finite automaton. It is often easier to construct an NFA than a DFA for a given regular language.
• A finite automaton is called an NFA when, for some state and input symbol, there are several possible next states.
• Not every NFA is a DFA, but every NFA can be translated into an equivalent DFA.
• An NFA is defined in the same way as a DFA, with two exceptions: a state may have multiple next states for the same input symbol, and it may have ε-transitions.
Nondeterministic Finite Automaton (NFA)

• We can see that from state q0 on input a there are two next states, q1 and q2; similarly, from q0 on input b the next states are q0 and q1.

• Thus the next state is not fixed (determined) by the current state and input alone. Hence this FA is called a non-deterministic finite automaton.
Comparison of NFA/DFA
• A DFA has one transition per input symbol per state.
• A DFA has no ε-moves.
• A DFA can take only one path through the state graph.
• A DFA's behaviour is completely determined by the input.
• DFAs are easier to implement (table driven).
• NFAs and DFAs recognize the same set of languages (the regular languages).
• For a given language, the NFA can be simpler than the DFA.
• A DFA can be exponentially larger than an equivalent NFA.
• NFAs are the key to automating the RE → DFA construction.
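An NFA can still be run directly by tracking the set of states it could be in after each symbol; this is the same idea that the subset construction turns into DFA states. This C sketch stores a set of states as a bit mask and uses the NFA from "Example - 2" of the NFA-to-DFA conversion (q0 start, q1 final, no ε-moves); the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t StateSet;                  /* bit i set <=> q_i is in the set */
#define IN(q) ((StateSet)1u << (q))

enum { NUM_NFA_STATES = 2 };

/* nfa_delta[q][c]: SET of successors of q_i on input c ('0' or '1'). */
static const StateSet nfa_delta[NUM_NFA_STATES][2] = {
    /* q0 */ { IN(0) | IN(1), IN(1) },
    /* q1 */ { 0,             IN(0) | IN(1) },   /* 0 is the empty set */
};
static const StateSet NFA_FINAL = IN(1);    /* F = { q1 } */

/* Follows all paths at once: current holds every state the NFA could be in. */
bool nfa_accepts(const char *input) {
    StateSet current = IN(0);                /* start in q0 */
    for (; *input != '\0'; input++) {
        if (*input != '0' && *input != '1') return false;
        StateSet next = 0;
        for (int q = 0; q < NUM_NFA_STATES; q++)
            if (current & IN(q)) next |= nfa_delta[q][*input - '0'];
        current = next;                      /* may become empty: every path died */
    }
    return (current & NFA_FINAL) != 0;       /* accept if SOME path ends in F */
}
```

The cost per symbol is proportional to the number of NFA states rather than constant; that is the price of skipping the conversion to a DFA.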
RE to Finite Automata
• We can use Thompson's Construction to obtain a finite automaton from a regular expression.
• We break the regular expression down into its smallest subexpressions, convert each of these to an NFA, combine the NFAs with ε-moves, and finally convert the result to a DFA.
• Some basic RE cases are the following:

Case 1: For a regular expression ‘a’, we can construct the following FA:

Case 2: For a regular expression ‘ab’, we can construct the following FA:

RE to Finite Automata
Case 3: For a regular expression (a+b), where + denotes union (a|b), we can construct the following FA:

Case 4: For a regular expression (a+b)*, we can construct the following FA:

RE to Finite Automata - Examples
building NFA for a(b|c)*

building NFA for a(b|c)
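As a sketch of what Thompson's construction produces for a(b|c)*, the C code below hard-codes one such NFA with ten states and simulates it using ε-closures. The state numbering (0 start, 9 accepting) is our own, and we join the NFA for a to the NFA for (b|c)* with an ε-move (1 → 8) rather than merging the two states; both variants accept the same language.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t StateSet;
#define BIT(q) ((StateSet)1u << (q))

enum { NSTATES = 10, START = 0, ACCEPT = 9 };

/* Epsilon moves: eps[q] is the set of states reachable from q by one epsilon edge.
   Pieces: a is 0->1; b is 2->3; c is 4->5; b|c runs from 6 to 7;
   (b|c)* runs from 8 to 9; the edge 1->8 concatenates a with (b|c)*. */
static const StateSet eps[NSTATES] = {
    [1] = BIT(8),
    [3] = BIT(7),
    [5] = BIT(7),
    [6] = BIT(2) | BIT(4),
    [7] = BIT(6) | BIT(9),
    [8] = BIT(6) | BIT(9),
};

/* In a Thompson NFA each state has at most one edge labelled with a symbol. */
static const char sym[NSTATES]    = { [0] = 'a', [2] = 'b', [4] = 'c' };
static const int  target[NSTATES] = { [0] = 1,   [2] = 3,   [4] = 5   };

/* All states reachable from S using epsilon edges only (iterated to a fixed point). */
static StateSet eps_closure(StateSet S) {
    StateSet before;
    do {
        before = S;
        for (int q = 0; q < NSTATES; q++)
            if (S & BIT(q)) S |= eps[q];
    } while (S != before);
    return S;
}

bool thompson_accepts(const char *input) {
    StateSet cur = eps_closure(BIT(START));
    for (; *input != '\0'; input++) {
        StateSet moved = 0;                       /* states reached on this symbol */
        for (int q = 0; q < NSTATES; q++)
            if ((cur & BIT(q)) && sym[q] == *input) moved |= BIT(target[q]);
        cur = eps_closure(moved);
    }
    return (cur & BIT(ACCEPT)) != 0;
}
```

For example, "a" and "abcbc" are accepted, while "", "b" and "aa" are rejected.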

Conversion from NFA to DFA
• In this section, we will discuss the method of converting NFA to its equivalent DFA.
• In NFA, when a specific input is given to the current state, the machine goes to multiple states.
• It can have zero, one or more than one move on a given input symbol.
• On the other hand, in DFA, when a specific input is given to the current state, the machine goes to only
one state.
• DFA has only one move on a given input symbol.

• Let M = (Q, Σ, δ, q0, F) be an NFA that accepts the language L(M).

• Then there is an equivalent DFA M' = (Q', Σ, δ', q0', F') such that L(M) = L(M').

Steps for converting NFA to DFA:

Step 1: Initially Q' = ϕ.

Step 2: Add [q0], the start state of the NFA, to Q' as the start state of the DFA. Then find the transitions from this start state.
Step 3: For each state in Q' and each input symbol, find the set of NFA states that can be reached. If this set of states is not already in Q', add it to Q'. Repeat until no new states are added.

Step 4: The final states of the DFA are all the states of Q' that contain at least one final state of the NFA (a state in F).
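Steps 1 to 4 can be sketched in C with bit-mask state sets, applied here to the NFA of Example - 1 (with no ε-moves, so no ε-closure is needed). The names (subset_construction(), Dfa, and so on) are hypothetical. Because the worklist only creates subsets reachable from [q0], an unreachable state such as [q2] never appears, and an empty set ϕ, if it arose, would simply become a dead state.

```c
#include <stdint.h>

typedef uint32_t StateSet;            /* bit i set <=> NFA state q_i is in the set */
#define BIT(i) ((StateSet)1u << (i))

enum { NFA_N = 3, SYMS = 2, MAX_DFA = 1 << NFA_N };

/* NFA of Example - 1: delta[q][c] for inputs 0 and 1. */
static const StateSet delta[NFA_N][SYMS] = {
    /* ->q0 */ { BIT(0),          BIT(1) },
    /*   q1 */ { BIT(1) | BIT(2), BIT(1) },
    /*  *q2 */ { BIT(2),          BIT(1) | BIT(2) },
};
static const StateSet NFA_FINAL = BIT(2);

typedef struct {
    int n;                          /* number of DFA states discovered (Q') */
    StateSet state[MAX_DFA];        /* state[i] = the set of NFA states it stands for */
    int move[MAX_DFA][SYMS];        /* delta': DFA transitions as indices into state[] */
} Dfa;

/* Returns the index of s in Q', adding it first if it is new. */
static int find_or_add(Dfa *d, StateSet s) {
    for (int i = 0; i < d->n; i++)
        if (d->state[i] == s) return i;
    d->state[d->n] = s;
    return d->n++;
}

Dfa subset_construction(void) {
    Dfa d = { 0 };                                  /* Step 1: Q' is empty */
    find_or_add(&d, BIT(0));                        /* Step 2: start state [q0] */
    for (int i = 0; i < d.n; i++) {                 /* d.n grows as sets are found */
        for (int c = 0; c < SYMS; c++) {
            StateSet to = 0;
            for (int q = 0; q < NFA_N; q++)
                if (d.state[i] & BIT(q)) to |= delta[q][c];
            d.move[i][c] = find_or_add(&d, to);     /* Step 3 */
        }
    }
    return d;
}

/* Step 4: a DFA state is final if it contains a final NFA state. */
int dfa_is_final(const Dfa *d, int i) { return (d->state[i] & NFA_FINAL) != 0; }
```

Run on Example - 1, this discovers exactly three states, [q0], [q1] and [q1, q2], with [q1, q2] the only final state, matching the constructed table once [q2] is dropped.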
Conversion from NFA to DFA - Example - 1
Convert the given NFA to DFA.

Solution: For the given transition diagram we will first construct the transition table:

State   0          1
→q0     q0         q1
q1      {q1, q2}   q1
*q2     q2         {q1, q2}

Now we will obtain δ' transitions for state [q0]:
δ'([q0], 0) = [q0]
δ'([q0], 1) = [q1]

Now we will obtain δ' transitions for state [q1]:
δ'([q1], 0) = [q1, q2]
δ'([q1], 1) = [q1]
Now we will obtain δ' transitions for state [q1, q2]:
δ'([q1, q2], 0) = δ(q1, 0) ∪ δ(q2, 0) = {q1, q2} ∪ {q2} = [q1, q2]
δ'([q1, q2], 1) = δ(q1, 1) ∪ δ(q2, 1) = {q1} ∪ {q1, q2} = [q1, q2]
The transition table for the constructed DFA:

State       0          1
→[q0]       [q0]       [q1]
[q1]        [q1, q2]   [q1]
*[q2]       [q2]       [q1, q2]
*[q1, q2]   [q1, q2]   [q1, q2]

The state [q2] can be eliminated because it is unreachable from the start state.
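Conversion from NFA to DFA - Example - 2
Convert the given NFA to DFA.

State   0          1
→q0     {q0, q1}   {q1}
*q1     ϕ          {q0, q1}

Now we will obtain δ' transitions for state [q0]:
δ'([q0], 0) = [q0, q1]
δ'([q0], 1) = [q1]

Now we will obtain δ' transitions for state [q1]:
δ'([q1], 0) = ϕ
δ'([q1], 1) = [q0, q1]

Now we will obtain δ' transitions for state [q0, q1]:
δ'([q0, q1], 0) = δ(q0, 0) ∪ δ(q1, 0) = {q0, q1} ∪ ϕ = [q0, q1]
δ'([q0, q1], 1) = δ(q0, 1) ∪ δ(q1, 1) = {q1} ∪ {q0, q1} = [q0, q1]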
The transition table for the constructed DFA:

State       0          1
→[q0]       [q0, q1]   [q1]
*[q1]       ϕ          [q0, q1]
*[q0, q1]   [q0, q1]   [q0, q1]

With these new names the DFA will be as follows:

RE to Finite Automata – A complete example
Design a FA from the given regular expression 10 + (0 + 11)0*1.

Transition Table

Equivalent DFA will be
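The expression can be checked against a concrete DFA. The following C sketch encodes one DFA derived by hand for it; its states and names are our own and need not match the DFA drawn on the slide. In this notation + means union, so the language is {10} ∪ 00*1 ∪ 110*1. The states for "inside 00*" and "inside 110*" behave identically (loop on 0, accept on a following 1), so they are merged into one state here.

```c
#include <stdbool.h>

/* States (our own names): S_ONE = read "1"; S_LOOP = inside 0 0* or 11 0*,
   waiting for the final 1; S_ACC = accepting; S_DEAD = trap. */
enum { S_START, S_ONE, S_LOOP, S_ACC, S_DEAD, NUM_STATES };

/* Column 0 is input '0', column 1 is input '1'. */
static const int delta[NUM_STATES][2] = {
    [S_START] = { S_LOOP, S_ONE  },
    [S_ONE]   = { S_ACC,  S_LOOP },  /* "10" is accepted; "11" starts 11 0* 1 */
    [S_LOOP]  = { S_LOOP, S_ACC  },  /* 0* loops, the closing 1 accepts */
    [S_ACC]   = { S_DEAD, S_DEAD },  /* every accepted string ends here */
    [S_DEAD]  = { S_DEAD, S_DEAD },
};

/* Accepts exactly the strings of 10 + (0 + 11)0*1, where + denotes union. */
bool accepts_example(const char *s) {
    int q = S_START;
    for (; *s != '\0'; s++) {
        if (*s != '0' && *s != '1') return false;
        q = delta[q][*s - '0'];
    }
    return q == S_ACC;
}
```

For instance, 10, 01, 0001, 111 and 11001 are accepted, while 1, 100, 011 and 110 are rejected.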

Theory Assignment No. 1

Convert the following DFA/NFA to RE; show all steps.

Deadline: ???
