Chapter 2 (Lexical Analysis)

Uploaded by

newsetup48

Lexical Analysis

A lexical analyser, or lexer for short, takes a string of individual
letters as its input and divides this string into tokens. Additionally, it
filters out whatever separates the tokens (the so-called white-space),
i.e., lay-out characters (spaces, newlines etc.) and comments.

The main purpose of lexical analysis is to make life easier for the
subsequent syntax analysis phase.


• The work that is done during lexical analysis can be made an integral
part of syntax analysis, and in simple systems this is indeed often
done. However, there are reasons for keeping the phases separate:

- Efficiency: A lexer may do the simple parts of the work faster than the
more general parser can. Furthermore, the size of a system that is split in
two may be smaller than a combined system.

- Tradition: Languages are often designed with separate lexical and syntactic
phases in mind, and the standard documents of such languages typically
separate lexical and syntactic elements of the languages.

• For lexical analysis, specifications are traditionally written using
regular expressions: an algebraic notation for describing sets of strings.

• The generated lexers are in a class of extremely simple programs
called finite automata.

Example of Natural language
1- Introduction: Structure of natural language
Three-level approach:
- Alphabet of letters: Σ = {letters} = {a, b, …, z}
- Vocabulary of words: words are built by concatenating letters.
  cedar is the concatenation of c, e, d, a and r.
  Vocabulary = {words}
- Language of sentences: sentences are constructed according to rules.
  Rule: subject verb complement
  "the computer makes calculations" is a correct sentence of the French language.
  French language = {sentences}
- Usefulness of the language: dialogue between people

Example of Formal languages
Computer languages = programming languages

Building a programming language: three-level approach
- Alphabet of symbols: choose an alphabet Σ = {symbols}
  Example: Σ = {a, …, z, 0, …, 9, $, -, _, …} = {keyboard characters}
- Vocabulary of words: words are constructed by concatenating symbols.
  Examples of words: int, char, scanf, printf, for
  Keyword language = {int, char, scanf, printf, for, …}
- Instruction language: instructions are constructed according to rules.
  Sample rules: int identifier; printf("format", identifier);
  Example statements: int i; printf("format", i);
  Instruction language = programming language = {instructions}
- Usefulness of programming languages: dialogue between a person and a computer

Computer languages = formal languages

Construction of a formal language: three-level approach
- Alphabet of symbols: choose an alphabet Σ = {symbols}
  Example: Σ = {0, 1}
- Construction of words: a single operation, concatenation (written .)
  0.1 = 01
  0.0.1.0 = 0010
  0.ε = 0 (the empty word ε is the neutral element for concatenation)
- Language of words: words are constructed by concatenating symbols.
  Example: the word 100 is obtained by concatenating 1 with 0 with 0.
  The word 110 is obtained by concatenating 1 with 1 with 0.
  …
  Language of binary numbers = {0, 1, 00, 01, 10, 11, 000, …}

Another example:
Alphabet: Σ = {a, b}
Words: a.b.a = aba
       a.ε = a
Definition: A word is obtained by concatenating symbols of the alphabet.

Definition: The free monoid Σ* on an alphabet Σ is the set of all words
obtained by concatenating symbols of the alphabet.

Example: Σ = {a, b}
Σ* = {ε, a, b, aa, ab, ba, bb, aaa, aab, aba, …}

Definition: A language L on an alphabet Σ is a subset of Σ*: L ⊆ Σ*.

Example: Σ = {a, b}
L1 = {ε, a, aa, aaa, …}: the set of words built from zero or more occurrences of a
L2 = {ε, ab, abab, ababab, …}: the set of words built from zero or more occurrences of ab

2- Operations on words:

Definition: The length of a word m, denoted |m|, is the number of symbols that
constitute it.
Example: |abbab| = 5

We can also count the occurrences of a particular symbol in a word: |abbab|_a = 2

Note: Concatenation is not commutative: ab ≠ ba

Definition: The mirror image of a word m, denoted m~, is obtained by reversing the order of the
symbols of m.
Example: m = abaaab and m~ = baaaba

Exercise: Given Σ = {a, b}, let L be the set of words of the form mm~ with m ∈ Σ*.
Give eight words of the language L.

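The mirror image and the language of the exercise can be illustrated with a minimal C sketch. The function names mirror and is_m_mirror are assumptions made for this example and are not part of the course material. The membership test relies on the observation that a word has the form mm~ exactly when it has even length and reads the same in both directions.

```c
#include <string.h>

/* Writes the mirror image of `word` into `out`.
   `out` must have room for strlen(word) + 1 characters. */
void mirror(const char *word, char *out) {
    size_t n = strlen(word);
    for (size_t i = 0; i < n; i++)
        out[i] = word[n - 1 - i];
    out[n] = '\0';
}

/* Returns 1 if `word` has the form m m~, 0 otherwise.
   Such a word has even length and its second half is the mirror of its first half. */
int is_m_mirror(const char *word) {
    size_t n = strlen(word);
    if (n % 2 != 0)
        return 0;
    for (size_t i = 0; i < n / 2; i++)
        if (word[i] != word[n - 1 - i])
            return 0;
    return 1;
}
```

For instance, mirror applied to abaaab yields baaaba, and is_m_mirror returns 1 for abba (m = ab) and 0 for aba.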
Steps: alphabet → concatenation → word → set of words → language

- Σ* is the set of all words
- Each language L on Σ satisfies L ⊆ Σ*

3- Operations on languages:

Complement: L̄ = {m / m ∈ Σ* and m ∉ L}
Union: L' ∪ L'' = {m / m ∈ L' or m ∈ L''}
Intersection: L' ∩ L'' = {m / m ∈ L' and m ∈ L''}
Product (concatenation): L'.L'' = {m / m = m'm'' such that m' ∈ L' and m'' ∈ L''}
Cartesian product: L' × L'' = {(m', m'') / m' ∈ L' and m'' ∈ L''}
Power: L^i = {m / m = m1 … mi such that m1 ∈ L, …, mi ∈ L}
Star: L* = {ε} ∪ L ∪ L^2 ∪ … ∪ L^i ∪ …

Exercise 1: Given Σ = {a, b}, let L = {a^n b^m / n >= 0 and m >= 0} be a language.

Question 1.1: Compare L with Σ*.

Exercise 2: Given Σ = {a, b}, let L1 = {a^n b^m / n >= 0 and m >= 0} and
L2 = {(ab)^n / n >= 0} be two languages.
Question 2.1: Give five words of L1 and five words of L2.
Question 2.2: Determine L1 ∩ L2.
Question 2.3: Compare L1 and L2.

Exercise 3: Given Σ = {a, b}, let L be a language. Prove that (L*)* = L*.

4- Membership problem:
Given a language L and a word m, how do we test whether m ∈ L?
Difficulty: L may be an infinite language, so L cannot be implemented by an
in-memory data structure.
Solution: Describe the infinite language by a finite and
programmable formalism.

Language L → construction → automaton A such that L(A) = L
Word m → automaton A → m ∈ L or m ∉ L

5- The automaton

Definition: An automaton is an abstract machine with states which
recognizes words composed of symbols of an alphabet.

Definition: An automaton is a 5-tuple A = <Σ, Q, q0, F, δ> where:

Σ is the alphabet
Q is the set of states
q0 is the initial state
F is the set of terminal states
δ is the transition function, δ: Q × Σ → Q

Example: A = <Σ = {a, b}, Q = {q0, q1}, q0, F = {q1}, δ = {δ(q0, a) = q0, δ(q0, b) = q1}>
The transition function can be represented by a table:

δ  | a  | b
q0 | q0 | q1

The transition function can be represented by a graph: a loop labelled a
on q0 and an arc labelled b from q0 to q1.

The automaton can be represented by the same graph with one entry (the
initial state q0) and possibly many outputs (the terminal states, here q1).

6- Words recognized by an automaton:
Extension of δ

δ is defined on the symbols of Σ:

δ: Q × Σ → Q

Question: How do we extend δ to the words of Σ*?

δ*: Q × Σ* → Q
For q ∈ Q, δ*(q, ε) = q, and for every m ∈ Σ* with m = xm', x ∈ Σ and m' ∈ Σ*:
δ*(q, m) = δ*(q, xm') = δ*(δ(q, x), m')

Example: δ*(q0, aab) = δ*(δ(q0, a), ab) = δ*(q0, ab) = δ*(δ(q0, a), b) = δ*(q0, b) = δ*(δ(q0, b), ε) = δ*(q1, ε) = q1

As q1 ∈ F, aab is recognized by the automaton: aab ∈ L(A).

The language recognized by the automaton A is:

L(A) = {m / m ∈ Σ* and δ*(q0, m) ∈ F}.

Example: L(A) = {b, ab, aab, aaab, …} = {a^n b / n >= 0}
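The recursive definition of the extended transition function translates almost literally into code. The following C sketch does this for the example automaton with δ(q0, a) = q0 and δ(q0, b) = q1. The names delta, delta_star and accepts, and the DEAD value used where δ is undefined, are assumptions of this sketch rather than part of the definition.

```c
/* States of the example automaton; DEAD marks "no transition defined". */
enum { DEAD = -1, Q0, Q1 };

/* delta(q, x): the transition function on single symbols.
   Only delta(q0, a) = q0 and delta(q0, b) = q1 are defined. */
static int delta(int q, char x) {
    if (q == Q0 && x == 'a') return Q0;
    if (q == Q0 && x == 'b') return Q1;
    return DEAD;
}

/* delta_star(q, m): the extension of delta to words.
   delta*(q, epsilon) = q and delta*(q, x m') = delta*(delta(q, x), m'). */
static int delta_star(int q, const char *m) {
    if (q == DEAD) return DEAD;      /* an undefined transition was needed */
    if (*m == '\0') return q;        /* empty word: stay in q */
    return delta_star(delta(q, *m), m + 1);
}

/* A word is recognized if delta*(q0, m) is a terminal state; here F = {q1}. */
int accepts(const char *m) {
    return delta_star(Q0, m) == Q1;
}
```

With this sketch, accepts returns 1 for b and aab and 0 for the empty word and for aba, which agrees with L(A) = {a^n b / n >= 0}.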

Example: A = <Σ = {a, b, c}, Q = {q0, q1}, q0, F = {q0, q1}, δ = {δ(q0, a) = q0, δ(q0, c) = q1, δ(q1, b) = q1}>
The transition function can be represented by a table:

δ  | a  | b  | c
q0 | q0 |    | q1
q1 |    | q1 |

The transition function can be represented by a graph: a loop labelled a on q0,
an arc labelled c from q0 to q1, and a loop labelled b on q1.

The automaton can be represented by the same graph with one entry and
many outputs.

The words recognized by the automaton A include:
- the empty word ε, because q0 is both the initial state and a terminal state
- the words without the letter b: ac, aac, aaac, …
- the words without the letter a: cb, cbb, cbbb, …
- the words with the letters a and b:
  acb, acbb, acbbb, …
  aacb, aacbb, aacbbb, …
  …

L(A) = {a^n c b^m / n >= 0, m >= 0}

Note: this description of L(A) counts only the words that end in q1. With F = {q0, q1} as given, the words a^n (n >= 0), including ε, are recognized as well; with F = {q1}, L(A) is exactly the set above.

Definition: For an automaton A, its language L(A) is called a rational language.

Example: A = <Σ = {a, b, c}, Q = {q0, q1}, q0, F = {q0, q1}, δ = {δ(q0, a) = q0, δ(q0, b) = q0, δ(q0, c) = q1}>
Graph: loops labelled a and b on q0, and an arc labelled c from q0 to q1.

The words recognized by the automaton A include:

c, ac, bc, aac, abc, bac, bbc, aaac, aabc, abac, baac, abbc, babc, bbac, …

7- Implementation of the membership test:

Language L → construction → automaton A such that L(A) = L
Word m → program? → m ∈ L or m ∉ L

Problem: How, in particular, do we implement the function δ*, with δ*(q, m) = q'?
How do we implement the full automaton?

δ*: Q × Σ* → Q
For m ∈ Σ* with m = xm', x ∈ Σ, m' ∈ Σ* and q ∈ Q:
δ*(q, m) = δ*(q, xm') = δ*(δ(q, x), m')

Implementation of the automaton:

The variables of the program:
- The automaton tests whether an input word belongs to a language: word variable
- The automaton performs the transitions δ(q, a) = q': state variable and symbol variable

Initialization of the automaton:
- word input: scanf(word)
- initialization of the symbol variable: symbol = word[0]
- initialization of the state variable: state = q0
General scheme:

/** Implementation of δ* **/
Repeat
  /** implementation of δ **/
  switch state of
    case q0: transitions out of q0
    case q1: transitions out of q1
    …
    case qn: transitions out of qn

With the transitions out of q0 written out:

/** Implementation of δ* **/
Repeat
  /** implementation of δ **/
  switch state of
    case q0: if (*symbol == 'a') { state = δ(q0, a); break; }
             else if (*symbol == 'b') { state = δ(q0, b); break; }
             else { printf("the word is not in the language"); exit(1); }
    case q1: transitions out of q1
    …
    case qn: transitions out of qn

Example: Write the program for the automaton with a loop labelled a on q0,
an arc labelled c from q0 to q1, and a loop labelled b on q1.

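#include <stdio.h>
#include <stdlib.h>

enum { q0, q1 };               /* the two states of the automaton */

char word[30];                 /* input word (empty if nothing is read) */
char *symbol;                  /* pointer to the current symbol */
int state;                     /* current state */

int main(void) {
    scanf("%29s", word);       /* read at most 29 characters */
    symbol = word;
    state = q0;
    while (1) {
        switch (state) {
        case q0: if (*symbol == 'a') { state = q0; break; }
                 else if (*symbol == 'c') { state = q1; break; }
                 /* end of input in q0 is also reported as an error */
                 else { printf("Lexical error"); exit(1); }
        case q1: if (*symbol == 'b') { state = q1; break; }
                 else if (*symbol == '\0') { printf("The word is in the language");
                                             exit(0); }
                 else { printf("Lexical error"); exit(1); }
        }
        /* go to the next symbol */
        symbol++;
    }
}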
Regular expressions

The set of all integer constants or the set of all variable names are
sets of strings, where the individual letters are taken from a particular
alphabet. Such a set of strings is called a language.

• For integers, the alphabet consists of the digits 0-9 and for variable
names the alphabet contains both letters and digits.

Given an alphabet, we will describe sets of strings by regular
expressions, an algebraic notation that is compact and easy for
humans to use and understand.

The idea is that regular expressions that describe simple sets of
strings can be combined to form regular expressions that describe
more complex sets of strings.

When talking about regular expressions, we will use the letters (r, s
and t) in italics to denote unspecified regular expressions. When
letters stand for themselves (i.e., in regular expressions that describe
strings that use these letters) we will use typewriter font, e.g., a or b.

A single letter describes the language that has the one-letter string
consisting of that letter as its only element.

The symbol ε (the Greek letter epsilon) describes the language that consists
solely of the empty string. Note that this is not the empty set of strings.

s|t (pronounced “s or t”) describes the union of the languages described by s and
t.

st (pronounced “s t”) describes the concatenation of the languages L(s) and L(t),
i.e., the sets of strings obtained by taking a string from L(s) and putting this in
front of a string from L(t).

For example, if L(s) is {“a”, “b”} and L(t) is {“c”, “d”}, then L(st) is the set {“ac”, “ad”,
“bc”, “bd”}.
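For finite languages, the concatenation L(st) can be computed directly by pairing every string of L(s) with every string of L(t). The C sketch below does this for the example above. The function name concat_languages and the MAX_LEN bound are assumptions of this sketch, and the result is a plain list: duplicates, which cannot occur in this example, would not be removed.

```c
#include <stdio.h>
#include <stddef.h>

#define MAX_LEN 16   /* assumed bound on the length of a result, including '\0' */

/* Stores every string xy with x taken from s and y taken from t into out,
   in the order of the two nested loops, and returns the number of strings written.
   out must have room for ns * nt entries. */
size_t concat_languages(const char *s[], size_t ns,
                        const char *t[], size_t nt,
                        char out[][MAX_LEN]) {
    size_t k = 0;
    for (size_t i = 0; i < ns; i++)
        for (size_t j = 0; j < nt; j++)
            snprintf(out[k++], MAX_LEN, "%s%s", s[i], t[j]);
    return k;
}
```

Called with s = {"a", "b"} and t = {"c", "d"}, the function fills out with ac, ad, bc and bd.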
• The language for s* (pronounced “s star”) is described recursively: It consists of
the empty string plus whatever can be obtained by concatenating a string from
L(s) to a string from L(s* ).
• This is equivalent to saying that L(s* ) consists of strings that can be obtained by
concatenating zero or more (possibly different) strings from L(s).
• If, for example, L(s) is {“a”, “b”} then L(s* ) is {“”, “a”, “b”, “aa”, “ab”, “ba”, “bb”,
“aaa”, . . . }, i.e., any string (including the empty) that consists entirely of as and
bs.

Precedence rules
• When we combine different constructor symbols, e.g., in the regular expression a|ab*, it is
not a priori clear how the different subexpressions are grouped. We can use
parentheses to make the grouping of symbols explicit, such as in (a|(ab))*.

• Additionally, we use precedence rules, similar to the algebraic convention that
3 + 4 ∗ 5 means 3 added to the product of 4 and 5 and not multiplying the sum of 3
and 4 by 5.

• For regular expressions, we use the following conventions: ∗ binds tighter than
concatenation, which binds tighter than alternative (|). The example a|ab* from
above, hence, is equivalent to a|(a(b*)).

• The | operator is associative and commutative (as it corresponds to
set union, which has these properties).

• Concatenation is associative (but obviously not commutative) and
distributes over |.

Short hands

If we want to describe non-negative integer constants, we can do so
by saying that it is one or more digits, which is expressed by the
regular expression (0|1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*.

This is verbose, and it gets even worse when we get to variable names,
where we must enumerate all alphabetic letters.

We introduce a shorthand for sets of letters. Sequences of letters within
square brackets represent the set of these letters. For example, we use
[ab01] as a shorthand for a|b|0|1.

Additionally, we can use interval notation to abbreviate [0123456789] to
[0-9].

We can combine several intervals within one bracket and for example write
[a-zA-Z] to denote all alphabetic letters in both lower and upper case.

• Getting back to the example of integer constants above, we can now write this
much shorter as [0-9][0-9]* . Since s* denotes zero or more occurrences of s, we
needed to write the set of digits twice to describe that one or more digits are
allowed.
• Such non-zero repetition is quite common, so we introduce another shorthand,
s+, to denote one or more occurrences of s. With this notation, we can abbreviate
our description of integers to [0-9]+.
• On a similar note, it is common that we can have zero or one occurrence of
something (e.g., an optional sign to a number). Hence we introduce the
shorthand s? for s|ε.
• + and ? bind with the same precedence as ∗.

[Figure: properties of regular expressions]

Examples
• Keywords. A keyword like if is described by a regular expression that
looks exactly like that keyword, e.g., the regular expression if (which is
the concatenation of the two regular expressions i and f).

• Variable names. In the programming language C, a variable name
consists of letters, digits and the underscore symbol and it must begin
with a letter or underscore. This can be described by the regular
expression [a-zA-Z_][a-zA-Z_0-9]*.
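The expression for variable names is simple enough to be checked by hand-written code. The C sketch below tests whether a whole string matches [a-zA-Z_][a-zA-Z_0-9]*. The function name is_identifier is an assumption of this sketch, and isalpha and isalnum coincide with the ranges a-z, A-Z and 0-9 only in the default "C" locale.

```c
#include <ctype.h>

/* Returns 1 if the whole string s matches [a-zA-Z_][a-zA-Z_0-9]*, 0 otherwise. */
int is_identifier(const char *s) {
    /* first symbol: a letter or an underscore */
    if (!(isalpha((unsigned char)*s) || *s == '_'))
        return 0;
    /* remaining symbols: letters, digits or underscores */
    for (s++; *s != '\0'; s++)
        if (!(isalnum((unsigned char)*s) || *s == '_'))
            return 0;
    return 1;
}
```

For example, x1 and _tmp are accepted, whereas 1x, a-b and the empty string are rejected.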

• Integers. An integer constant is an optional sign followed by a non-empty
sequence of digits: [+-]?[0-9]+. In some languages, the sign is a separate
symbol and not part of the constant itself. This will allow whitespace
between the sign and the number, which is not possible with the above.

• Floats. A floating-point constant can have an optional sign. After this, the
mantissa part is described as a sequence of digits followed by a decimal
point and then another sequence of digits.

• Finally, there is an optional exponent part, which is the letter e (in upper or lower
case) followed by an (optionally signed) integer constant.
• If there is an exponent part to the constant, the mantissa part can be written as an
integer constant (i.e., without the decimal point). Some examples: 3.14, -3., .23, 3e+4,
11.22e-3.
• We can make the description simpler if we make the regular expression for floats
also include integers, and instead use other means of distinguishing integers from
floats.
• If we do this, the regular expression can be simplified to the following, where . stands
for the decimal point itself:
[+-]?(([0-9]+(.[0-9]*)?|.[0-9]+)([eE][+-]?[0-9]+)?)
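The simplified expression can be tried out with the POSIX regular-expression functions regcomp and regexec from <regex.h>. In the C sketch below, the function name is_float is an assumption. The pattern is a transcription into POSIX extended syntax, not the notation of these notes: the decimal point is escaped as \. because an unescaped dot matches any character there, and ^ and $ are added to require that the whole string matches.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the whole string s matches the float expression, 0 otherwise.
   Also returns 0 if the pattern cannot be compiled. */
int is_float(const char *s) {
    static const char *pattern =
        "^[+-]?(([0-9]+(\\.[0-9]*)?|\\.[0-9]+)([eE][+-]?[0-9]+)?)$";
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    int matched = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

All five examples given above (3.14, -3., .23, 3e+4 and 11.22e-3) are matched, and so is the plain integer 42, since the simplified expression includes integers. A lone . is not matched.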
• String constants. A string constant starts with a quotation mark followed
by a sequence of symbols and finally another quotation mark.
• There are usually some restrictions on the symbols allowed between the
quotation marks. For example, line-feed characters or quotes are typically
not allowed, though these may be represented by special “escape”
sequences of other characters, such as "\n\n" for a string containing two
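line-feeds. If the allowed symbols are alphanumeric characters and escape sequences
consisting of a backslash followed by a letter, string constants are described by:
"([a-zA-Z0-9]|\[a-zA-Z])*"
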
Nondeterministic finite automata (NFA)
• A finite automaton is, in the abstract sense, a machine that has a finite number of
states and a finite number of transitions between these. A transition between
states is usually labelled by a character from the input alphabet, but we will also
use transitions marked with ε, the so-called epsilon transitions.

• A finite automaton can be used to decide if an input string is a member in some
particular set of strings. To do this, we select one of the states of the automaton
as the starting state. We start in this state and in each step, we can do one of the
following:


• Follow an epsilon transition to another state, or

• Read a character from the input and follow a transition labelled by
that character.

When all characters from the input are read, we see if the current state
is marked as being accepting. If so, the string we have read from the
input is in the language defined by the automaton.


• We may have a choice of several actions at each step: We can choose
between either an epsilon transition or a transition on an alphabet
character, and if there are several transitions with the same symbol,
we can choose between these.

• This makes the automaton nondeterministic, as the choice of action
is not determined solely by looking at the current state and input. The
automaton accepts a string if at least one sequence of choices leads to an
accepting state when all of the input has been read.


• Definition 2.1 A nondeterministic finite automaton consists of a set S of states.
One of these states, s0 ∈ S, is called the starting state of the automaton and a
subset F ⊆ S of the states are accepting states. Additionally, we have a set T of
transitions.
• Each transition t connects a pair of states s1 and s2 and is labelled with a symbol,
which is either a character c from the alphabet Σ, or the symbol ε, which indicates
an epsilon-transition.
• A transition from state s to state t on the symbol c is written as s^c t.

• We will mostly use a graphical notation to describe finite automata. States
are denoted by circles, possibly containing a number or name that
identifies the state.
• This name or number has, however, no operational significance; it is solely
used for identification purposes.
• Accepting states are denoted by using a double circle instead of a single
circle. The initial state is marked by an arrow pointing to it from outside the
automaton.
• A transition is denoted by an arrow connecting two states. Near its midpoint, the
arrow is labelled by the symbol (possibly ε) that triggers the transition. Note that
the arrow that marks the initial state is not a transition and is, hence, not marked
by a symbol.
• Figure 2.3 shows an example of a nondeterministic finite automaton having three
states. State 1 is the starting state and state 3 is accepting. There is an epsilon
transition from state 1 to state 2, transitions on the symbol a from state 2 to
states 1 and 3 and a transition on the symbol b from state 1 to state 3.
• This NFA recognises the language described by the regular expression a*(a|b).
• As an example, the string aab is recognised by the following sequence of
transitions:

from | to | by
1    | 2  | ε
2    | 1  | a
1    | 2  | ε
2    | 1  | a
1    | 3  | b

At the end of the input we are in state 3, which is accepting. Hence, the string is
accepted by the NFA. You can check this by placing a coin at the starting state and
following the transitions by moving the coin.


• If, in the example above, we had chosen to follow the a-transition to
state 3 instead of state 1, we would have been stuck: We would have
no legal transition and yet we would not be at the end of the input.

• But, as previously stated, it is enough that there exists a path
leading to acceptance, so the string aab is still accepted.

• A program that decides if a string is accepted by a given NFA will have to check all
possible paths to see if any of these accepts the string.
• This requires either backtracking until a successful path is found or simultaneously
following all possible paths, both of which are too time-consuming to make NFAs
suitable for efficient recognisers.
• We will, hence, use NFAs only as a stepping stone between regular expressions
and the more efficient DFAs. We use this stepping stone because it makes the
construction simpler than direct construction of a DFA from a regular expression.
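The second option mentioned above, following all possible paths simultaneously, can be sketched in C by keeping the set of states the NFA could currently be in. The sketch hard-codes the three-state NFA described for figure 2.3. The names BIT, eps_closure, move and nfa_accepts, and the encoding of state sets as bits of an unsigned integer, are assumptions of this sketch.

```c
/* States 1..3 of the example NFA are stored as bits 0..2 of an unsigned set. */
#define BIT(s) (1u << ((s) - 1))

/* Adds every state reachable by epsilon transitions.
   The only epsilon transition of the example goes from state 1 to state 2. */
static unsigned eps_closure(unsigned set) {
    if (set & BIT(1))
        set |= BIT(2);
    return set;
}

/* Returns the states reachable from `set` by one transition on the character c. */
static unsigned move(unsigned set, char c) {
    unsigned next = 0;
    if (c == 'a' && (set & BIT(2)))
        next |= BIT(1) | BIT(3);      /* a leads from state 2 to states 1 and 3 */
    if (c == 'b' && (set & BIT(1)))
        next |= BIT(3);               /* b leads from state 1 to state 3 */
    return next;
}

/* Returns 1 if some path through the NFA ends in the accepting state 3. */
int nfa_accepts(const char *input) {
    unsigned current = eps_closure(BIT(1));   /* start in state 1 */
    for (; *input != '\0'; input++)
        current = eps_closure(move(current, *input));
    return (current & BIT(3)) != 0;
}
```

For the string aab the set of possible states is {1, 2} at the start, {1, 2, 3} after each a and {3} after the b, so the string is accepted without any backtracking.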

Converting a regular expression to an NFA
• We will construct an NFA compositionally from a regular expression,
i.e., we will construct the NFA for a composite regular expression
from the NFAs constructed from its subexpressions.

• To be precise, we will from each subexpression construct an NFA
fragment and then combine these fragments into bigger fragments.

• A fragment is not a complete NFA, so we complete the construction
by adding the necessary components to make a complete NFA.

NFA for the regular expression (a|b)*ac

Optimisations
We can use the construction in figure 2.4 for any regular expression
by expanding out all shorthand, e.g. converting s+ to ss*, [0-9] to
0|1|2|···|9 and s? to s|ε, etc. However, this can produce very large NFAs
for some expressions, so optimised constructions are used for the shorthands.

The optimised constructions are shown in figure 2.6. As an example,
an NFA for [0-9]+ is shown in figure 2.7.

Note that while this is optimised, it is not optimal. You can make an
NFA for this language using only two states.

Example of Optimised NFA for [0-9]+

Deterministic finite automata

• Nondeterministic automata are, as mentioned earlier, not quite as
close to “the machine” as we would like. Hence, we now introduce a
more restricted form of finite automaton: the deterministic finite
automaton, or DFA for short.

• DFAs are NFAs, but obey a number of additional restrictions:

• There are no epsilon-transitions.
• There may not be two identically labelled transitions out of the same state.

This means that we never have a choice of several next-states: The state
and the next input symbol uniquely determine the transition (or lack of
same).
This is why these automata are called deterministic. Figure 2.8 shows a DFA
equivalent to the NFA in figure 2.3.
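Because the state and the next input symbol determine the transition uniquely, a DFA can be run with a simple table lookup. The C sketch below uses a DFA for the language a*(a|b) recognised by the NFA described for figure 2.3. Its three states were derived for this sketch from the sets of NFA states {1, 2}, {1, 2, 3} and {3}, so the state numbering is an assumption and may differ from figure 2.8; the names next_state, accepting and dfa_accepts are assumptions as well.

```c
enum { NO_TRANSITION = -1, NUM_STATES = 3 };

/* Rows: DFA states 0, 1 and 2. Columns: input a, input b.
   State 0 stands for the NFA states {1, 2}, state 1 for {1, 2, 3}, state 2 for {3}. */
static const int next_state[NUM_STATES][2] = {
    { 1, 2 },
    { 1, 2 },
    { NO_TRANSITION, NO_TRANSITION },
};

/* accepting[q] is 1 if DFA state q contains the accepting NFA state 3. */
static const int accepting[NUM_STATES] = { 0, 1, 1 };

/* Returns 1 if the DFA accepts `input`, 0 otherwise. */
int dfa_accepts(const char *input) {
    int state = 0;                               /* starting state */
    for (; *input != '\0'; input++) {
        if (*input != 'a' && *input != 'b')
            return 0;                            /* symbol outside the alphabet */
        state = next_state[state][*input == 'b'];
        if (state == NO_TRANSITION)
            return 0;                            /* lack of a transition: reject */
    }
    return accepting[state];
}
```

Each input symbol costs exactly one table lookup, and there is never a set of alternatives to keep track of.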

Example of a DFA

Examples of Automata and Regular Expressions
Introduction
Question: Does an algebraic formalism exist to represent rational
languages, other than the formalism of automata?

Example: L(A) = {a^n c b^m / n >= 0, m >= 0}. How do we represent L?

Answer: L can be represented by the regular expression a*cb*,
in addition to its representation by the automaton with a loop labelled a on q0,
an arc labelled c from q0 to q1, and a loop labelled b on q1.

What is a regular expression?

Definition: The formalism of rational (regular) expressions is an algebraic
formalism for representing some of the languages built on a given
alphabet; these languages are called rational languages.

Example: Let Σ = {a, b} be an alphabet.

The language L1 = {a^n b^m / n >= 0 and m >= 0} is represented by the rational
expression a*b*
The language L2 = {(ab)^n / n >= 0} is represented by the rational expression
(ab)*
The language L3 = Σ* is represented by the regular expression (a|b)*
The language L4 = {a^n b^n / n >= 0} cannot be represented by a regular
expression.

How do we define a regular expression?

1- Regular expressions
Definition: Let Σ be an alphabet. A regular expression (RE) is defined by:
- ε is a RE; language: L(ε) = {ε}; automaton: q0 -ε-> q1
- a is a RE for each a ∈ Σ; language: L(a) = {a}; automaton: q0 -a-> q1
If r and s are REs, then:
- r | s is a RE; language: L(r|s) = L(r) ∪ L(s); automaton: ?
- r s is a RE; language: L(r.s) = L(r) . L(s); automaton: ?
- r* is a RE; language: L(r*) = (L(r))*; automaton: ?
- (r) is a RE; language: L((r)) = L(r); automaton: ?

Automaton for r|s:
[Figure: a new initial state linked by ε-transitions to q0,s and q0,r, the initial states of the automaton of s (states q0,s, q1,s, …, qn,s) and of the automaton of r (states q0,r, q1,r, …, qm,r)]

Automaton for r.s:
[Figure: the automaton of s (states q0,s, q1,s, …, qn,s) and the automaton of r (states q0,r, q1,r, …, qm,r) joined by an ε-transition]

Automaton for r*:
[Figure: the automaton of r (states q0,r, q1,r, …, qm,r) between a new initial state q0 and a new final state qf, connected by ε-transitions]

1.1- Algorithm for converting a RE to an automaton
Automata are written here as A = <Σ, Q, q0, δ, T>, where δ is a set of transitions (q, x, q') and T is the set of terminal states.

- The automaton for the regular expression ε is:
A = <{ε}, {q0, q1}, q0, {(q0, ε, q1)}, {q1}>

- The automaton for the regular expression a is:
A = <{a}, {q0, q1}, q0, {(q0, a, q1)}, {q1}>

- The automaton for r | s, given A_r = <Σ_r, Q_r, q_{r,0}, δ_r, T_r> and A_s = <Σ_s, Q_s, q_{s,0}, δ_s, T_s>:
A_{r|s} = <Σ_r ∪ Σ_s, Q_r ∪ Q_s ∪ {q_{r|s,0}}, q_{r|s,0},
δ_r ∪ δ_s ∪ {(q_{r|s,0}, ε, q_{r,0}), (q_{r|s,0}, ε, q_{s,0})}, T_r ∪ T_s>

- The automaton for r . s, given the same A_r and A_s:
A_{r.s} = <Σ_r ∪ Σ_s, Q_r ∪ Q_s, q_{r,0},
δ_r ∪ δ_s ∪ {(q, ε, q_{s,0}) / q ∈ T_r}, T_s>

- The automaton for r*, given A_r = <Σ_r, Q_r, q_{r,0}, δ_r, T_r>:
A_{r*} = <Σ_r, Q_r ∪ {q_{r*,0}, q_{r*,f}}, q_{r*,0},
δ_r ∪ {(q_{r*,0}, ε, q_{r,0})} ∪ {(q, ε, q_{r*,f}) / q ∈ T_r} ∪
{(q_{r*,0}, ε, q_{r*,f}), (q_{r*,f}, ε, q_{r*,0})}, {q_{r*,f}}>
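The four constructions can be sketched in C by storing all transitions in one shared table and representing each automaton by its initial state and its set of terminal states. All names below (Trans, Automaton, re_symbol, re_union, re_concat, re_star, accepts) are assumptions of this sketch. It supports at most 64 states and 256 transitions without checking these limits, it includes an ε-transition from the new initial state of r* to the initial state of r, and each automaton value should be used as an operand only once because states are shared. The accepts function follows all paths simultaneously so that the result can be tested.

```c
#include <stdint.h>

#define MAX_TRANS 256
#define EPS 0                        /* label used for epsilon transitions */

typedef struct { int from, label, to; } Trans;
/* terminals: bit q is set if and only if state q is terminal */
typedef struct { int start; uint64_t terminals; } Automaton;

static Trans trans[MAX_TRANS];
static int num_trans = 0, num_states = 0;

static int new_state(void) { return num_states++; }

static void add_trans(int from, int label, int to) {
    trans[num_trans++] = (Trans){ from, label, to };
}

/* Adds an epsilon transition from every terminal state of a to target. */
static void link_terminals(Automaton a, int target) {
    for (int q = 0; q < num_states; q++)
        if ((a.terminals >> q) & 1)
            add_trans(q, EPS, target);
}

/* Automaton for a single symbol, or for epsilon when label == EPS. */
Automaton re_symbol(int label) {
    int q0 = new_state(), q1 = new_state();
    add_trans(q0, label, q1);
    return (Automaton){ q0, 1ull << q1 };
}

/* r | s: a new initial state with epsilon transitions to both initial states. */
Automaton re_union(Automaton r, Automaton s) {
    int q0 = new_state();
    add_trans(q0, EPS, r.start);
    add_trans(q0, EPS, s.start);
    return (Automaton){ q0, r.terminals | s.terminals };
}

/* r . s: the terminal states of r are linked to the initial state of s. */
Automaton re_concat(Automaton r, Automaton s) {
    link_terminals(r, s.start);
    return (Automaton){ r.start, s.terminals };
}

/* r*: new initial and final states around the automaton of r. */
Automaton re_star(Automaton r) {
    int q0 = new_state(), qf = new_state();
    add_trans(q0, EPS, r.start);
    link_terminals(r, qf);
    add_trans(q0, EPS, qf);
    add_trans(qf, EPS, q0);
    return (Automaton){ q0, 1ull << qf };
}

/* Adds all states reachable through epsilon transitions. */
static uint64_t closure(uint64_t set) {
    for (int changed = 1; changed; ) {
        changed = 0;
        for (int i = 0; i < num_trans; i++)
            if (trans[i].label == EPS && ((set >> trans[i].from) & 1)
                && !((set >> trans[i].to) & 1)) {
                set |= 1ull << trans[i].to;
                changed = 1;
            }
    }
    return set;
}

/* Returns 1 if the automaton a recognizes word, 0 otherwise. */
int accepts(Automaton a, const char *word) {
    uint64_t cur = closure(1ull << a.start);
    for (; *word != '\0'; word++) {
        uint64_t next = 0;
        for (int i = 0; i < num_trans; i++)
            if (trans[i].label == *word && ((cur >> trans[i].from) & 1))
                next |= 1ull << trans[i].to;
        cur = closure(next);
    }
    return (cur & a.terminals) != 0;
}
```

As a usage example, the automaton for (a|b)*ac is obtained with re_concat(re_concat(re_star(re_union(re_symbol('a'), re_symbol('b'))), re_symbol('a')), re_symbol('c')); it recognizes ac and abbaac but neither a nor abc.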

Deep Web Thesis
8 pages
Case Study
No ratings yet
Case Study
8 pages
PD 63711
No ratings yet
PD 63711
64 pages
Email and Online Communication
No ratings yet
Email and Online Communication
29 pages
Link Bank Account Application Form
No ratings yet
Link Bank Account Application Form
4 pages
PLUS+1 Controllers: MC050-020 and MC050-022
No ratings yet
PLUS+1 Controllers: MC050-020 and MC050-022
4 pages
Unit-8: Applications and Trends of Microprocessor Technology
No ratings yet
Unit-8: Applications and Trends of Microprocessor Technology
3 pages
HDSet Operating Instructions V3.0
No ratings yet
HDSet Operating Instructions V3.0
33 pages
Minecraft Keywords
No ratings yet
Minecraft Keywords
4 pages
Introduction To AI and ML
100% (1)
Introduction To AI and ML
68 pages
Speech Emotion Recognition Using Deep Convolutional Neural Network and Discriminant Temporal Pyramid Matching
No ratings yet
Speech Emotion Recognition Using Deep Convolutional Neural Network and Discriminant Temporal Pyramid Matching
15 pages
Router 6273: Meeting The Strictest Radio Requirements
No ratings yet
Router 6273: Meeting The Strictest Radio Requirements
2 pages
