
Lectures Merged

This document discusses the course "Formal Languages and Compiler Design" taught by Simona Motogna. It covers topics like the history of formal languages, compiler design, and how learning these topics can help students become better programmers. The course organization and requirements are also outlined, including attendance policies, grading breakdown, and lab work expectations. References and resources are provided to aid student learning.

Uploaded by

Veko Boy

Formal Languages and

Compiler Design

Simona Motogna

S. Motogna - LFTC
Why FLCD (Formal Languages and Compiler Design)?
• Historical reasons
• Be a better programmer
• Performant algorithms
Organization Issues

• Course – 2 h/week
• Seminar – 2 h/week (10 presences)
• Laboratory – 2 h/week (12 presences)

PRESENCE IS MANDATORY
Most interesting stuff for students
• Moodle:
• All course resources
• Homeworks
• Assignments
• Labs
• Points / grades

• MsTeams – labs (maybe)

Minimal Conditions to Pass

• Minimum 10 presences at seminar


• Minimum 12 presences at laboratory

• Minimum grade 6 at lab


• Minimum grade 5 at final exam
Final grade
60% final exam + 30% lab + 10% seminar
Bonus

Lab work
• 10 laboratory tasks
• !!! Must be completed and uploaded during lab hours
• Weighted grades: lab grade
• Bonus points:
- “awesome” solutions
- Extra work

I wish …
Effective communication, interactive experience, learning fun

References
• See the course syllabus (fișa disciplinei)
What is a compiler?
Source code / program → Compiler → Object code / program
Interpreter? Assembler?
A little bit of history …
• Fortran, 1954-1957, Backus
• Lisp, 1962, McCarthy
• Pascal, 1968-1970, N. Wirth
• C, 1969-1973, D. Ritchie
• Java, 1995, J. Gosling
Structure of a compiler

Analysis:
Source code / program
→ Scanning (lexical analysis) → Tokens & ST
→ Parsing (syntactical analysis) → Syntax tree
→ Semantic analysis → Annotated syntax tree

Synthesis:
→ Intermediary code generation → Intermediary code
→ Intermediary code optimization → Optimized intermediary code
→ Object code generation → Object code / program

Across all phases: Symbol Table management, Error handling
Chapter 1. Scanning
Definition = treats the source program as a sequence of characters,
detects lexical tokens, classifies and codifies them

INPUT: source program


OUTPUT: PIF + ST

Algorithm Scanning v1
While (not(eof)) do
detect(token);
classify(token);
codify(token);
End_while

Detect

I am a student.I am
Simona

- Separators => Remark 1)

if (x==y) {x=y+2}

- Look-ahead => Remark 2)

Classify
• Classes of tokens:
• Identifiers
• Constants
• Reserved words (keywords)
• Separators
• Operators

• If a token can NOT be classified => LEXICAL ERROR

Codify
• May be codification table
OR
code for identifiers and constants

• Identifier, constant => Symbol Table (ST)

• PIF = Program Internal Form = array of pairs

• pairs (token, position in ST); the position in ST is meaningful only for identifiers and constants
Algorithm Scanning v2
While (not(eof)) do
    detect(token);
    if token is reserved word OR operator OR separator
        then genPIF(token, 0)
        else
            if token is identifier OR constant
                then index = pos(token, ST);
                     genPIF(token, index)
                else message “Lexical error”
            endif
    endif
endwhile

Example: a=a+b
PIF: (id,1) (=,0) (id,1) (+,0) (id,2)
ST:
1 a
2 b
Remarks:
• genPIF = adds a pair (token, position) to PIF

• pos(token,ST) – searches token in symbol table ST; if found then
return position; if not found insert in ST and return position
• Order of classification (reserved word, then identifier)
• Nested if-then-else => detect error if a token cannot be classified
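
The algorithm and remarks above can be sketched in Python. This is a minimal illustration, not the course's reference solution: the token classes in RESERVED, OPERATORS and SEPARATORS, the regular expression used for detection, and the choice of a plain list as ST are all assumptions made for the example.

```python
import re

RESERVED = {"if", "else", "while"}        # hypothetical reserved words
OPERATORS = {"==", "=", "+", "-"}         # hypothetical operators
SEPARATORS = {"(", ")", "{", "}", ";"}    # hypothetical separators

# "==" is listed before single characters, which gives the look-ahead
# needed to tell "==" from "=".
TOKEN_RE = re.compile(r"\s*(==|[A-Za-z_]\w*|\d+|\S)")


def pos(token, st):
    """Search token in ST; insert it if missing. Positions are 1-based."""
    if token not in st:
        st.append(token)
    return st.index(token) + 1


def scan(source):
    """Return (PIF, ST, errors) for the given source text."""
    pif, st, errors = [], [], []
    source = source.rstrip()
    i = 0
    while i < len(source):
        match = TOKEN_RE.match(source, i)
        token = match.group(1)
        i = match.end()
        # Reserved words are tested before identifiers (order of classification).
        if token in RESERVED or token in OPERATORS or token in SEPARATORS:
            pif.append((token, 0))
        elif re.fullmatch(r"[A-Za-z_]\w*", token):
            pif.append(("id", pos(token, st)))
        elif token.isdigit():
            pif.append(("const", pos(token, st)))
        else:
            errors.append(f"Lexical error: {token!r}")
    return pif, st, errors
```

Calling `scan("a=a+b")` produces the PIF and ST of the a=a+b example; a linear list is the simplest ST organization, so `pos` here costs O(n) per lookup.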

Example (sem?)
• https://babeljs.io/docs/en/
• https://www.antlr.org/ and https://github.com/antlr/antlr4
• https://www.programiz.com/python-programming/online-compiler/
• https://www.w3schools.com/python/python_compiler.asp
Course 2

S. Motogna - FL&CD 2020


Algorithm Scanning v2
While (not(eof)) do
detect(token);
if token is reserved word OR operator OR separator
then genPIF(token, 0)
else
if token is identifier OR constant
then index = pos(token, ST);
genPIF(token_type, index)
else message “Lexical error”
endif
endif
endwhile
Remarks:
• Also comments are eliminated
• Most important operations: SEARCH and INSERT

Symbol Table
Definition = contains all information collected during compiling
regarding the symbolic names from the source program

identifiers, constants, etc.

Variants:
- Unique symbol table – contains all symbolic names
- distinct symbol tables: IT (identifiers table) + CT (constants
table)

ST organization
Remark: search and insert

1. Unsorted table – in order of detection in source code O(n)


2. Sorted table: alphabetic (numeric) O(lg n)
3. Binary search tree (balanced) O(lg n)
4. Hash table O(1)

Hash table
• K = set of keys (symbolic names)
• A = set of positions (|A| = m; m –prime number)

h:K→A
h(k) = (val(k) mod m) + 1
Toy hash function to use at lab: val(k) = sum of ASCII codes of chars
• Conflicts: k1 ≠ k2 , h(k1) = h(k2)
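
A small sketch of the toy hash function and of a symbol table built on it. The class name HashSymbolTable, the default m = 11 and the use of separate chaining to resolve conflicts are assumptions for illustration; the slides do not prescribe a conflict-resolution strategy.

```python
def toy_hash(key, m):
    """h(k) = (val(k) mod m) + 1, where val(k) is the sum of ASCII codes."""
    return sum(ord(ch) for ch in key) % m + 1


class HashSymbolTable:
    """Symbol table with positions 1..m; conflicts go into the same bucket list."""

    def __init__(self, m=11):  # m should be prime; 11 is an arbitrary choice
        self.m = m
        self.buckets = {p: [] for p in range(1, m + 1)}

    def pos(self, key):
        """Search key; insert it if missing. Returns (position, index in bucket)."""
        position = toy_hash(key, self.m)
        bucket = self.buckets[position]
        if key not in bucket:
            bucket.append(key)
        return position, bucket.index(key)
```

With this function the keys "ab" and "ba" have the same sum of ASCII codes, so they always conflict; that is the price of such a simple val(k).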

Visibility domain (scope)
• Each scope – separate ST
• Structure -> inclusion tree

Example:
int main(){
  …
  int a;
  void f()
  {float a;
   …
   int h() {…}
  }
  void g()
  {char a;
   …
  }
}

Hierarchical structure of STs:
main
  f
    h
  g
Formal Languages
- basic notions-

Examples of languages
• natural (ex. English, Romanian)
• programming (ex. C,C++, Java, Python)
• formal

A formal language is a set


Ex.:
L = {a^n b^n | n>0} = {ab, aabb, aaabbb, …}
L’ = {01^n | n>=0} = {0, 01, 011, …}
Example
a boy has a dog

S → PV
P → a N
N → boy or N → dog (N → boy | dog)
V → QC
Q → has
C → BN
B → a

• A → α = rule
• S,P,V,N,Q,C,B = nonterminal symbols
• a, boy, dog, has = terminal symbols

Remarks
1. Sentence = word, sequence (contains only terminal symbols); denoted w.
In general: w = a1a2...an
2. S ⇒ PV ⇒ a NV ⇒ a NQC ⇒ a N has C - sentential form
3. The rules guarantee syntactical correctness, but not semantical
correctness (a dog has a boy)
Grammar
• Definition: A (formal) grammar is a 4-tuple: G=(N,Σ,P,S)
with the following meanings:
• N – set of nonterminal symbols and |N| < ∞
• Σ - set of terminal symbols (alphabet) and |Σ|<∞
• P – finite set of productions (rules), with the property:
P ⊆ (N∪Σ)∗ N (N∪Σ)∗ × (N∪Σ)∗
• S∈N – start symbol / axiom
Remarks:
1. (α,β)∈P is a production denoted α→β
2. N ∩ Σ = ∅
Notation: for A = {a}, A+ = {a,aa,aaa,…}; A* = reflexive and transitive closure = {a,aa,aaa,…} ∪ {a^0}; X^0 = 𝜀
Binary relations defined on (N ∪ Σ)∗
• Direct derivation
α ⇒ β, α,β ∈ (N ∪ Σ)∗, if α = x1 x y1, β = x1 y y1 and x→y ∈ P
(x is transformed into y)
• k derivation
α ⇒^k β, α,β ∈ (N ∪ Σ)∗
sequence of k direct derivations α ⇒ α1 ⇒ α2 ⇒ ... ⇒ αk−1 ⇒ β, α,α1,α2,...,αk−1,β ∈ (N∪Σ)∗
• + derivation
α ⇒^+ β if ∃ k>0 such that α ⇒^k β (there exists at least one direct derivation)
• * derivation
α ⇒^* β if ∃ k≥0 such that α ⇒^k β; namely, α ⇒^* β ⇔ α ⇒^+ β OR α ⇒^0 β (α=β)
Definition: Language generated by a grammar G=(N,Σ,P,S) is:
L(G) = {w ∈ Σ∗ | S ⇒^* w}
Remarks:
1. S ⇒^* α, α ∈ (N∪Σ)∗ = sentential form
S ⇒^* w, w ∈ Σ∗ = word / sequence
2. Operations defined for languages (sets):
L1∪L2, L1∩L2, L1-L2, L̄ (complement), L+ = ⋃_{k>0} L^k, L∗ = ⋃_{k≥0} L^k
Concatenation: L = L1L2 = {w1w2 | w1∈L1, w2∈L2}
Example: L1 = {a,b,aa}, L2 = {c,d,cd}, L1L2 = {ac,ad,acd,bc,bd,bcd,aac,aad,aacd}
3. |w|=0 (empty word - denoted ε)

Definition: Two grammars G1 and G2 are equivalent if they generate the same language:
L(G1)=L(G2)
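
For finite languages, the concatenation and power operations can be checked directly with Python sets. The function names concat and power are chosen for this sketch; the empty word ε is represented by the empty string.

```python
def concat(l1, l2):
    """L1L2 = {w1w2 | w1 in L1, w2 in L2}."""
    return {w1 + w2 for w1 in l1 for w2 in l2}


def power(language, k):
    """L^k, with L^0 = {epsilon} (the empty word is the empty string)."""
    result = {""}
    for _ in range(k):
        result = concat(result, language)
    return result
```

L+ and L∗ are infinite for any language containing a non-empty word, so a program can only enumerate them up to a chosen bound on k.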

Chomsky hierarchy(based on form α → β∈ P)

• type 0 : no restriction
• type 1 : context dependent grammar (x1Ay1 → x1γy1)
• type 2 : context free grammar (A → α ∈ P, where A∈N and α∈(N ∪ Σ)∗)
• type 3 : regular grammar ( A → aB|a ∈ P)
Remark :
type 3 ⊆ type 2 ⊆ type 1 ⊆ type 0

Notations
o A,B,C,... – nonterminal symbols
o S ∈ N – start symbol
o a,b,c,... ∈ Σ – terminal symbol
o α,β,γ ∈ (N ∪ Σ)∗ - sentential forms
o ε – empty word
o x,y,z,w ∈ Σ∗ - words
o X,Y,U,... ∈ (N ∪ Σ) – grammar symbols (nonterminal or terminal)

[Figure: plan of the door machinery, labelled with the letters u, d, a, n, a, r, c, a, a]

Problem: The door to the tower is closed by the Red Dragon, using a
complicated machinery. Prince Charming has managed to steal the
plans and is asking for your help. Can you help him determine all
the person names that can unlock the door?
Course 3&4
Formal Languages
- Basic notions -

Regular languages

Why?
1. Search engines – success of Google
2. Unix commands
3. Programming languages – new feature

Reg exp – Finite Automata – Regular grammars
Finite Automata
• Intuitive model: an input tape holding symbols from Σ (e.g. a n a), read by a control unit (CU) that is in a state q
Definition: A finite automaton (FA) is a 5-tuple
M = (Q,Σ,δ,q0,F)
where:
• Q - finite set of states (|Q|<∞)
• Σ - finite alphabet (|Σ|<∞)
• δ – transition function : δ:Q×Σ→P(Q)
• q0 – initial state q0 ∊ Q
• F⊆Q – set of final states

Remarks

1. Q∩Σ=∅
2. δ:Q×Σ→P(Q); ε ∈ Σ^0, so a relation δ(q,ε)=p is NOT allowed
3. If |δ(q,a)|≤1 for all q ∈ Q, a ∈ Σ => deterministic finite automaton (DFA)
4. If |δ(q,a)|>1 for some q, a (more than one state obtained as result) =>
nondeterministic finite automaton (NFA)

Property: For any NFA M there exists a DFA M’ equivalent to M

Configuration C=(q,x)
where:
- q state
- x unread sequence from input: x ∊ ∑*

Initial configuration : (q0,w) , w - whole sequence


Final configuration: (qf ,ε) , qf ∈ F, ε –empty sequence
(corresponds to accept)

Relations between configurations
• ⊢ move / transition (simple, one step)
(q,ax) ⊢ (p,x), p ∈ δ(q,a)
• ⊢^k k move = a sequence of k simple transitions: C0 ⊢ C1 ⊢ ... ⊢ Ck
• ⊢^+ + move
C ⊢^+ Cʹ : ∃ k>0 such that C ⊢^k Cʹ
• ⊢^* * move (star move)
C ⊢^* Cʹ : ∃ k≥0 such that C ⊢^k Cʹ
Definition: Language accepted by FA M = (Q,Σ,δ,q0,F) is:
L(M) = { w ∈ Σ∗ | (q0,w) ⊢^* (qf,ε), qf ∈ F }

Remarks
1. 2 finite automata M1 and M2 are equivalent if and only if they
accept the same language
L(M1)=L(M2)
2. ε ∈ L(M) ⇔ q0 ∈ F (initial state is final state)
Representing FA
1. List of all elements
2. Table
3. Graphical representation

Example: M=(Q,Σ,δ,p,F)
Q = {p,q,r}
Σ = {a,b}
F = {r}
δ(p,a) = q
δ(q,a) = q
δ(q,b) = r
δ(p,b) = r

Table:
    a   b
p   q   r
q   q   r
r   -   -

(p,aab) ⊢ (q,ab) ⊢ (q,b) ⊢ (r,ε) => aab accepted
(p,aba) ⊢ (q,ba) ⊢ (r,a) => aba not accepted
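
The acceptance test on this automaton can be simulated in a few lines. The sketch below assumes δ is stored as a dictionary from (state, symbol) to a set of states, a representation chosen here so that the same function also works for an NFA; DELTA and accepts are illustrative names.

```python
# Transition function of the example automaton, as (state, symbol) -> set of states.
DELTA = {
    ("p", "a"): {"q"},
    ("p", "b"): {"r"},
    ("q", "a"): {"q"},
    ("q", "b"): {"r"},
}


def accepts(delta, q0, final_states, w):
    """Follow all possible moves on w; accept if a final state is reached at the end."""
    current = {q0}
    for symbol in w:
        current = {p for q in current for p in delta.get((q, symbol), set())}
    return bool(current & final_states)
```

A missing entry in the dictionary plays the role of "-" in the table: the set of current states becomes empty and the word is rejected.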

Remember
• Finite automaton

M = (Q,Σ,δ,q0,F)

L(M) = { w ∈ Σ∗ | (q0,w) ⊢^* (qf,ε), qf ∈ F }


Regular grammars
• G = (N, 𝚺, P, S) right linear grammar if
∀p∊P: A→aB or A→b, where A,B ∊ N and a,b ∊ 𝚺
• G = (N, 𝚺, P, S) regular grammar if
• G is right linear grammar
and
• A→𝜀 ∉ P, with the exception that S→𝜀 ∊ P, in which case S does not appear in
the rhs (right hand side) of any other production

• L(G) = {w ∊ 𝚺* | S ⇒^* w} - right linear language

Examples:
S->aA|ε; A->a           reg
S->aS|aA; A->bS|b       reg
S->aA; A->aA|ε          NOT reg
S->aA|ε; A->aS          NOT reg


Theorem 1: For any regular grammar G=(N, 𝚺, P, S) there exists
a FA M=(Q, 𝚺, 𝛿, q0,F) such that L(G) = L(M)
Proof: construct M based on G
Q = N U {K}, K ∉ N
q0 = S
F = {K} U {S | if S→𝜺 ∊ P}
𝛿: if A→aB ∊ P then 𝛿(A,a) = B
   if A→a ∊ P then 𝛿(A,a) = K
Prove that L(G) = L(M) (w ∈ L(G) ⇔ w ∈ L(M)):
S ⇒^* w ⇔ (S,w) ⊢^* (qf, 𝜺)
w = 𝜺: S ⇒^* 𝜺 ⇔ (S, 𝜺) ⊢^* (S, 𝜺) – true
w = a1a2...an: S ⇒^* w ⇔ (S,w) ⊢^* (K, 𝜺)
S ⇒ a1A1 ⇒ a1a2A2 ⇒ ... ⇒ a1a2...an−1An−1 ⇒ a1a2...an−1an
S ⇒ a1A1 exists if S → a1A1 and then δ(S,a1)=A1
A1 → a2A2 : δ(A1,a2)=A2 ...
An−1 → an : δ(An−1,an)=K
(S,a1a2...an) ⊢ (A1,a2...an) ⊢ (A2,a3...an) ⊢ ... ⊢ (An−1,an) ⊢ (K, 𝜺), K∈F
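
The construction in the proof can be sketched as follows. The encoding is an assumption made for brevity: every symbol is a single character, a production is a pair (A, rhs) where rhs is "" (for ε), "a" or "aB", and 𝛿 maps (state, symbol) to a set of states because a regular grammar may yield a nondeterministic FA.

```python
def grammar_to_fa(nonterminals, productions, start, k="K"):
    """Build (Q, delta, q0, F) from a regular grammar; k is the new state K."""
    assert k not in nonterminals
    delta = {}
    final_states = {k}
    for lhs, rhs in productions:
        if rhs == "":
            final_states.add(lhs)              # only S -> epsilon in a regular grammar
        else:
            target = rhs[1] if len(rhs) == 2 else k
            delta.setdefault((lhs, rhs[0]), set()).add(target)
    return set(nonterminals) | {k}, delta, start, final_states


def accepts(delta, q0, final_states, w):
    """Simulate the (possibly nondeterministic) FA on w."""
    current = {q0}
    for symbol in w:
        current = {p for q in current for p in delta.get((q, symbol), set())}
    return bool(current & final_states)
```

For the regular grammar S->aS|aA; A->bS|b the resulting automaton has 𝛿(S,a) = {S,A} and 𝛿(A,b) = {S,K}, which shows why the result is in general an NFA.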



Theorem 2: For any FA M=(Q, 𝚺, 𝛿, q0,F) there exists a right
linear grammar G=(N, 𝚺, P, S) such that L(G) = L(M)
Proof: construct G based on M
N = Q
S = q0
P: if 𝛿(q,a) = p then q→ap ∊ P
   if p ∊ F then q→a ∊ P
   if q0 ∊ F then S→𝜺
Prove that L(M) = L(G) (w ∈ L(M) ⇔ w ∈ L(G)):
P(i): q ⇒^(i+1) x ⇔ (q,x) ⊢^i (qf, 𝜺), qf∈F - prove by induction
Apply P: q0 ⇒^(i+1) w ⇔ (q0,w) ⊢^i (qf, 𝜺), qf∈F
If i=0: q ⇒ x ⇔ (q,x) ⊢^0 (qf, 𝜺) (x = 𝜺, q = qf); q ⇒ 𝜺 ⇔ q0 → 𝜺, q0∈F
Assume ∀ k≤i P is true
q ⇒^(i+1) x ⇔ (q,x) ⊢^i (qf, 𝜺)
For q ∊ N apply ”⇒”: q ⇒ ap ⇒^i ax
If q ⇒ ap then 𝛿(q,a) = p; if ap ⇒^i ax then (p,x) ⊢^(i-1) (qf, 𝜺), qf∈F
THEN (q,ax) ⊢^i (qf, 𝜺), qf∈F


Regular sets
Definition: Let 𝚺 be a finite alphabet. We define regular sets over 𝚺
recursively in the following way:
1. 𝞥 is a regular set over 𝚺 (empty set)
2. {𝞮} is a regular set over 𝚺
3. {a} is a regular set over 𝚺, ∀ a∊𝚺
4. If P, Q are regular sets over 𝚺, then P∪Q, PQ, P* are regular sets
over 𝚺
5. Nothing else is a regular set over 𝚺



Regular expressions
Definition: Let 𝚺 be a finite alphabet. We define regular expressions
over 𝚺 recursively in the following way:
1. 𝞥 is a regular expression denoting the regular set 𝞥 (empty set)
2. 𝞮 is a regular expression denoting the regular set {𝞮}
3. a is a regular expression denoting the regular set {a}, ∀ a∊𝚺
4. If p,q are regular expression denoting the regular sets P, Q then:
• p+q is a regular expression denoting the regular set P∪Q,
• pq is a regular expression denoting the regular set PQ,
• p* is a regular expression denoting the regular set P*
5. Nothing else is a regular expression
Remarks:
1. p+ = pp*
2. Use parentheses to avoid ambiguity
3. Priority of operations: *, concat, + (from high to low)
4. For each regular set we can find at least one regular exp to denote
it (there are infinitely many reg exp denoting it)
5. For each regular exp, we can construct the corresponding regular
set
6. 2 regular expressions are equivalent iff they denote the same
regular set


Algebraic properties of regular exp
Let 𝛂, 𝛃, 𝛄 be regular expressions.
1. 𝛂 + 𝛃 = 𝛃 + 𝛂
2. 𝞥* = 𝞮
3. 𝛂 + (𝛃 + 𝛄) = (𝛂 + 𝛃) + 𝛄
4. 𝛂(𝛃𝛄) = (𝛂𝛃)𝛄
5. 𝛂(𝛃 + 𝛄) = 𝛂𝛃 + 𝛂𝛄
6. (𝛂 + 𝛃)𝛄 = 𝛂𝛄 + 𝛃𝛄
7. 𝛂𝞮 = 𝞮𝛂 = 𝛂
8. 𝞥𝛂 = 𝛂𝞥 = 𝞥
9. 𝛂* = 𝛂 + 𝛂*
10. (𝛂*)* = 𝛂*
11. 𝛂 + 𝛂 = 𝛂
12. 𝛂 + 𝞥 = 𝛂
Reg exp equations
• Normal form: X = aX + b
where a,b – reg exp
• Solution: X = a*b
Check: a a*b + b = (aa* + 𝛆)b = a*b

• System of reg exp equations:
X = a1 X + a2 Y + a3
Y = b1 X + b2 Y + b3
• Solution: Gauss method (replace Xi and solve Xn)


Why?
Reg exp – Maths – Finite Automata – Regular grammars


Prop:Regular sets are right linear languages
Lemma 1: 𝞥,{𝞮}, {a},∀a∊𝚺 are right linear languages

Proof: constructive
i. G = ({S}, 𝚺, 𝞥, S) – regular grammar such that L(G) = 𝞥

ii. G = ({S}, 𝚺,{S→𝞮}, S) – regular grammar such that L(G) ={𝞮}

iii. G = ({S}, 𝚺,{S→a}, S) – regular grammar such that L(G) ={a}



Lemma 2: If L1 and L2 are right linear languages then:
L1 ∪ L2, L1L2 and L1* are right linear languages.

Proof: constructive
L1,L2 right linear languages => ∃G1, G2 such that
G1 = (N1, 𝚺1,P1,S1) and L1 = L(G1)
G2 = (N2, 𝚺2,P2,S2) and L2 = L(G2) assume N1∩N2 = ∅



i. G3 = (N3, 𝚺3, P3, S3)

N3 = N1 U N2 U {S3}; 𝚺3 = 𝚺1 U 𝚺2

P3 = P1 U P2 U {S3→𝛂1 | S1→𝛂1 ∊ P1} U {S3→𝛂2 | S2→𝛂2 ∊ P2}
(not P1 U P2 U {S3→S1 | S2}: these two productions are not right linear)

G3 – right linear grammar
and
L(G3) = L(G1) U L(G2) PROOF!!! Homework


ii. G4 = (N4, 𝚺,P4,S4)

N4 = N1U N2; S4= S1;∑4=∑1U∑2

P4 = {A→ aB| if A→ aB ∊ P1} U


{A→ aS2| if A→ a ∊ P1} U
P2 U
{S1→ 𝛂2| if S1 → 𝝴 ∊ P1 and S2→ 𝛂 2∊ P2 }

G4 – right linear grammar
and
L(G4) = L(G1) L(G2) PROOF!!! Homework


iii. G5 = (N5, 𝚺1, P5, S5)
//IDEA: concatenate L1 with itself
N5 = N1 U {S5};

P5 = P1 U {S5 → 𝝴} U
{S5→𝛂1 | S1→𝛂1 ∊ P1} U
{A→aS1 | if A→a ∊ P1}

G5 – right linear grammar
and
L(G5) = L(G1)* PROOF!!! Homework


Theorem: A language is a regular set if and
only if it is a right linear language
Proof:
=> Apply lemma 1 and lemma 2
<= construct a system of regular exp equations where:
- Unknowns (indeterminates) – nonterminals
- Coefficients – terminals
- Equation for A: all the possible rewritings of A
Example: G=({S,A,B},{0,1}, P, S)
P: S → 0A | 1B | 𝝴
   A → 0B | 1A
   B → 0S | 1
System:
   S = 0A + 1B + 𝝴
   A = 0B + 1A
   B = 0S + 1
Regular exp = solution corresponding to S
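
One possible way to solve the system of this example, shown as a worked sketch: it uses only the normal form X = aX + b with solution X = a*b and substitution (Gauss method). The order of elimination (A first, then B) is a choice made here; another order gives an equivalent expression.

```latex
\begin{align*}
A &= 1A + 0B &&\Rightarrow\quad A = 1^{*}0B \\
S &= 0A + 1B + \varepsilon = 0\,1^{*}0B + 1B + \varepsilon
   = (01^{*}0 + 1)B + \varepsilon \\
B &= 0S + 1 \\
S &= (01^{*}0 + 1)(0S + 1) + \varepsilon
   = (01^{*}0 + 1)0\,S + (01^{*}0 + 1)1 + \varepsilon \\
S &= \bigl((01^{*}0 + 1)0\bigr)^{*}\,\bigl((01^{*}0 + 1)1 + \varepsilon\bigr)
\end{align*}
```

As a quick sanity check, the words 𝝴, 11 (from S ⇒ 1B ⇒ 11) and 001 (from S ⇒ 0A ⇒ 00B ⇒ 001) are all denoted by the last expression.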



Theorem: A language is a regular set if and only if it is
accepted by a FA

Proof:
=> Apply lemma 1’ and lemma 2’ (to follow, similar to RG)
<= construct a system of regular exp equations where:
- Unknowns (indeterminates) – states
- Coefficients – terminals
- Equation for A: all the possibilities that put the FA in state A
- Equation of the form: X = Xa + b => solution X = ba*
Regular exp = union of solutions corresponding to final states

Example:
q1 = q3 0 + 𝝴
q2 = q1 0 + q1 1 + q2 0 + q3 0
q3 = q2 1


Lemma 1’:𝞥,{𝞮}, {a},∀a∊𝚺 are accepted by FA
Reg exp FA
𝞥 M = (Q, 𝚺, 𝛿, q0, 𝞥)
𝞮 M = (Q, 𝚺, 𝞥, q0, {q0})
a,∀a∊𝚺 M = ({q0,q1}, 𝚺, {𝛿(q0,a) = q1}, q0, {q1})



Lemma 2’:If L1 and L2 are accepted by a FA then:
L1 ∪ L2, L1L2 and L1* are accepted by FA
Proof:
M1 = (Q1, 𝚺1, 𝛿1, q01, F1) such that L1= L(M1)
M2 = (Q2, 𝚺2, 𝛿2, q02, F2) such that L2 = L(M2)

M3 = (Q3, 𝚺3, 𝛿3, q03, F3)
Q3 = Q1 U Q2 U {q03}; 𝚺3 = 𝚺1 U 𝚺2
F3 = F1 U F2 U {q03 | if q01 ∊ F1 or q02 ∊ F2}
𝛿3 = 𝛿1 U 𝛿2 U {𝛿3(q03,a) = p | ∃𝛿1(q01,a) = p} U
{𝛿3(q03,a) = p | ∃𝛿2(q02,a) = p}

L(M3) = L(M1) U L(M2) PROOF!!! Homework
M4 = (Q4, 𝚺4, 𝛿4, q04, F4)
Q4 = Q1 U Q2; q04 = q01;
F4 = F2 U {q ∊ F1 | if q02 ∊ F2}
𝛿4(q,a) = 𝛿1(q,a), if q ∊ Q1-F1
          𝛿1(q,a) U 𝛿2(q02,a), if q ∊ F1
          𝛿2(q,a), if q ∊ Q2

L(M4) = L(M1)L(M2) PROOF!!! Homework


M5 = (Q5, 𝚺1, 𝛿5, q05, F5) //IDEA: concatenate with itself
Q5 = Q1; q05 = q01
F5 = F1 U {q01}
𝛿5(q,a) = 𝛿1(q,a), if q ∊ Q1-F1
          𝛿1(q,a) U 𝛿1(q01,a), if q ∊ F1

L(M5) = L(M1)* PROOF!!! Homework


Course 5
Pumping Lemma
• Not all languages are regular
• How to decide if a language is regular or not?

• Idea: pump symbols

Example: L = {0^n 1^n | n >= 0}


Theorem: (Pumping lemma, Bar-Hillel)

Let L be a regular language. ∃ p ∊ N, such that if w ∊ L with |w|>p, then
w = xyz, where 0<|y|<=p
and
x y^i z ∊ L, ∀ i ≥ 0


Proof
L regular => ∃ M = (Q,𝜮,𝜹,q0,F) such that L = L(M)
Let |Q| = p
If w ∊ L(M): (q0,w) ⊢^* (qf, 𝜺), qf∈F, and |w|>p,
then M processes at least p+1 symbols but has only p states
⇒ ∃ q1 that appears in at least 2 configurations

(q0,xyz) ⊢^* (q1,yz) ⊢^+ (q1,z) ⊢^* (qf, 𝜺), qf∈F => 0<|y|<=p


Proof (cont)
(q0, x y^i z) ⊢^* (q1, y^i z)
⊢^* (q1, y^(i-1) z)
⊢^* ...
⊢^* (q1, yz)
⊢^* (q1, z)
⊢^* (qf, 𝜺), qf∈F
So, if w = xyz ∈ L then x y^i z ∈ L, for all i>0
If i=0: (q0,xz) ⊢^* (q1, z) ⊢^* (qf, 𝜺), qf∈F


Example: L = {0^n 1^n | n >= 0}
Suppose L is regular => w = xyz = 0^n 1^n
Consider all possible decompositions (each ∉ L below holds for i ≥ 2) =>
Case 1. y = 0^k
xyz = 0^(n-k) 0^k 1^n; x y^i z = 0^(n-k) 0^(ik) 1^n ∉ L
Case 2. y = 1^k
xyz = 0^n 1^k 1^(n-k); x y^i z = 0^n 1^(ik) 1^(n-k) ∉ L
Case 3. y = 0^k 1^l
xyz = 0^(n-k) 0^k 1^l 1^(n-l); x y^i z = 0^(n-k) (0^k 1^l)^i 1^(n-l) ∉ L
Case 4. y = 0^k 1^k
xyz = 0^(n-k) 0^k 1^k 1^(n-k); x y^i z = 0^(n-k) 0^k 1^k 0^k 1^k ... 1^(n-k) ∉ L
=> L is not regular
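
The case analysis can be checked by brute force for a concrete word. The sketch below tries every decomposition w = xyz with 0 < |y| <= p and tests x y^i z for i = 0..3 only; testing finitely many i is enough to refute a decomposition, since one failing i suffices. The function names are chosen for this example.

```python
def in_l(w):
    """Membership test for L = {0^n 1^n | n >= 0}."""
    n = len(w) // 2
    return w == "0" * n + "1" * n


def pumpable_split_exists(w, p, member=in_l):
    """True if some w = xyz, 0 < |y| <= p, keeps x y^i z in the language for i = 0..3."""
    for start in range(len(w)):
        for end in range(start + 1, min(start + p, len(w)) + 1):
            x, y, z = w[:start], w[start:end], w[end:]
            if all(member(x + y * i + z) for i in range(4)):
                return True
    return False
```

For w = 000111 and p = 3 no decomposition survives, which is what the four cases predict.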



Context free grammars (cfg)

Context free grammar (cfg)
• Productions of the form: A →𝛼, A∊N, 𝛼∊(N∪𝜮)*

• More powerful

• Can model programming language:


G = (N, 𝜮,P,S) s.t. L(G) = programming language

Syntax tree
Definition: A syntax tree corresponding to a cfg G = (N, 𝜮,P,S) is a tree
obtained in the following way:
1. Root is the starting symbol S
2. Nodes ∊ N∪𝜮:
1. Internal nodes ∊N
2. Leaves ∊𝜮
3. For a node A the descendants in order from left to right are X1, X2, ..., Xn
only if A → X1X2... Xn∊ P
Remarks:
a) Parse tree = syntax tree – result of parsing (syntactic analysis)
b) Derivation tree – condition 2.2 not satisfied
c) Abstract syntax tree (AST) ≠ syntax tree (semantic analysis)

Syntax tree (cont)
Property: In a cfg G = (N, 𝜮,P,S), w ∊ L(G) if and only if there exists a
syntax tree with frontier w.

Proof: HomeWork

Example: S-> aSbS | c; w = aacbcbc

Leftmost derivation:
S => aSbS => aaSbSbS => aacbSbS => aacbcbS => aacbcbc
Rightmost derivation:
S => aSbS => aSbc => aaSbSbc => aaSbcbc => aacbcbc

Definition: A cfg G = (N, 𝜮,P,S) is ambiguous if for some w ∊ L(G) there exist
2 distinct syntax trees with frontier w.

Example:

Parsing (syntax analysis) modeled with cfg:

cfg G = (N, 𝜮,P,S):


• N – nonterminal: syntactical constructions: declaration, statement, expression,
a.s.o.
• 𝜮 – terminals; elements of the language: identifiers, constants, reserved words,
operators, separators
• P – syntactical rules – expressed in BNF – simple transformation
• S – syntactical construct corresponding to program

THEN

Program syntactically correct <=> w ∊ L(G)


Equivalent transformation of cfg

• Unproductive symbols
• Inaccessible symbols
• ε-productions
• Single productions

1. Determine elements (symbols / productions): Greedy alg
2. Eliminate them: construct equivalent grammar
Unproductive symbols
Definition
A nonterminal A is unproductive in a cfg if it
does not generate any word: {w | A ⇒^* w, w ∈ Σ*} = ∅.
Algorithm 1: Elimination of unproductive symbols
input: G = (N, Σ, P, S)
output: G’ = (N’, Σ, P’, S), L(G) = L(G’)
// idea: build N0,N1,... recursively (until saturation)
step 1: N0 = ∅; i:=1;
step 2: Ni = Ni-1 U {A | A→α ∈ P, α ∈ (Ni-1 U Σ)*}
step 3: if Ni <> Ni-1 then i:=i+1; goto step 2
        else N’ = Ni
step 4: if S ∉ N’ then L(G) = ∅
        else P’ = {A→α | A→α ∈ P, A ∈ N’ and α ∈ (N’ U Σ)*}
Example
G = ({S,A,B,C,D}, {a,b,c}, P,S)
P: S → aA | aC
A → AB
B → b
C → aC | CD
D → b
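
A sketch of steps 1-3 of Algorithm 1, run on the grammar of this example. Assumptions of the sketch: symbols are single characters, productions are stored as a dictionary from a nonterminal to a list of right-hand-side strings, and the set is grown in place instead of building N0, N1, ... explicitly (the fixed point is the same).

```python
def productive_nonterminals(productions, terminals):
    """Return N': the nonterminals that generate at least one word."""
    productive = set()
    changed = True
    while changed:                      # repeat until saturation
        changed = False
        for a, rhss in productions.items():
            if a in productive:
                continue
            # A is productive if some rhs uses only terminals and productive symbols.
            if any(all(x in terminals or x in productive for x in rhs) for rhs in rhss):
                productive.add(a)
                changed = True
    return productive


EXAMPLE = {
    "S": ["aA", "aC"],
    "A": ["AB"],
    "B": ["b"],
    "C": ["aC", "CD"],
    "D": ["b"],
}
```

On this grammar only B and D turn out to be productive, so S ∉ N’ and step 4 concludes that L(G) = ∅.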

Inaccessible symbols
Definition
A symbol X ∈ N U Σ is inaccessible in a cfg if X does not
appear in any sentential form: ∀ S ⇒^* α, X ∉ α

Algorithm 2: Elimination of inaccessible symbols
input: G = (N, Σ, P, S)
output: G’ = (N’, Σ’, P’, S), L(G) = L(G’) and
∀ X ∈ N’ U Σ’ ∃ α, β ∈ (N’ U Σ’)* s.t. S ⇒^* αXβ in G’.
step 1: V0 = {S}; i:=1;
step 2: Vi = Vi-1 U {X | ∃ A→αXβ ∈ P, A ∈ Vi-1}
step 3: if Vi <> Vi-1 then i:=i+1; goto step 2
        else N’ = N ∩ Vi
             Σ’ = Σ ∩ Vi
             P’ = {A→α | A→α ∈ P, A ∈ N’, α ∈ (N’ U Σ’)*}
Example
G = ({S,A,B,C,D}, {a,b,c,d}, P,S)
P: S → aA | aC
A → AB
B → b
C → aC | bCb
D → bB | d
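
Algorithm 2 is a reachability computation starting from S. The sketch below uses a work list instead of the explicit sets V0, V1, ...; the dictionary encoding of P with single-character symbols is an assumption of the example.

```python
def accessible_symbols(productions, start):
    """Return the set V of symbols (terminals and nonterminals) reachable from start."""
    reached = {start}
    frontier = [start]
    while frontier:
        a = frontier.pop()
        for rhs in productions.get(a, []):   # terminals have no productions
            for x in rhs:
                if x not in reached:
                    reached.add(x)
                    frontier.append(x)
    return reached


EXAMPLE = {
    "S": ["aA", "aC"],
    "A": ["AB"],
    "B": ["b"],
    "C": ["aC", "bCb"],
    "D": ["bB", "d"],
}
```

For this grammar the reachable symbols are S, A, B, C, a and b; the nonterminal D and the terminals c and d are inaccessible and are dropped from N’ and Σ’.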

ε-productions
Definition
A cfg G=(N,Σ,P,S) is without ε-productions if
1. P contains no production A → ε (ε-production)
OR
2. ∃ S→ε and S ∉ rhs(p), ∀p ∊ P
Algorithm 3: Elimination of ε-productions
input: cfg G = (N, Σ, P, S)
output: cfg G’ = (N’, Σ, P’, S’)
step 1: construct N̄ = {A | A ∈ N, A ⇒^+ ε}
  1.a. N0 := {A | A→ε ∈ P}; i := 1;
  1.b. Ni := Ni-1 U {A | A→α ∈ P, α ∈ (Ni-1)*}
  1.c. if Ni <> Ni-1 then i:=i+1; goto step 1.b
       else N̄ = Ni
  Example: A->BC, B->ε, C->ε
step 2: Let P’ = set of productions built:
  2.a. if A→α0B1α1B2α2...Bkαk ∈ P, k>=0,
       and for i := 1,k Bi ∈ N̄
       and αj contains no symbol of N̄, j := 0,k
       then add to P’ all prod of the form A→α0X1α1X2α2...Xkαk
       where Xi is Bi or ε (not A→ε)
  2.b. if S ∈ N̄ then add S’ to N’ and S’→S|ε to P’
       else N’ := N; S’ := S.
Example
G = ({S,A,B}, {a,b},P,S)
P: S → aA | aAbB
A → aA | B
B → bB | 𝛆
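
A sketch of Algorithm 3 on this example. Assumptions: single-character symbols, right-hand sides stored as strings with "" for ε, and the new start symbol named by appending an apostrophe; these are conventions of the sketch, not of the course.

```python
from itertools import product


def nullable(productions):
    """Step 1: the set of nonterminals A with A =>+ epsilon."""
    result = set()
    changed = True
    while changed:
        changed = False
        for a, rhss in productions.items():
            if a not in result and any(all(x in result for x in rhs) for rhs in rhss):
                result.add(a)
                changed = True
    return result


def eliminate_epsilon(productions, start):
    """Step 2: build P' and the (possibly new) start symbol."""
    n_bar = nullable(productions)
    new_productions = {}
    for a, rhss in productions.items():
        out = set()
        for rhs in rhss:
            # Every nullable symbol is either kept or replaced by epsilon.
            choices = [(x, "") if x in n_bar else (x,) for x in rhs]
            for pick in product(*choices):
                candidate = "".join(pick)
                if candidate:                 # never add A -> epsilon
                    out.add(candidate)
        new_productions[a] = out
    new_start = start
    if start in n_bar:                        # step 2.b
        new_start = start + "'"
        new_productions[new_start] = {start, ""}
    return new_productions, new_start
```

For the example grammar the nullable set is {A, B}, so S→aAbB yields the four variants aAbB, aAb, abB and ab.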

Single productions
Definition
A production of the form A→B is called single production or
renaming rule.

Algorithm 4: Elimination of single productions
Input: cfg G, without ε-productions
Output: G’ s.t. L(G) = L(G’)
For each A∈N build the set NA = {B | A ⇒^* B}:
  1.a. N0 := {A}, i:=1
  1.b. Ni := Ni-1 ∪ {C | B→C ∈ P and B ∈ Ni-1}
  1.c. if Ni ≠ Ni-1 then i:=i+1 goto 1.b.
       else NA := Ni
P’: for all A∈N do
      for all B∈NA do
        if B→α ∈ P and not “single” then A→α ∈ P’
G’ = (N,Σ,P’,S)
Example
G = ({E,T,F},{a,(,),+,*},P,E)
P: E → E+T | T
T → T*F | F
F → (E) | a
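
A sketch of Algorithm 4 applied to the expression grammar of this example, assuming single-character symbols and right-hand sides stored as strings. The sets NA are computed with a work list rather than as N0, N1, ...

```python
def eliminate_single(productions):
    """Return P' without single productions (renaming rules)."""
    nonterminals = set(productions)

    def is_single(rhs):
        return len(rhs) == 1 and rhs in nonterminals

    new_productions = {}
    for a in productions:
        n_a = {a}                             # N_A = {B | A =>* B}
        frontier = [a]
        while frontier:
            b = frontier.pop()
            for rhs in productions[b]:
                if is_single(rhs) and rhs not in n_a:
                    n_a.add(rhs)
                    frontier.append(rhs)
        # A inherits every non-single production of every B in N_A.
        new_productions[a] = {
            rhs for b in n_a for rhs in productions[b] if not is_single(rhs)
        }
    return new_productions


EXAMPLE = {"E": ["E+T", "T"], "T": ["T*F", "F"], "F": ["(E)", "a"]}
```

Here NE = {E, T, F}, NT = {T, F} and NF = {F}, so E ends up with the four productions E+T, T*F, (E) and a.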

Parsing
• Cfg G = (N, 𝚺, P,S): check if w ∊ L(G)
• Construct parse tree
• How:
1. Top-down vs. Bottom-up
2. Recursive vs. linear
[Figure 3.2: Construction of the tree by LL(1) parsing]
Course 6

Problem: Parsing (construct the parse tree)
if the source program is syntactically correct
then construct syntax tree
else ”syntax error”

source program is syntactically correct = w ∈ L(G) ⇔ S ⇒^* w
Parsing
• How:
1. Top-down vs. Bottom-up
2. Recursive vs. linear

            Descendent                     Ascendent
Recursive   Descendent recursive parser    Ascendent recursive parser
Linear      LL(k): LL(1)                   LR(k): LR(0), SLR, LR(1), LALR
Result – parse tree – representation
• Arbitrary tree – child-sibling representation

• Sequence of derivations S => 𝜶1 => 𝜶2 =>… => 𝜶n = w

• String of productions – an index is associated to each production; records which production is used
at each derivation step: 1,4,3,…
Tree (nodes numbered 1..7):
1: S, with children 2: a, 3: S, 4: b, 5: S
3: S, with child 6: c
5: S, with child 7: c

index   Info   Parent   Right sibling
1       S      0        0
2       a      1        0
3       S      1        2
4       b      1        3
5       S      1        4
6       c      3        0
7       c      5        0
Descendent recursive parser
• Example
S -> aSbS | aS | c

Formal model
• Configuration
(s, i, 𝛼, 𝛽)

where:
• s = state of the parsing, can be:
• q = normal state
• b = back state
• f = final state - corresponding to success: w ∊ L(G)
• e = error state – corresponding to failure: w ∉ L(G)
• i – position of current symbol in input sequence
w = a1a2…an, i ∊ {1,...,n+1}
• 𝛼 = working stack, stores the way the parse is built
• 𝛽 = input stack, part of the tree to be built

Initial configuration: (q,1,𝜀,S)
Final configuration: (f,n+1,𝛼,𝜀)
Define moves between configurations
Expand
WHEN: head of input stack is a nonterminal

(q,i, 𝜶, A𝜷) ⊢ (q,i, 𝜶A1, 𝜸1𝜷)

where:
A → 𝜸1 | 𝜸2 | … represents the productions corresponding to A
1 = first prod of A

Advance
WHEN: head of input stack is a terminal = current symbol from input

(q,i, 𝜶, ai𝜷) ⊢ (q,i+1, 𝜶ai, 𝜷)

Momentary insuccess
WHEN: head of input stack is a terminal ≠ current symbol from input

(q,i, 𝜶, a𝜷) ⊢ (b,i, 𝜶, a𝜷), a ≠ ai
Back
WHEN: head of working stack is a terminal

(b,i, 𝜶a, 𝜷) ⊢ (b,i-1, 𝜶, a𝜷)

Another try
WHEN: head of working stack is a nonterminal

(b,i, 𝜶Aj, 𝜸j𝜷) ⊢ (q,i, 𝜶Aj+1, 𝜸j+1𝜷), if ∃ A → 𝜸j+1
                 ⊢ (b,i, 𝜶, A𝜷), otherwise, with the exception
                 ⊢ (e,i, 𝜶, 𝜷), if i=1, A=S: ERROR
Success

(q,n+1, 𝜶, 𝜀) ⊢ (f,n+1, 𝜶, 𝜀)

Algorithm

w ∊ L(G) - HOW
• Process 𝜶:
• From left to right (reverse if stored as stack)
• Skip terminal symbols
• Nonterminals – index of prod

• Example: 𝜶 = S1 a S2 a S3 c b S3 c
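
The moves (expand, advance, momentary insuccess, back, another try, success) can be put together in one loop. This is a sketch under stated assumptions rather than the algorithm slide itself: symbols are single characters, the working stack 𝜶 is a list whose nonterminal entries are pairs (A, j) with a 0-based production index j, and the error case i=1, A=S is detected as "the working stack became empty".

```python
def parse(productions, start, w):
    """Descendent recursive parsing; returns (accepted, productions used)."""
    n = len(w)
    s, i, alpha, beta = "q", 1, [], [start]
    while s not in ("f", "e"):
        if s == "q":
            if i == n + 1 and not beta:
                s = "f"                                      # success
            elif beta and beta[0] in productions:            # expand
                a = beta.pop(0)
                alpha.append((a, 0))
                beta[:0] = productions[a][0]
            elif beta and i <= n and beta[0] == w[i - 1]:     # advance
                alpha.append(beta.pop(0))
                i += 1
            else:
                s = "b"                                      # momentary insuccess
        else:
            top = alpha.pop()
            if isinstance(top, str):                         # back
                beta.insert(0, top)
                i -= 1
            else:                                            # another try
                a, j = top
                del beta[:len(productions[a][j])]
                if j + 1 < len(productions[a]):
                    alpha.append((a, j + 1))
                    beta[:0] = productions[a][j + 1]
                    s = "q"
                elif not alpha:
                    s = "e"      # bottom of the working stack: i = 1, A = S
                else:
                    beta.insert(0, a)
    used = [f"{t[0]}{t[1] + 1}" for t in alpha if isinstance(t, tuple)]
    return s == "f", used


GRAMMAR = {"S": ["aSbS", "aS", "c"]}
```

For w = aacbc the parser accepts after some backtracking, and reading the nonterminal entries of 𝜶 from left to right gives the production string S1 S2 S3 S3. The loop assumes a grammar without left recursion; otherwise expand never stops.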

When does the algorithm never stop?
• S->S𝝰 – expands infinitely (left recursive)
LL(1) Parser
• Linear algorithm
[Figure 3.2: Construction of the tree by LL(1) parsing]

FIRSTk
• Prediction of length k: ai+1 . . . ai+k, as can also be seen from the choice of the production A → β in Figure 3.2. The prediction of length k represents the next k symbols to be generated from the current configuration.
• FIRSTk ≈ first k terminal symbols that can be generated from 𝛼
• Definition [ASU86]: FIRSTk computes the first k symbols that can be obtained by successive derivations from a given sentential form:
FIRSTk : (N ∪ Σ)* → P(Σ^k)
FIRSTk(α) = {u | u ∈ Σ*, α ⇒^* ux, |u| = k or α ⇒^* u, |u| ≤ k}
(the first k symbols of α)
Operation: ⊕ = concatenation of length 1
L1 = {aa,ab,ba}
L2 = {00,01}
L1⊕L2 = {a,b}

L1 = {a, 𝛆}
L2 = {0,1}
L1⊕L2 = {a,0,1}
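
Concatenation of length 1 keeps only the first symbol of each concatenated word (or ε when the word is empty). A one-line sketch, with ε represented by the empty string and the function name oplus chosen for the example:

```python
def oplus(l1, l2):
    """L1 (+) L2: prefixes of length at most 1 of all words xy, x in L1, y in L2."""
    return {(x + y)[:1] for x in l1 for y in l2}
```

When every word of L1 is non-empty, L2 has no influence on the result: for L1 = {aa,ab,ba} and L2 = {00,01} the operation gives {a,b}. L2 matters only through the words x = ε of L1, as in the second pair of languages.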

FIRSTk
• Which are the first k terminal symbols that can be generated from A?

• https://forms.office.com/r/kNHNGW7XtC
Construct FIRST
Ø FIRST1 denoted FIRST
Ø The elements of the LL(1) parsing table depend on the values of the function FIRST. To describe a method for computing FIRST, the following property is needed.
Ø Remarks [GJ90]:
• Concatenation of length 1: if L1, L2 are 2 languages over alphabet 𝛴, then
L1 ⊕ L2 = {w | x ∈ L1, y ∈ L2, xy = w, |w| ≤ 1 or xy = wz, |w| = 1}
and
• FIRST(αβ) = FIRST(α) ⊕ FIRST(β)
FIRST(X1 . . . Xn) = FIRST(X1) ⊕ . . . ⊕ FIRST(Xn)
A -> BC
B -> DA
D -> a
F0(A)=F0(B)=∅; F0(D)={a}
A F1(A) =F0(A) U {…| A->BC F0(B)⊕F(D)}= ∅
F1(B) ={a}
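
The iteration F0, F1, ... can be sketched as a fixed-point computation that uses ⊕ on the right-hand sides. The grammar in GRAMMAR below (E → TR, R → +TR | ε, T → a | (E)) is a hypothetical example chosen for this sketch, not one from the slides; symbols are single characters and ε is the empty string.

```python
EPS = ""


def oplus(l1, l2):
    """Concatenation of length 1."""
    return {(x + y)[:1] for x in l1 for y in l2}


def first_sets(productions, terminals):
    """Compute FIRST for every grammar symbol by iterating until nothing changes."""
    first = {a: set() for a in productions}      # F0(A) = empty set
    for t in terminals:
        first[t] = {t}
    changed = True
    while changed:
        changed = False
        for a, rhss in productions.items():
            for rhs in rhss:
                acc = {EPS}                      # FIRST of the empty sequence
                for x in rhs:
                    acc = oplus(acc, first[x])   # FIRST(X1) (+) ... (+) FIRST(Xn)
                if not acc <= first[a]:
                    first[a] |= acc
                    changed = True
    return first


# Hypothetical grammar: E -> TR, R -> +TR | epsilon, T -> a | (E)
GRAMMAR = {"E": ["TR"], "R": ["+TR", ""], "T": ["a", "(E)"]}
```

Because ⊕ with an empty set is empty, a right-hand side contributes nothing until all of its leading symbols have a non-empty FIRST, which mirrors the way F1 is obtained from F0.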

FOLLOW
• The prediction ai (the next symbol on the input tape) uniquely determines the choice of a production A → α.
• Theorem 3.2 [Șer87]: A grammar is of type LL(1) if and only if, for each nonterminal A with productions A → α1 | α2 | . . . | αn, FIRST(αi) ∩ FIRST(αj) = ∅ and, if αj ⇒^* ε, FIRST(αi) ∩ FOLLOW(A) = ∅, ∀ i,j = 1,n, i ≠ j.
• As the theorem suggests, a special situation arises for A → 𝛆: the next symbol is then not generated from the rewriting of A, so the symbol that follows A must be taken into account. For this, a new function is introduced [AU73]:
FOLLOW : (N ∪ Σ)* → P(Σ)
FOLLOW(β) = {w ∈ Σ | S ⇒^* αβγ, w ∈ FIRST(γ)}
Ø FOLLOWk(A) ≈ next k symbols generated after / following A
• Example: S ⇒^* xBy ⇒ xaAy. What if B->uA?
• To build an LL(1) parser we need […] that may arise during parsing and that are stored in […] called the LL(1) parsing table.
[Figure 3.2: Construction of the tree by LL(1) parsing]
S =>0 S // 𝜺 after S

S => aAc=> abBc


A -> bB

S.Motogna - FL&CD
FIRST
• ≈ first terminal symbols that can be generated from 𝛼

FOLLOW
• ≈ next symbol generated after/ following A
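As an illustration of how FIRST and FOLLOW (length 1) can be computed, here is a small fixed-point sketch in Python. The grammar S → aAc, A → bA | ε is a hypothetical example, the function names are invented for this sketch, and ε is encoded as the empty string. Following the slide above ("ε after S"), ε is placed in FOLLOW of the start symbol.

```python
EPS = ""  # stands for ε


def first_of_seq(seq, first):
    """FIRST(X1...Xn), built left to right as FIRST(X1) ⊕ ... ⊕ FIRST(Xn)."""
    out = set()
    for sym in seq:
        f = first[sym]
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)  # every symbol can vanish (or the sequence is empty)
    return out


def compute_first(grammar, terminals):
    first = {t: {t} for t in terminals}
    first.update({a: set() for a in grammar})
    changed = True
    while changed:  # repeat until no FIRST set grows
        changed = False
        for a, rhss in grammar.items():
            for rhs in rhss:
                new = first_of_seq(rhs, first)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {a: set() for a in grammar}
    follow[start].add(EPS)  # ε after the start symbol
    changed = True
    while changed:
        changed = False
        for b, rhss in grammar.items():
            for rhs in rhss:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:  # only nonterminals get FOLLOW
                        continue
                    tail = first_of_seq(rhs[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:  # nothing (visible) after sym: inherit FOLLOW(b)
                        new |= follow[b]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


grammar = {"S": [["a", "A", "c"]], "A": [["b", "A"], []]}  # [] is the ε production
first = compute_first(grammar, {"a", "b", "c"})
follow = compute_follow(grammar, first, "S")
print(first["S"], first["A"], follow["S"], follow["A"])
```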

S.Motogna - FL&CD
LL(k)
• L = left (sequence is read from left to right)
• L = left (use leftmost derivation)
• Prediction of length k

LL(k) Principle
• In any moment of parsing, the action is uniquely determined by:
  • Closed part (a1…ai)
  • Current symbol A
  • Prediction ai+1…ai+k (length k)

[Figure: parse tree with root S and current nonterminal A above the input a1 … ai-1 ai, followed by the prediction ai+1…ai+k]

S.Motogna - FL&CD
Definition [AU73]
• A cfg G = (N, Σ, P, S) is LL(k) if for any 2 leftmost derivations (⇒l denotes a leftmost derivation step) we have:
  1. S ⇒*l wAα ⇒l wβα ⇒*l wx;
  2. S ⇒*l wAα ⇒l wγα ⇒*l wy;
  such that FIRSTk(x) = FIRSTk(y), then β = γ.

The definition can be restated as follows: for any sentential form wAα, the first k symbols derivable from Aα uniquely determine the production that can be applied to A in order to derive a word (sequence of terminal symbols) that begins with w and continues with those k symbols. This condition is sometimes difficult to check.

S.Motogna - FL&CD
Theorem
The necessary and sufficient condition for a grammar to be LL(k) is
that for any pair of distinct productions of a nonterminal (A → 𝛽,
A → 𝛾, 𝛽 ≠ 𝛾) the condition holds:

FIRSTk(𝛽𝛼) ∩ FIRSTk(𝛾𝛼) = ∅, ∀𝛼 such that S ⇒* uA𝛼

Theorem: A grammar is LL(1) if and only if for any nonterminal A with
productions A → 𝛼1 | 𝛼2 | ... | 𝛼n, FIRST(𝛼i) ∩ FIRST(𝛼j) = ∅ and, if 𝛼i ⇒* 𝜀,
FIRST(𝛼j) ∩ FOLLOW(A) = ∅, ∀i,j = 1,n, i ≠ j

S.Motogna - FL&CD
LL(1) Parser
• Prediction of length 1

• Steps:
1) Construct FIRST, FOLLOW
2) Construct LL(1) parse table
   (steps 1 and 2 are executed 1 time)
3) Analyse sequence based on moves between configurations

S.Motogna - FL&CD
Step 2: Construct LL(1) parse table
• Possible actions depend on:
• Current symbol ∈ N∪𝚺
• Possible prediction ∈ 𝚺
• Add a special character “$” ( ∉ N∪𝚺) – marking for “empty stack”

=> table:
• One line for each symbol ∈ N∪𝚺 ∪{$}
• One column for each symbol ∈ 𝚺 ∪{$}

S.Motogna - FL&CD
Rules LL(1) table
The table holds an entry for each possible prediction. In addition, a special character, usually denoted '$' (∉ N ∪ Σ), is added; its purpose is to mark the end of the sequence, and it is given one line and one column in the table. During the actual parsing, this symbol removes the need for empty-stack checks.
The rules for filling in the table are:
1. M(A, a) = (α, i), ∀a ∈ FIRST(α), a ≠ ε, A → α production in P with number i;
   M(A, b) = (α, i), if ε ∈ FIRST(α), ∀b ∈ FOLLOW(A), A → α production in P with number i;
2. M(a, a) = pop, ∀a ∈ Σ;
3. M($, $) = acc;
4. M(x, a) = err (error) otherwise.

S.Motogna - FL&CD
Remark

A grammar is LL(1) if the LL(1) parse table does


NOT contain conflicts – there exists at most one
value in each cell of the table M(A,a)
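The table rules and the conflict remark can be sketched in Python as follows. This is only an illustration under stated assumptions: the grammar S → aAc (1), A → bA (2), A → ε (3) is hypothetical, FIRST of each right-hand side and FOLLOW are hand-computed and passed in directly, and '$' is used in FOLLOW(S) where the slides place ε after the start symbol. Missing cells are read as err.

```python
EPS, END = "", "$"

# (number, lhs, rhs, FIRST(rhs)) for the hypothetical grammar
productions = [
    (1, "S", ("a", "A", "c"), {"a"}),
    (2, "A", ("b", "A"), {"b"}),
    (3, "A", (), {EPS}),
]
follow = {"S": {END}, "A": {"c"}}
terminals = {"a", "b", "c"}


def build_ll1_table(productions, follow, terminals):
    table, conflicts = {}, []

    def put(key, value):
        # A second, different value in the same cell is an LL(1) conflict.
        if key in table and table[key] != value:
            conflicts.append(key)
        table[key] = value

    for num, lhs, rhs, first_rhs in productions:
        for a in first_rhs - {EPS}:          # rule 1, first part
            put((lhs, a), (rhs, num))
        if EPS in first_rhs:                 # rule 1, second part
            for b in follow[lhs]:
                put((lhs, b), (rhs, num))
    for a in terminals:                      # rule 2
        put((a, a), "pop")
    put((END, END), "acc")                   # rule 3
    return table, conflicts                  # rule 4: absent key means err


table, conflicts = build_ll1_table(productions, follow, terminals)
print(sorted(table.items(), key=str))
print("conflicts:", conflicts)
```

An empty `conflicts` list corresponds to the remark above: the grammar is LL(1) exactly when no cell receives two different values.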

S.Motogna - FL&CD
Step 3: Define configurations and moves
• INPUT:
• Language grammar G = (N, 𝚺, P,S)
• LL(1) parse table
• Sequence to be parsed w =a1…an
• OUTPUT:
If (w ∈L(G)) then string of productions
else error & location of error

S.Motogna - FL&CD
LL(1) configurations
(𝛼, 𝛽, 𝜋)
where:
• 𝛼 = input stack
• 𝛽 = working stack
• 𝜋 = output (result)
Initial configuration: (w$, S$, 𝜀)
Final configuration: ($, $, 𝜋)

S.Motogna - FL&CD
Moves
The output π collects the string of productions used, and a sequence is accepted on the empty-stack criterion. The moves are defined as follows:
1. Push – put in stack (pop A and push the symbols of β):
   (ux, Aα$, π) ⊢ (ux, βα$, πi), if M(A, u) = (β, i);
2. Pop – take off from stack (from both stacks, when their heads coincide):
   (ux, aα$, π) ⊢ (x, α$, π), if M(a, u) = pop;
3. Accept – when the final configuration is reached:
   ($, $, π) ⊢ acc;
4. Error – otherwise:
   (ux, Xα$, π) ⊢ err, if M(X, u) = err.

S.Motogna - FL&CD
Algorithm LL(1) parsing
• INPUT:
§ LL(1) table with NO conflicts;
§ G –grammar (productions)
§ Input sequence w = a1a2 . . . an

• OUTPUT:
§ sequence accepted or not?
§ If yes then string of productions

S.Motogna - FL&CD
Algorithm LL(1) parsing (cont)
alpha := w$; beta := S$; pi := ɛ; config := (alpha, beta, pi)
go := true;
while go do
  if M(head(beta), head(alpha)) = (b, i) then
    ActionPush(config)
  else
    if M(head(beta), head(alpha)) = pop then
      ActionPop(config)
    else
      if M(head(beta), head(alpha)) = acc then
        go := false; s := "acc";
      else go := false; s := "err";
      end if
    end if
  end if
end while
if s = "acc" then
  write("Sequence accepted");
  write(pi)
else
  write("Sequence not accepted")
end if
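The algorithm above can be sketched as runnable Python. The table below is hard-coded for a hypothetical grammar S → aAc (1), A → bA (2), A → ε (3); the names `TABLE` and `ll1_parse` and the way the error position is reported are assumptions of this sketch, not part of the course material.

```python
END = "$"

# LL(1) table for the hypothetical grammar; a missing cell means err.
TABLE = {
    ("S", "a"): (("a", "A", "c"), 1),
    ("A", "b"): (("b", "A"), 2),
    ("A", "c"): ((), 3),
    ("a", "a"): "pop", ("b", "b"): "pop", ("c", "c"): "pop",
    (END, END): "acc",
}


def ll1_parse(w, table=TABLE, start="S"):
    alpha = list(w) + [END]   # input stack, head at index 0
    beta = [start, END]       # working stack, head at index 0
    pi = []                   # string of productions
    while True:
        entry = table.get((beta[0], alpha[0]), "err")
        if entry == "pop":                    # pop both heads
            alpha.pop(0)
            beta.pop(0)
        elif entry == "acc":
            return True, pi
        elif entry == "err":                  # report how many symbols were consumed
            return False, len(w) - (len(alpha) - 1)
        else:                                 # push: replace A by the rhs
            rhs, num = entry
            beta[0:1] = list(rhs)
            pi.append(num)


print(ll1_parse("abbc"))
print(ll1_parse("ab"))
```

On failure the second component is the index in w at which the parser stopped, which is one way to realise the "location of error" output.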

S.Motogna - FL&CD
Remarks
1) LL(1) parser provides location of the error

2) Grammars can be transformed to be LL(1)


example:
I -> if C then S | if C then S else S // is not LL(1)

I -> if C then S T
T -> ɛ | else S // is LL(1)

S.Motogna - FL&CD
Play time!!!
• Menti.com cod: 42 60 49

S.Motogna - FL&CD
Course 8
LR(k) parsing

S.Motogna - FL&CD
Terms
Reminder:
• rhp = right-hand side of production
• lhp = left-hand side of production
• Prediction – see LL(1)
• Handle = symbols from the head of the working stack that form (in order) a rhp
• Shift – reduce parser:
  • shift symbols to form a handle
  • when a rhp is formed – reduce to the corresponding lhp

S.Motogna - FL&CD
LR(k)
• L = left – sequence is read from left to right
• R = right – use rightmost derivations
• k = length of prediction

• Enhanced grammar
  • G = (N, Σ, P, S)
  • G’ = (N ∪ {S’}, Σ, P ∪ {S’ → S}, S’), S’ ∉ N
  • S’ does NOT appear in any rhp

S.Motogna - FL&CD
LR(k)
• Ascendent

• Linear – COST? – what we compute to obtain linear algorithm?

S.Motogna - FL&CD
• Definition 1: If in a cfg G = (N, Σ, P, S) we have
  S ⇒*r αAw ⇒r αβw, where α ∈ (N ∪ Σ)*, A ∈ N, w ∈ Σ*, then
  any prefix of the sequence αβ is called a live prefix in G.

• Definition 2: An LR(k) item is defined as [A → α.β,u], where A → αβ is a
  production and u ∈ Σ^k; it describes the moment in which, considering
  the production A → αβ, α was detected (α is at the head of the stack) and it is
  expected to detect β.

• Definition 3: The LR(k) item [A → α.β,u] is valid for the live prefix γα if:
  S ⇒*r γAw ⇒r γαβw
  u = FIRSTk(w)

S.Motogna - FL&CD
Definition 4: A cfg G = (N, Σ, P, S) is LR(k), for k ≥ 0, if the conditions
  1. S’ ⇒*r αAw ⇒r αβw
  2. S’ ⇒*r γBx ⇒r αβy
  3. FIRSTk(w) = FIRSTk(y)
imply α = γ AND A = B AND x = y

S.Motogna - FL&CD
• [A → αβ.,u] – special case: prefix is all rhp - apply reduce

• Otherwise [A → α.β,u] – apply shift

Consequence 1: state is important – should be stored by the parsing method
=> Working stack:
   $ sinit X1 s1 . . . Xm sm
   where: $ - marks empty stack
          Xi ∈ N∪∑
          si - states
Consequence 2: the action takes the parsing process to another state (goto)

[Diagram: state → decide action → goto → state]
S.Motogna - FL&CD
LR(k) principle
• Current state
• Current symbol
• prediction
uniquely determines:
• Action to be applied
• Move to a new state

=> LR(k) table – 2 parts: action part + goto part


S.Motogna - FL&CD
States
• What does a state contain? – LR items: all items corresponding to the same live prefix (closure)
• How to go from one state to another state? – goto
• How many states? – Canonical collection

S.Motogna - FL&CD
What LR items will be in the same state?
The current state and the prediction determine the behaviour of the parser: the action to be carried out and the transition to another state. That is why LR(k) parse tables have two components: an action part and a shift part, called "goto". To see what the states are and how they are determined, consider an LR item:

• [A → α.Bβ,u] valid for live prefix γα =>
    S ⇒*r γAw ⇒r γαBβw
    u = FIRSTk(w)
• B → δ ∈ P => S ⇒*r γαBw’ ⇒r γαδw’
  => [B → .δ,u] valid for live prefix γα
S.Motogna - FL&CD
LR(k) parsing:
LR(0), SLR, LR(1), LALR
• Define item
• Construct set of states
• Construct table
  (the three steps above are executed 1 time)
• Parse sequence based on moves between configurations

S.Motogna - FL&CD
LR(0) Parser

• Prediction of length 0 (ignored)

1. LR(0) item: [A → α.β]

S.Motogna - FL&CD
2. Construct set of states
• What a state contains – Algorithm closure_LR(0)
• How to move from a state to another – Function goto_LR(0)
• Construct set of states – Algorithm ColCan_LR(0)

Canonical collection

S.Motogna - FL&CD
closure(I) = I ∪ {[B → .δ] | [A → α.Bβ] ∈ I}, according to the previous remark.

Algorithm Closure_LR(0)
INPUT: I – LR(0) item; G’ – enhanced grammar
OUTPUT: C = closure(I);
C := {I};
repeat
  for ∀[A → α.Bβ] ∈ C do
    for ∀B → γ ∈ P do
      if [B → .γ] ∉ C then
        C := C ∪ {[B → .γ]}
      end if
    end for
  end for
until C is unchanged

S.Motogna - FL&CD
Function goto_LR(0)
goto : P(ℰ0) × (N ∪ Σ) → P(ℰ0)
where ℰ0 = set of LR(0) items

goto(s, X) = closure({[A → αX.β]|[A → α.Xβ] ∈ s})

S.Motogna - FL&CD
Algorithm ColCan_LR(0)
INPUT: G’ – enhanced grammar
OUTPUT: C – canonical collection of states
C := ∅;
s0 := closure({[S’ → .S]})
C := C ∪ {s0};
repeat
  for ∀s ∈ C do
    for ∀X ∈ N ∪ Σ do
      if goto(s, X) ≠ ∅ and goto(s, X) ∉ C then
        C := C ∪ {goto(s, X)}
      end if
    end for
  end for
until C is unchanged

Example: S -> aS | bSc | dA, A -> dc
Goto(s0,S)
Goto(s0,A)
Goto(s0,a)
Goto(s0,b)
Goto(s0,c) = ∅
Goto(s0,d)

S.Motogna - FL&CD
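The closure, goto and canonical-collection steps can be sketched together in Python. The enhanced grammar S’ → S, S → aA, A → bA | c is a hypothetical example (not the one on the slide), and an LR(0) item [A → α.β] is encoded as the tuple (lhs, rhs, dot position); both choices are assumptions of this sketch.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("a", "A")), ("A", ("b", "A")), ("A", ("c",))]
NONTERMINALS = {"S'", "S", "A"}


def closure(items):
    c = set(items)
    changed = True
    while changed:                       # repeat until C is unchanged
        changed = False
        for lhs, rhs, dot in list(c):
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                for l, r in GRAMMAR:     # productions of the nonterminal after the dot
                    if l == rhs[dot] and (l, r, 0) not in c:
                        c.add((l, r, 0))
                        changed = True
    return frozenset(c)


def goto(state, x):
    # Move the dot over x in every item that allows it, then close the result.
    moved = {(l, r, d + 1) for l, r, d in state if d < len(r) and r[d] == x}
    return closure(moved) if moved else frozenset()


def canonical_collection():
    states = [closure({("S'", ("S",), 0)})]
    symbols = sorted({x for _, r in GRAMMAR for x in r})
    for s in states:                     # the list grows while we iterate over it
        for x in symbols:
            t = goto(s, x)
            if t and t not in states:
                states.append(t)
    return states


states = canonical_collection()
for i, s in enumerate(states):
    print(i, sorted(s))
```

For this grammar the collection has 7 states; state 0 contains [S’ → .S] and [S → .aA].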
3. Construct LR(0) table
• one line for each state

• 2 parts:
• Action: one column (for a state, action is unique because prediction is
ignored)
• Goto: one column for each symbol X ∈ N ∪ Σ

S.Motogna - FL&CD
Rules LR(0) table
1. if [A → α.β] ∈ si then action(si)=shift
2. if [A → β.] ∈ si and A ≠ Sʹ then action(si)=reduce l, where l =
number of production A → β
3. if [Sʹ → S.] ∈ si then action(si)=acc
4. if goto(si, X) = sj then goto(si, X) = sj
5. otherwise = error

S.Motogna - FL&CD
Remarks
1) Initial state of parser = state containing [Sʹ → .S]
2) No shift from accept state:
if s is accept state then goto(s, X) = ∅, ∀X ∈ N ∪ Σ.
3) If in state s action is reduce then goto(s, X) = ∅, ∀X ∈ N ∪ Σ.
4) Argument G’: Let G = ({S},{a,b,c},{S → aSbS,S → c},S)
states [S → aSbS.] and [S → c.] – accept / reduce ?

S.Motogna - FL&CD
Remarks (cont)
5) A grammar is NOT LR(0) if the LR(0) table contains conflicts:
• shift – reduce conflict: a state contains items of the form [A → α.β]
and [B → γ.], yielding 2 distinct actions for that state

• reduce – reduce conflict: a state contains items of the form
[A → β.] and [B → γ.], in which the action is reduce, but with
distinct productions

S.Motogna - FL&CD
4. Define configurations and moves
• INPUT:
• Grammar G’ = (NU{S’}, 𝚺, P U {S’->S},S’)
• LR(0) table
• Input sequence w =a1…an
• OUTPUT:
if (w ∈L(G)) then string of productions
else error & location of error

S.Motogna - FL&CD
LR(0) configurations
(𝛼, 𝛽, 𝜋)
where:
• 𝛼 = working stack
• 𝛽 = input stack
• 𝜋 = output (result) stack
Initial configuration: ($s0, w$, 𝜀)
Final configuration: ($sacc, $, 𝜋)

S.Motogna - FL&CD
Moves
1. Shift
if action(sm) = shift AND head(𝛽) = ai AND goto(sm,ai) = sj then
($s0x1 ...xmsm,ai ...an$, 𝜋) ⊢ ($s0x1 ...xmsmaisj,ai+1 ...an$, 𝜋)
2. Reduce
if action(sm) = reduce l AND (l) A → xm−p+1 ...xm AND goto(sm−p,A) = sj then
($s0 ...xmsm,ai ...an$, 𝜋) ⊢ ($s0 ...xm−psm−pAsj,ai ...an$,l 𝜋)
3. Accept
if action(sm) = accept then ($sm,$, 𝜋)=acc
4. Error - otherwise

S.Motogna - FL&CD
LR(0) Parsing Algorithm
• INPUT:
  - LR(0) table – conflict free
  - grammar G’: productions numbered
  - input sequence w = a1…an
• OUTPUT:
  if (w ∈ L(G)) then string of productions
  else error & location of error

S.Motogna - FL&CD
LR(0) Parsing Algorithm
state := 0;
alpha := '$s0'; beta := 'w$'; phi := ''; end := false
config := (alpha, beta, phi);
repeat
  if action(state) = 'shift' then
    ActionShift(config)
  else
    if action(state) = 'reduce l' then
      ActionReduce(config)
    else
      if action(state) = 'accept' then
        write("success"); write(phi);
        end := true;
      end if
      if action(state) = 'error' then
        write("error")
        end := true
      end if
    end if
  end if
until end
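A runnable sketch of the LR(0) parsing loop, in Python. The action and goto tables are written by hand for a hypothetical grammar (1) S → aA, (2) A → bA, (3) A → c with 7 states; the table contents, the names and the error handling are assumptions of this sketch. As in the Reduce move above, the production number is added at the front of the output.

```python
PRODUCTIONS = {1: ("S", 2), 2: ("A", 2), 3: ("A", 1)}   # number -> (lhs, length of rhs)
ACTION = {0: "shift", 1: "acc", 2: "shift", 3: ("reduce", 1),
          4: "shift", 5: ("reduce", 3), 6: ("reduce", 2)}
GOTO = {(0, "S"): 1, (0, "a"): 2, (2, "A"): 3, (2, "b"): 4,
        (2, "c"): 5, (4, "A"): 6, (4, "b"): 4, (4, "c"): 5}


def lr0_parse(w):
    stack = ["$", 0]              # working stack: $ s0 X1 s1 ... Xm sm
    beta = list(w) + ["$"]        # input stack
    pi = []                       # output: string of productions
    while True:
        state = stack[-1]
        act = ACTION[state]
        if act == "shift":
            nxt = GOTO.get((state, beta[0]))
            if nxt is None:       # empty goto cell: error
                return False, pi
            stack += [beta.pop(0), nxt]
        elif act == "acc":
            return beta == ["$"], pi   # accept only if the whole input was read
        else:
            _, num = act
            lhs, n = PRODUCTIONS[num]
            del stack[len(stack) - 2 * n:]      # pop the handle and its states
            nxt = GOTO.get((stack[-1], lhs))
            if nxt is None:
                return False, pi
            stack += [lhs, nxt]
            pi.insert(0, num)     # l π: new production goes in front


print(lr0_parse("abbc"))
print(lr0_parse("ab"))
```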

S.Motogna - FL&CD
Course 9
LR(k) Parsing (cont.)

S.Motogna - FL&CD
LR(k) parsing:
LR(0), SLR, LR(1), LALR
• Define item
• Construct set of states
• Construct table
  (the three steps above are executed 1 time)
• Parse sequence based on moves between configurations

S.Motogna - FL&CD
Algorithm ColCan_LR(0)
INPUT: G’ – enhanced grammar
OUTPUT: C – canonical collection of states
C := ∅;
s0 := closure({[S’ → .S]}) // state corresponding to prod. of S’ = initial state
C := C ∪ {s0}; // initialize collection with s0
repeat
  for ∀s ∈ C do
    for ∀X ∈ N ∪ Σ do
      if goto(s, X) ≠ ∅ and goto(s, X) ∉ C then
        C := C ∪ {goto(s, X)} // add new state
      end if
    end for
  end for
until C is unchanged

S.Motogna - FL&CD
closure(I) = I ∪ {[B → .δ] | [A → α.Bβ] ∈ I}, according to the previous remark.

Algorithm Closure
I = LR(0) item of the form [A -> 𝜶.𝜷]

INPUT: I – LR(0) item; G’ – enhanced grammar
OUTPUT: C = closure(I);
C := {I}; // initialize Closure with the LR(0) item
repeat
  for ∀[A → α.Bβ] ∈ C do // search items with dot in front of a nonterminal
    for ∀B → γ ∈ P do // search productions of that nonterminal
      if [B → .γ] ∉ C then
        C := C ∪ {[B → .γ]} // add item formed from the production, with dot in
      end if               // front of the right-hand side of the production
    end for
  end for
until C is unchanged


S.Motogna - FL&CD
Function goto
goto : P(ℰ0) × (N ∪ Σ) → P(ℰ0) //creates new states

where ℰ0 = set of LR(0) items

goto(s, X) = closure({[A → αX.β]|[A → α.Xβ] ∈ s})

goto(s,X): in state s, search LR(0) item that has dot in front of symbol X.
Move the dot after symbol X and call closure for this new item.

S.Motogna - FL&CD
SLR Parser
• SLR = Simple LR
• Prediction = next symbols on input sequence

• Remark:
LR(0) – lots of conflicts – solved if considering prediction

=>
1. LR(0) canonical collection of states – prediction of length 0
2. Table and parsing sequence – prediction of length 1

S.Motogna - FL&CD
SLR Parsing:

• define item LR(0)


• Construct set of states LR(0)
• Construct table
• Parse sequence based on moves between configurations

S.Motogna - FL&CD
Construct SLR table
Remarks:
1. Prediction = next symbol from input sequence => FOLLOW
   - see LL(1)
2. Structure – LR(k):
   • Lines - states
   • action + goto:
     action – a column for each prediction ∈ 𝞢
     goto – a column for each symbol X ∈ N∪𝞢
   • Optimize table structure: merge the action and goto columns for Σ
Remark (LR(0) table):
• if s is accept state then goto(s, X) = ∅, ∀X ∈ N ∪ Σ.
• If in state s action is reduce then goto(s, X) = ∅, ∀X ∈ N ∪ Σ.

S.Motogna - FL&CD
SLR table

      | Action (and goto) | GOTO
      | a1  …  an         | B1  …  Bm
  s0  |                   |
  s1  |                   |
  …   |                   |
  sk  |                   |

a1,…,an ∈ 𝞢
B1,...,Bm ∈ N
s0,…,sk - states

S.Motogna - FL&CD
Rules for SLR table
1. if [A → α.β] ∈ si and goto(si,a) = sj then action(si,a) = shift sj
   // dot is not at the end
2. if [A → β.] ∈ si and A ≠ Sʹ then action(si,u) = reduce l, where l =
   number of production A → β, ∀u ∈ FOLLOW(A)
   // dot is at the end, but not for S’
3. if [Sʹ → S.] ∈ si then action(si,$) = acc
   // dot is at the end, prod. of S’
4. if goto(si, X) = sj then goto(si, X) = sj, ∀X ∈ N
5. otherwise error

S.Motogna - FL&CD
Remarks
1. Similarity with LR(0)

2. A grammar is SLR if the SLR table does not contain conflicts (more
than one value in a cell)

S.Motogna - FL&CD
Parsing sequences
• INPUT:
• Grammar G’ = (NU{S’}, 𝚺, P U {S’->S},S’)
• SLR table
• Input sequence w =a1…an
• OUTPUT:
if (w ∈L(G)) then string of productions
else error & location of error

S.Motogna - FL&CD
SLR = LR(0) configurations
(𝛼, 𝛽, 𝜋)
where:
• 𝛼 = working stack
• 𝛽 = input stack
• 𝜋 = output (result)
Initial configuration: ($s0, w$, 𝜀)
Final configuration: ($sacc, $, 𝜋)

S.Motogna - FL&CD
Moves
head(𝛽) = prediction
1. Shift
if action(sm,ai) = shift sj then
($s0x1 ...xmsm,ai ...an$, 𝜋) ⊢ ($s0x1 ...xmsmaisj,ai+1 ...an$, 𝜋)
2. Reduce
if action(sm,ai) = reduce t AND (t) A → xm−p+1 ...xm AND goto(sm−p,A) = sj
then
($s0 ...xmsm,ai ...an$, 𝜋) ⊢ ($s0 ...xm−psm−pAsj,ai ...an$,t 𝜋)
3. Accept
if action(sm,$) = accept then ($sm,$, 𝜋)=acc
4. Error - otherwise

S.Motogna - FL&CD
LR(1) Parser
1. Define item: [A→𝜶.𝜷,u], where A→𝜶.𝜷 is the kernel and u is the prediction
2. Construct set of states
3. Construct table
4. Parse sequence based on moves between configurations

S.Motogna - FL&CD
Construct LR(1) set of states
• Alg ColCan_LR1
• Function goto_LR1
• Alg Closure_LR1

S.Motogna - FL&CD
Algorithm ColCan_LR1
INPUT: G’ – enhanced grammar
OUTPUT: C1 – canonical collection of states
C1 := ∅
s0 := Closure_LR1({[S’ → .S,$]})
C1 := C1 ∪ {s0}
repeat
  for ∀s ∊ C1 do
    for ∀X ∊ N ∪ 𝛴 do
      T := goto_LR1(s,X)
      if T ≠ ∅ and T ∉ C1 then
        C1 := C1 ∪ {T}
      endif
    endfor
  endfor
until C1 unchanged

S.Motogna - FL&CD
Function goto_LR1
Goto_LR1 : P(ℰ1) × (N ∪ Σ) → P(ℰ1)
where ℰ1 = set of LR(1) items

Goto_LR1(s, X) = Closure_LR1({[A → αX.β,u]|[A → α.Xβ,u] ∈ s})

S.Motogna - FL&CD
Algorithm Closure_LR1
The parser's behaviour consists of the action to be carried out and the transition to another state, which is why LR(k) parse tables have two components: an action part and a shift part, called "goto". To see what the states are and how they are determined, consider an LR(1) item:

• [A → α.Bβ,u] valid for live prefix γα =>
    S ⇒*r γAw ⇒r γαBβw
    u = FIRSTk(w)
• B → δ ∈ P => S ⇒*r γAw ⇒r γαBβw ⇒r γαδβw
  => [B → .δ,b] valid for live prefix γα, ∀b ∊ FIRST(𝛽u) // FIRST(𝛽w) = FIRST(𝛽u)

The set containing all the items valid for the same live prefix forms a state of the parser; it characterizes one step of the parsing.
S.Motogna - FL&CD
∀[A → α.Bβ, a] ∈ closure(C), ∀B → δ ∈ P: [B → .δ, b] ∈ closure(C), for ∀b ∈ FIRST(βa)

Algorithm Closure_LR1
INPUT: I – LR(1) item; G’ – enhanced grammar; FIRST(X), ∀X ∈ N ∪ Σ;
OUTPUT: C1 = closure(I);
C1 := {I};
repeat
  for ∀[A → α.Bβ, a] ∈ C1 do
    for ∀B → γ ∈ P do
      for ∀b ∈ FIRST(βa) do
        if [B → .γ, b] ∉ C1 then
          C1 := C1 ∪ {[B → .γ, b]}
        end if
      end for
    end for
  end for
until C1 is unchanged
S.Motogna - FL&CD
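A minimal Python sketch of the LR(1) closure, using the hypothetical enhanced grammar S’ → S, S → CC, C → cC | d. An LR(1) item [A → α.β, u] is encoded as (lhs, rhs, dot, u). Because this grammar has no ε-productions, FIRST(βa) reduces to FIRST of the first symbol of βa; that shortcut and the hand-written `FIRST` table are assumptions of the sketch and would not hold for grammars with ε-productions.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMINALS = {"S'", "S", "C"}
# Hand-computed FIRST sets (valid for this grammar only).
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}, "c": {"c"}, "d": {"d"}, "$": {"$"}}


def closure_lr1(items):
    c = set(items)
    work = list(items)
    while work:                                   # process each new item once
        lhs, rhs, dot, a = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            rest = rhs[dot + 1:] + (a,)           # the sequence βa
            for b in FIRST[rest[0]]:              # FIRST(βa), no ε-productions assumed
                for l, r in GRAMMAR:
                    item = (l, r, 0, b)
                    if l == rhs[dot] and item not in c:
                        c.add(item)
                        work.append(item)
    return frozenset(c)


s0 = closure_lr1({("S'", ("S",), 0, "$")})
for item in sorted(s0):
    print(item)
```

The initial state obtained here has 6 items: the item of S’, [S → .CC, $], and the two C-productions each with predictions c and d.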
Construct LR(1) table
• Structure – SLR
• Rules:
1. if [A → α.β,u] ∈ si and goto(si,a) = sj then action(si,a) = shift sj
2. if [A → β.,u] ∈ si and A ≠ Sʹ then action(si,u) = reduce l, where l =
   number of production A → β
3. if [Sʹ → S.,$] ∈ si then action(si,$) = acc
4. if goto(si, X) = sj then goto(si, X) = sj, ∀X ∈ N
5. otherwise = error

S.Motogna - FL&CD
Remarks
1. A grammar is LR(1) if the LR(1) table does not contain conflicts

2. The number of states increases significantly

S.Motogna - FL&CD
4. Define configurations and moves
• INPUT:
• Grammar G’ = (NU{S’}, 𝚺, P U {S’->S},S’)
• LR(1) table
• Input sequence w =a1…an
• OUTPUT:
if (w ∈L(G)) then string of productions
else error & location of error

S.Motogna - FL&CD
LR(1) configurations
(𝛼, 𝛽, 𝜋)
where:
• 𝛼 = working stack
• 𝛽 = input stack
• 𝜋 = output (result)
Initial configuration: ($s0, w$, 𝜀)
Final configuration: ($sacc, $, 𝜋)

S.Motogna - FL&CD
Moves
head(𝛽) = prediction
1. Shift
if action(sm,ai) = shift sj then
($s0x1 ...xmsm,ai ...an$, 𝜋) ⊢ ($s0x1 ...xmsmaisj,ai+1 ...an$, 𝜋)
2. Reduce
if action(sm,ai) = reduce t AND (t) A → xm−p+1 ...xm AND goto(sm−p,A) = sj
then
($s0 ...xmsm,ai ...an$, 𝜋) ⊢ ($s0 ...xm−psm−pAsj,ai ...an$,t 𝜋)
3. Accept
if action(sm,$) = accept then ($sm,$, 𝜋)=acc
4. Error - otherwise

S.Motogna - FL&CD
LALR Parser
• LALR = Look Ahead LR(1)

• why?

S.Motogna - FL&CD
LALR principle
[A → αβ.,u] ∈ si: apply reduce (k), then goto(si,A) = sm
[A → αβ.,v] ∈ sj: apply reduce (k), then goto(sj,A) = sn

[A → α.β,u] ∈ si and [A → α.β,v] ∈ sj => [A → α.β,u|v] ∈ si,j

• Merge states with the same kernel, conserving all predictions, if no
conflict is created
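The merging step can be sketched in Python as follows. LR(1) items are encoded as (lhs, rhs, dot, prediction), a hypothetical encoding, and the three sample states are made up for illustration. Note that this sketch only groups states by kernel; it does not check whether merging creates a conflict, which the principle above requires.

```python
from collections import defaultdict


def merge_by_kernel(lr1_states):
    """Group LR(1) states whose items agree once predictions are dropped."""
    groups = defaultdict(set)
    for state in lr1_states:
        kernel = frozenset((l, r, d) for l, r, d, _ in state)
        groups[kernel] |= set(state)      # keep every prediction (u|v)
    return [frozenset(items) for items in groups.values()]


# Two states with the same kernel [C → d.] and one unrelated state.
s1 = frozenset({("C", ("d",), 1, "c"), ("C", ("d",), 1, "d")})
s2 = frozenset({("C", ("d",), 1, "$")})
s3 = frozenset({("S", ("C", "C"), 2, "$")})

merged = merge_by_kernel([s1, s2, s3])
for state in merged:
    print(sorted(state))
```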

S.Motogna - FL&CD
LALR Parsing
• Same as LR(1)
• Number of LALR states = number of SLR / LR(0) states

• How? - LR(1) states

S.Motogna - FL&CD
LR(k) Parsers
• LR(0):
• Items ignore prediction
• Reduce can be applied only in singular states (states that contain one item)
• Lots of conflicts
• SLR:
• Uses the same items as LR(0)
• Considers the prediction when reducing
• Eliminates several LR(0) conflicts (not all)
• LR(1):
• Performant algorithm for set of states
• Generates few conflicts
• Generates lots of states
• LALR:
• Merges LR(1) states corresponding to the same kernel
• Most used algorithm (most performant)

S.Motogna - FL&CD
Quiz time

S.Motogna - FL&CD
Parsing - recap

           | Descendent                  | Ascendent
Recursive  | Descendent recursive parser | Ascendent recursive parser
Linear     | LL(1)                       | LR(0), SLR, LR(1), LALR

S.Motogna - FL&CD
Parsing - recap
Eliminating conflicts is not always easy, so we want to avoid them. The least restrictive class is that of LR(1) grammars, but the LR(1) parser has other disadvantages. The figure illustrates the inclusion between the types of grammars considered in parsing. Note that there is no obvious correlation between LL(1) grammars and LR(k) grammars: an LL(1) grammar may be LR(1), LALR, SLR or even LR(0), but any LL(1) grammar is LR(1).

[Figure 3.5: Relation between the different classes of grammars depending on the parsing method. Nested classes LR(0) ⊂ SLR ⊂ LALR(1) ⊂ LR(1), with LL(1) inside LR(1) and overlapping the others.]
S.Motogna - FL&CD
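Structure of compiler
Source program
  → scanning → Sequence of tokens
  → parsing → Parse tree
  → semantic analysis → Annotated syntax tree
  → generate intermediary code → Intermediary code
  → optimize intermediary code → Optimized intermediary code
  → generate object code → Object program
Analysis: scanning, parsing, semantic analysis
Synthesis: generate intermediary code, optimize intermediary code, generate object code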
S. Motogna - LFTC
Course 10

S.Motogna - FL&CD
Important notice
• 9.12.2021
7.30 - Course Formal Languages and Compiler Design
9.20 - Course Formal Languages and Compiler Design

• 16.12.2021
7.30 – Course Parallel and Distributed Programming
9.20 – Course Parallel and Distributed Programming

S.Motogna - FL&CD
LEX & YACC
1. Have you heard about these tools?

2. Have you used any of them?

S.Motogna - FL&CD
Scanning & Parsing Tools
• Scanning => lex
• Parsing => yacc

S.Motogna - FL&CD
Lex – Unix utility (flex – Windows version)

S.Motogna - FL&CD
INPUT FILE FORMAT

• The file containing the specification is a text file that can
have any name. For historical reasons we recommend the
extension .lxi.
• Consists of 3 sections separated by a line containing %%:

definitions
%%
rules
%%
user code
Example 1:
%%

username printf( "%s", getlogin() );

specifies a scanner that, when finding the string
“username”, will replace it with the user login name
Definition Section:
• C declarations
+
• declarations of simple name definitions (used to simplify the scanner
specification), of the form
name definition
• where:
• name is a word formed by one or more letters, digits, '_' or '-', with the
remark that the first character MUST be letter or '_' and must be written
on the FIRST POSITION OF THE LINE.
• definition is a regular expression and is starting with the first nonblank
character after name until the end of line.
• declarations of start conditions.
Rules Section
- associates semantic actions with regular expressions. It may also
contain user defined C code. Each rule has the form:

pattern action
where:

• pattern is a regular expression, whose first character MUST BE ON THE
FIRST POSITION OF THE LINE;

• action is a sequence of one or more C statements that MUST START ON
THE SAME LINE AS THE PATTERN. If there is more than one
statement, they must be enclosed in {}. In particular, the action can
be an empty statement.
User Defined Code Section:

• Is optional (if it is missing, the separator %% following the rules section can also
be omitted). If it exists, the user defined C code it contains is copied without any
change at the end of the file lex.yy.c.
• Normally, in the user defined code section, one may have:
- function main() containing call(s) to yylex(), if we want the scanner to work
autonomously (for ex., to test it);
- other functions called from yylex() (for ex. yywrap() or functions called during
actions); in this case, the user code from the definitions section must contain
either the prototypes or #include directives for the headers containing the
prototypes
Launching the execution:

lex [option] [name_specification_file]

where name_specification_file is an input file (implicitly, stdin)

$ lex spec.lxi
$ gcc lex.yy.c -o your_lex
$ your_lex<input.txt
options: http://dinosaur.compilertools.net/flex/manpage.html
Example

S.Motogna - FL&CD
yacc

S.Motogna - FL&CD
Parsing (syntax analysis) modeled with cfg:

cfg G = (N, 𝜮,P,S):


• N – nonterminal: syntactical constructions: declaration, statement, expression,
a.s.o.
• 𝜮 – terminals; elements of the language: identifiers, constants, reserved words,
operators, separators
• P – syntactical rules – expressed in BNF – simple transformation
• S – syntactical construct corresponding to program

THEN

Program syntactical correct <=> w ∊ L(G)


S. Motogna - FL&CD
yacc – Unix tool (Bison – Windows version)
• Yet Another Compiler Compiler

• LALR
• C code

S.Motogna - FL&CD
A yacc grammar file has four main sections

%{
C declarations
%}

yacc declarations
contains declarations that define terminal and nonterminal
symbols, specify precedence, and so on.
%%
Grammar rules
%%

Additional C code
The grammar rules section
• contains one or more yacc grammar rules of the following general form:
result : components... {C statements }

;
exp: exp '+' exp
;

result : rule1-components ...


| rule2-components ...
...
;
result : /*empty */
| rule2-components ...
;
Example: expression interpreter
• Input file (desk0)
• Yacc has a stack of values, referenced as ‘$i’ in semantic actions
• Run yacc
• Run desk0
Conflict resolution in yacc
• Shift-reduce conflict – prefer shift
• Reduce-reduce conflict – choose the first production
Operator priority in yacc
• From lowest to highest
• Use
>lex spec.lxi
>yacc -d spec.y
>gcc lex.yy.c y.tab.c -o result -lfl
>result<InputProgram

• More on
http://catalog.compilertools.net/lexparse.html

Example
Course 11
Push-Down Automata
(PDA)

S.Motogna - FL&CD
Intuitive Model

S.Motogna - LFTC
Definition
• A push-down automaton (PDA) is a 7-tuple M = (Q,𝞢,𝞒,𝞭,q0,Z0,F)
where:
• Q – finite set of states
• 𝞢 - alphabet (finite set of input symbols)
• 𝞒 – stack alphabet (finite set of stack symbols)
• 𝞭 : Q x (𝞢 U {𝜺}) x 𝞒 →𝒫(Qx 𝞒*) –transition function
• q0 ∈Q – initial state
• Z0 ∈ 𝞒 – initial stack symbol
• F ⊆Q – set of final states

S.Motogna - LFTC
Push-down automaton
Transition is determined by:
• Current state
• Current input symbol
• Head of stack

Reading head -> input band:


• Read symbol
• No action
Stack:
• Zero symbols => pop
• One symbol => push
• Several symbols => repeated push

S.Motogna - LFTC
Configurations and transition / moves
• Configuration:
(q, x, 𝞪) ∈Q x 𝜮* x 𝜞*

where:
• PDA is in state q
• Input band contains x
• Head of stack is 𝞪
• Initial configuration (q0, w, Z0)

S.Motogna - LFTC
Configurations and moves(cont.)

• Moves between configurations:


p,q ∈Q, a∈𝜮, Z ∈𝜞, w ∈𝜮*,𝜶,𝜸 ∈𝜞*

(q,aw,Z𝜶) ⊢ (p,w,𝜸Z𝜶) iff 𝜹(q,a,Z) ∋ (p,𝜸Z)


(q,aw,Z𝜶) ⊢ (p,w, 𝜶) iff 𝜹(q,a,Z) ∋ (p, 𝜺)
(q,aw,Z𝜶) ⊢ (p,aw,𝜸Z𝜶) iff 𝜹(q,𝜺,Z) ∋ (p,𝜸Z)
(𝜺-move)
• Notations: ⊢k (k moves), ⊢+ (one or more moves), ⊢* (zero or more moves)
S.Motogna - LFTC
Language accepted by PDA
• Empty stack principle:

L𝜺(M) = {w|w∈𝜮*, (q0,w,Z0) ⊢* (q,𝜺,𝜺), q∈ Q}

• Final state principle:

Lf(M) = {w|w∈𝜮*, (q0,w,Z0) ⊢* (qf,𝜺,𝜸), qf∈ F}

S.Motogna - LFTC
Representations
• Enumerate
• Table
• Graphic

S.Motogna - LFTC
Construct PDA
• L = {0n1n| n ≥ 1}
• States, stack, moves?
1. States:
• Initial state:q0 – beginning and process symbols ‘0’
• When first symbol ‘1’ is found – move to new state => q1
• Final: final state q2
2. Stack:
• Z0 – initial symbol
• X – to count symbols:
• When reading a symbol ’0’ – push X in stack
• When reading a symbol ‘1’ – pop X from stack

S.Motogna - LFTC
Example 1 (enumerate)
M = ({q0,q1,q2}, {0,1}, {Z0,X},𝜹,q0,Z0,{q2})

𝜹(q0,0,Z0) = (q0,XZ0)
𝜹(q0,0,X) = (q0,XX)
𝜹(q0,1,X) = (q1,𝜺)
𝜹(q1,1,X) = (q1,𝜺)
Final state: 𝜹(q1,𝜺,Z0) = (q2,Z0)
Empty stack: 𝜹(q1,𝜺,Z0) = (q1,𝜺)

Final state:
(q0,0011,Z0) ⊢ (q0,011,XZ0) ⊢ (q0,11,XXZ0) ⊢ (q1,1,XZ0) ⊢ (q1,𝜺,Z0) ⊢ (q2,𝜺,Z0)
Empty stack:
(q0,0011,Z0) ⊢ (q0,011,XZ0) ⊢ (q0,11,XXZ0) ⊢ (q1,1,XZ0) ⊢ (q1,𝜺,Z0) ⊢ (q1,𝜺,𝜺)

S.Motogna - LFTC
Example 1 (table)
            0         1        𝜺
q0   Z0   q0,XZ0
     X    q0,XX     q1,𝜺
q1   Z0                      q2,Z0 (final state) or q1,𝜺 (empty stack)
     X              q1,𝜺
q2   Z0
     X

(q0,0011,Z0) |- (q0,011,XZ0) |- (q0,11,XXZ0) |- (q1,1,XZ0) |- (q1,𝜺,Z0) |- (q2,𝜺,Z0)
q2 is final, so the sequence is accepted based on final state.

(q0,0011,Z0) |- (q0,011,XZ0) |- (q0,11,XXZ0) |- (q1,1,XZ0) |- (q1,𝜺,Z0) |- (q1,𝜺,𝜺)
The stack is empty, so the sequence is accepted based on empty stack.
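
The moves of Example 1 can be replayed mechanically. The following is a minimal sketch, assuming single-character stack symbols ("Z" stands for Z0) and an empty string for 𝜺; the function name `accepts` and the `mode` parameter are illustrative choices, not part of the course material.

```python
# Transition function of the example PDA, keyed by (state, input symbol, stack top).
# "" stands for epsilon; the pushed string replaces the stack top (leftmost char = new top).
# Assumption: "Z" stands for Z0, "X" is the counting symbol.
DELTA = {
    ("q0", "0", "Z"): [("q0", "XZ")],
    ("q0", "0", "X"): [("q0", "XX")],
    ("q0", "1", "X"): [("q1", "")],
    ("q1", "1", "X"): [("q1", "")],
    ("q1", "", "Z"): [("q2", "Z"), ("q1", "")],  # final-state and empty-stack variants
}
FINAL = {"q2"}


def accepts(word, mode="final"):
    """Explore all configurations (state, remaining input, stack) depth-first."""
    todo = [("q0", word, "Z")]
    seen = set()
    while todo:
        state, rest, stack = todo.pop()
        if (state, rest, stack) in seen:
            continue
        seen.add((state, rest, stack))
        if rest == "":
            if mode == "final" and state in FINAL:
                return True
            if mode == "empty" and stack == "":
                return True
        if stack == "":
            continue  # no move is possible with an empty stack
        top, below = stack[0], stack[1:]
        if rest:  # moves that read one input symbol
            for nxt, push in DELTA.get((state, rest[0], top), []):
                todo.append((nxt, rest[1:], push + below))
        for nxt, push in DELTA.get((state, "", top), []):  # epsilon-moves
            todo.append((nxt, rest, push + below))
    return False
```

Calling `accepts("0011")` follows the final-state derivation above, and `accepts("0011", mode="empty")` follows the empty-stack one.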
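Example 1 (graphic)
[Figure: transition graph of the PDA with states q0, q1, q2]
q0 → q0: 0, Z0➝XZ0 and 0, X➝XX (push)
q0 → q1: 1, X➝𝜺 (pop)
q1 → q1: 1, X➝𝜺 (pop)
q1 → q2: 𝜺, Z0➝Z0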

Properties
Theorem 1: For any PDA M, there exists a PDA M’ such that
L𝜺(M) = Lf(M’)

Theorem 2: For any PDA M, there exists a context free grammar G such that
L𝜺(M) = L(G)

Theorem 3: For any context free grammar G, there exists a PDA M such that
L(G) = L𝜺(M)

HW
• Parser:
• Descendent recursive
• LL(1)
• LR(0), SLR, LR(1)

Corresponding PDA

Structure of compiler
[Figure: compiler phases and the data passed between them]
Analysis:
  source program → scanning → sequence of tokens → parsing → syntax tree
  → semantic analysis → annotated abstract syntax tree
Synthesis:
  → generate intermediary code → intermediary code → optimize intermediary code
  → optimized intermediary code → generate object code → object program

Semantic analysis
• Parsing – result: syntax tree (ST)

• Simplification: abstract syntax tree (AST)

• Annotated abstract syntax tree (AAST)


• Attach semantic info in tree nodes

Example

Semantic analysis
• Attach meanings to syntactical constructions of a program
• What:
• Identifiers -> values / how to be evaluated
• Statements -> how to be executed
• Declaration -> determine space to be allocated and location to be stored
• Examples:
• Type checking
• Verify properties
• How:
• Attribute grammars
• Manual methods

Attribute grammar
• Syntactical constructions (nonterminals) – attributes

∀ 𝑋 ∈ 𝑁 ∪ Σ: 𝐴(𝑋)

• Productions – rules to compute/ evaluate attributes

∀ 𝑝 ∈ 𝑃: 𝑅(𝑝)

Definition
AG = (G,A,R) is called attribute grammar where:

• G = (N,𝜮,P,S) is a context free grammar


• A = {A(X) | X ∈N U 𝜮} – is a finite set of attributes
• R = {R(p) | p ∈P} – is a finite set of rules to compute/evaluate attributes

Example 1
• G = ({N,B},{0,1}, P, N)
P: N -> NB    N1.v = 2 * N2.v + B.v
   N -> B     N.v = B.v
   B -> 0     B.v = 0
   B -> 1     B.v = 1

Attribute – value of number = v

- Synthesized attribute: A(lhp) depends on rhp (lhp, rhp = left-hand and right-hand part of a production)
- Inherited attribute: A(rhp) depends on lhp
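
The evaluation rules of Example 1 can be tried out directly. This is a minimal sketch that assumes the input is a string of binary digits and that the parse tree is left-recursive, as in N -> NB; the function names `value_N` and `value_B` are illustrative.

```python
def value_B(bit):
    # B -> 0 : B.v = 0 ;  B -> 1 : B.v = 1
    return 0 if bit == "0" else 1


def value_N(bits):
    # N -> B : N.v = B.v
    if len(bits) == 1:
        return value_B(bits)
    # N1 -> N2 B : N1.v = 2 * N2.v + B.v
    return 2 * value_N(bits[:-1]) + value_B(bits[-1])
```

Each call computes the attribute of a node only from the attributes of its children, which is what makes v a synthesized attribute.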

Evaluate attributes
• Traverse the tree: attribute dependencies can be circular, so evaluation can enter an infinite cycle

• Special classes of AG:


• L-attribute grammars: for any node, the attributes it depends on are on the “left”;
• can be evaluated in one left-to-right traversal of the syntax tree
• Incorporated in top-down parser (LL(1))
• S-attribute grammars: only synthesized attributes
• Incorporated in bottom-up parser (LR)

Steps
• What? - decide what you want to compute (type, value, etc.)
• Decide attributes:
• How many
• Which attribute is defined for which symbol
• Attach evaluation rules:
• For each production – which rule/rules

Example 2 (L-attribute grammar)
Decl -> DeclTip ListId      ListId.type = DeclTip.type
ListId -> Id                Id.type = ListId.type
ListId -> ListId, Id        ListId2.type = ListId1.type
                            Id.type = ListId1.type

Attribute – type
Example input: int i,j
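
To see how the inherited attribute type travels down the tree in Example 2, here is a small sketch. It assumes the declaration has already been split into a type name and a list of identifiers; `decl`, `list_id` and the `table` dictionary are illustrative names.

```python
def list_id(ids, inherited_type, table):
    # ListId1 -> ListId2 , Id : ListId2.type = ListId1.type
    if len(ids) > 1:
        list_id(ids[:-1], inherited_type, table)
    # Id.type = ListId.type (both for ListId -> Id and for the trailing Id)
    table[ids[-1]] = inherited_type


def decl(decl_type, ids):
    # Decl -> DeclTip ListId : ListId.type = DeclTip.type
    table = {}
    list_id(ids, decl_type, table)
    return table
```

For the input `int i,j`, `decl("int", ["i", "j"])` records the type int for both identifiers.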

Example 3 (S-attribute grammar)
ListDecl -> ListDecl; Decl    ListDecl1.dim = ListDecl2.dim + Decl.dim
ListDecl -> Decl              ListDecl.dim = Decl.dim
Decl -> Type ListId           Decl.dim = Type.dim * ListId.no
Type -> int                   Type.dim = 4
Type -> long                  Type.dim = 8
ListId -> Id                  ListId.no = 1
ListId -> ListId, Id          ListId1.no = ListId2.no + 1

Attributes – dim + no – for which symbols
Example input: int i,j; long k
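
The rules of Example 3 can be evaluated bottom-up, since every attribute is synthesized. The sketch below assumes each declaration is given as a (type name, identifier list) pair; the function names are illustrative.

```python
TYPE_DIM = {"int": 4, "long": 8}  # Type -> int : dim = 4 ; Type -> long : dim = 8


def list_id_no(ids):
    # ListId -> Id : no = 1 ;  ListId1 -> ListId2 , Id : no = ListId2.no + 1
    return 1 if len(ids) == 1 else list_id_no(ids[:-1]) + 1


def decl_dim(type_name, ids):
    # Decl -> Type ListId : Decl.dim = Type.dim * ListId.no
    return TYPE_DIM[type_name] * list_id_no(ids)


def list_decl_dim(decls):
    # ListDecl -> Decl : dim = Decl.dim
    if len(decls) == 1:
        return decl_dim(*decls[0])
    # ListDecl1 -> ListDecl2 ; Decl : dim = ListDecl2.dim + Decl.dim
    return list_decl_dim(decls[:-1]) + decl_dim(*decls[-1])
```

For `int i,j; long k` the total is 4 * 2 + 8 * 1 = 16.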


Proposed problems (HW):
1) Define an attribute grammar for arithmetic expressions
2) Define an attribute grammar for logical expressions
3) Define an attribute grammar for if statement

Manual methods
• Symbolic execution
• Using control flow graph, simulate on stack how the program will behave
• [Grune – Modern Compiler Design]

• Data flow equations


• Data flow – associate equations based on data consumed in each node
(statement) of the control flow graph: In, Out, Generated, Killed
• [Grune – Modern Compiler Design], [Kildall], [course]

Course 12

Structure of compiler
[Figure: compiler phases and the data passed between them]
Analysis:
  source program → scanning → sequence of tokens → parsing → parse tree
  → semantic analysis → annotated syntax tree
Synthesis:
  → generate intermediary code → intermediary code → optimize intermediary code
  → optimized intermediary code → generate object code → object program

Generate intermediary code

[Figure 5.1: Creating compilers for m languages and n machines using intermediary code]
Language 1, Language 2, ..., Language m → Intermediary code → Machine 1, Machine 2, ..., Machine n

Forms of intermediary code
• Java bytecode – source language: Java
– machine language (dif. platforms) JVM
• MSIL (Microsoft Intermediate Language)
– source language: C#, VB, etc.
– machine language (dif. platforms) Windows
• GNU RTL (Register Transfer Language)
– source language: C, C++, Pascal, Fortran etc.
– machine language (dif. platforms)

Representations of intermediary code
• Annotated tree: intermediary code is generated in semantic analysis
• Polish postfix form:
• No parentheses
• Operators appear in the order of execution
• Ex.: MSIL

Exp = a + b * c ppf = abc*+


Exp = a * b + c ppf = ab*c+
Exp = a * (b + c) ppf = abc+*
• 3 address code
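
The Polish postfix forms listed above can be obtained with a stack-based conversion. The sketch below uses the shunting-yard technique, which the slides do not name; it assumes single-character operands, the operators + - * / and parentheses, and the name `to_postfix` is illustrative.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(expr):
    """Convert an infix expression with one-character operands to postfix."""
    output, stack = [], []
    for ch in expr.replace(" ", ""):
        if ch.isalnum():
            output.append(ch)  # operands go straight to the output
        elif ch == "(":
            stack.append(ch)
        elif ch == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the "("
        else:
            # pop operators of higher or equal precedence (left associativity)
            while stack and stack[-1] != "(" and PRECEDENCE[stack[-1]] >= PRECEDENCE[ch]:
                output.append(stack.pop())
            stack.append(ch)
    while stack:
        output.append(stack.pop())
    return "".join(output)
```

The three expressions from the slide give `abc*+`, `ab*c+` and `abc+*`.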

3 address code
= sequence of statements in a simple format, close to object code, with the
following general form:

<result> = <arg1> <op> <arg2>

Represented as:
- Quadruples
- Triples
- Indirect triples

• Quadruples:
< op > < arg1 > < arg2 > < result >

• Triples:
< op > < arg1 > < arg2 >

(the triple itself is considered to store the result)

Special cases:
1. Expressions with unary operator: <result> = <op> <arg2>
2. Assignment of the form a := b => the 3 address code is a = b (no operator and no 2nd
argument)
3. Unconditional jump: statement is goto L, where L is the label of a 3 address code statement
4. Conditional jump: if c goto L: if c evaluates to true, jump unconditionally to the
statement labeled L; otherwise (c evaluates to false), execute the next statement
5. Function call p(x1, x2, ..., xn) – sequence of statements: param x1, param x2, ...,
param xn, call p, n
6. Indexed variables: <arg1>, <arg2>, <result> can be array elements of the form a[i]
7. Pointers, references: &x, ∗x

Example: b∗b−4∗a∗c
Quadruples:
op   arg1  arg2  result
*    b     b     t1
*    4     a     t2
*    t2    c     t3
-    t1    t3    t4

Triples:
nr   op   arg1  arg2
(1)  *    b     b
(2)  *    4     a
(3)  *    (2)   c
(4)  -    (1)   (3)
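
Both tables for b∗b−4∗a∗c can be produced from the postfix form of the expression. This is a sketch under the assumption that the expression is already available as a list of postfix tokens; the temporary names t1, t2, ... follow the slide, while the function names are illustrative.

```python
OPERATORS = {"+", "-", "*", "/"}


def quadruples(postfix_tokens):
    """Return (op, arg1, arg2, result) tuples with fresh temporaries t1, t2, ..."""
    stack, quads = [], []
    for tok in postfix_tokens:
        if tok in OPERATORS:
            arg2, arg1 = stack.pop(), stack.pop()
            temp = f"t{len(quads) + 1}"
            quads.append((tok, arg1, arg2, temp))
            stack.append(temp)
        else:
            stack.append(tok)
    return quads


def triples(postfix_tokens):
    """Return (op, arg1, arg2) tuples; results are referenced by position, e.g. "(2)"."""
    stack, trips = [], []
    for tok in postfix_tokens:
        if tok in OPERATORS:
            arg2, arg1 = stack.pop(), stack.pop()
            trips.append((tok, arg1, arg2))
            stack.append(f"({len(trips)})")
        else:
            stack.append(tok)
    return trips


# Postfix form of b*b - 4*a*c
TOKENS = ["b", "b", "*", "4", "a", "*", "c", "*", "-"]
```

The only difference between the two functions is how an intermediate result is named: an explicit temporary for quadruples, the number of the producing triple for triples.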

Example 2
If (a<2) then a=b else a=b*b
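
One possible 3 address code for this statement uses the conditional and unconditional jumps from the special cases. The sketch below is only one layout among several; the labels L1 and L2, the temporary t1 and the function `if_then_else` are assumptions made for illustration.

```python
def if_then_else(cond, then_code, else_code):
    """Lay out: test, else branch, jump over the then branch, then branch."""
    code = [f"if {cond} goto L1"]
    code += else_code
    code.append("goto L2")
    code.append("L1: " + then_code[0])
    code += then_code[1:]
    code.append("L2:")
    return code


# If (a<2) then a=b else a=b*b
PROGRAM = if_then_else("a < 2", ["a = b"], ["t1 = b * b", "a = t1"])
```

`PROGRAM` is then the list `if a < 2 goto L1`, `t1 = b * b`, `a = t1`, `goto L2`, `L1: a = b`, `L2:`.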

Optimize intermediary code
• Local optimizations:
• Perform computation at compile time – constant values
• Eliminate redundant computations
• Eliminate inaccessible code – if…then...else...

• Loop optimizations:
• Factorization of loop invariants
• Reduce the power of operations (strength reduction)

Eliminate redundant computations
Example:
D:=D+C*B
A:=D+C*B
C:=D+C*B
contains the common subexpressions C * B and D + C * B.

The corresponding three-address code, as triples:
(1) * C B
(2) + D (1)
(3) := (2) D
(4) * C B
(5) + D (4)
(6) := (5) A
(7) * C B
(8) + D (7)
(9) := (8) C

These common subexpressions show up as identical triples, such as (4) and (7), and as some that are harder to spot, such as (2), (5), (8).

Determine redundant operations
• Operation (j) is redundant to operation (i), with i<j, if the 2 operations
are identical and if the operands of (j) did not change in any operation
between (i+1) and (j-1)
• Algorithm [Aho]
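
The redundancy condition can be checked mechanically on the triples of the example D:=D+C*B, A:=D+C*B, C:=D+C*B. The following is a simplified sketch of that check, not the algorithm from [Aho]; it assumes that an assignment triple has the form (":=", source, target) and that a reference such as "(4)" to a redundant triple may be replaced by the triple it repeats.

```python
TRIPLES = [
    ("*", "C", "B"),     # (1)
    ("+", "D", "(1)"),   # (2)
    (":=", "(2)", "D"),  # (3)  D := (2)
    ("*", "C", "B"),     # (4)
    ("+", "D", "(4)"),   # (5)
    (":=", "(5)", "A"),  # (6)  A := (5)
    ("*", "C", "B"),     # (7)
    ("+", "D", "(7)"),   # (8)
    (":=", "(8)", "C"),  # (9)  C := (8)
]


def find_redundant(triples):
    """Return {j: i}: operation (j) is redundant to operation (i), numbered from 1."""
    redundant = {}
    canon = {}  # "(j)" -> "(i)" for references to redundant operations
    for j, (op, a1, a2) in enumerate(triples, start=1):
        if op == ":=":
            continue
        key = (op, canon.get(a1, a1), canon.get(a2, a2))
        for i in range(1, j):
            iop, i1, i2 = triples[i - 1]
            if iop == ":=" or (iop, canon.get(i1, i1), canon.get(i2, i2)) != key:
                continue
            # variables assigned by operations (i+1) .. (j-1)
            assigned = {t[2] for t in triples[i:j - 1] if t[0] == ":="}
            if not assigned & {a1, a2}:
                redundant[j] = i
                canon[f"({j})"] = canon.get(f"({i})", f"({i})")
                break
    return redundant
```

Under this rule (4) and (7) repeat (1), and (8) repeats (5). Triple (5) is not a repeat of (2), because (3) assigns D in between.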

Factorization of loop invariants
What is a loop invariant?
... consists in moving this instruction out in front of the loop, so that it is executed only once.

Example 6.3. A program sequence that contains an invariant, before and after optimization:

Before:
for (i = 0; i <= n; i++)
{ x = y + z;
  a[i] = i * x; }

After:
x = y + z;
for (i = 0; i <= n; i++)
{ a[i] = i * x; }

Reducing the power of operations
This optimization aims to replace costly operations (for example multiplication) with cheaper operations (addition) ...

ˆın
Challenge
Consider n, and a[i], i=0,n, the coefficients of a polynomial P.
Given v, write an algorithm that computes the value of P(v)

3 solutions

V1:
P = a[0]
For i=1 to n
  P = P + a[i]*v^i

V2:
P = a[0]
Q = v
For i=1 to n
  P = P + a[i]*Q
  Q = Q*v

V3:
P = a[n]
For i=1 to n
  P = P*v + a[n-i]

P(x) = a[n]*x^n + … + a[1]*x + a[0] = (a[n]*x^(n-1) + … + a[1])*x + a[0]
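
The three solutions can be compared side by side. The sketch below transcribes them to Python, assuming the coefficients are stored in a list `a` with `a[i]` the coefficient of x^i; the function names are illustrative.

```python
def eval_v1(a, v):
    # V1: recompute the power v^i in every iteration
    p = a[0]
    for i in range(1, len(a)):
        p += a[i] * v ** i
    return p


def eval_v2(a, v):
    # V2: keep the current power of v in q, one extra multiplication per step
    p, q = a[0], v
    for i in range(1, len(a)):
        p += a[i] * q
        q *= v
    return p


def eval_v3(a, v):
    # V3: nested form from the identity above, one multiplication per step
    n = len(a) - 1
    p = a[n]
    for i in range(1, n + 1):
        p = p * v + a[n - i]
    return p
```

All three return the same value; they differ only in how many multiplications each iteration needs.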

Reduce the power of operations
... the value computed at iteration i − 1.

Example 6.4. Consider the following loop, in which v is a loop invariant; it can be optimized as follows:

Before:
for (i = k; i <= n; i++)
{ t = i * v;
  ... }

After:
t1 = k * v;
for (i = k; i <= n; i++)
{ t = t1;
  t1 = t1 + v; ... }
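
That the optimized loop computes the same values of t can be checked with a short sketch. It collects the value of t from every iteration; the function names and the returned list are assumptions added for the comparison.

```python
def products_with_multiplication(k, n, v):
    # original loop: t = i * v in every iteration
    return [i * v for i in range(k, n + 1)]


def products_with_addition(k, n, v):
    # optimized loop: t1 carries the product forward and is updated by one addition
    result = []
    t1 = k * v
    for _ in range(k, n + 1):
        t = t1
        t1 = t1 + v
        result.append(t)
    return result
```

The multiplication inside the loop is replaced by one addition per iteration, with a single multiplication left in front of the loop.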

Course 13

Structure of compiler
[Figure: compiler phases and the data passed between them]
Analysis:
  source program → scanning → sequence of tokens → parsing → parse tree
  → semantic analysis → annotated syntax tree
Synthesis:
  → generate intermediary code → intermediary code → optimize intermediary code
  → optimized intermediary code → generate object code → object program

Generate object code
= translate intermediary code statements into statements of object
code (machine language)

- Depends on the “machine”: architecture and OS

Computer with accumulator
• A stack machine consists of:
• a stack for storing and manipulating values (store subexpressions and
results)
• Accumulator – to execute operation
• 2 types of statements:
• move and copy values between the head of the stack and the accumulator
• Operations on the stack head, functioning as follows: operands are popped from the
stack, the operation is executed, and the result is put back on the stack

Example: 4 * (5+1)
Code acc stack
acc ← 4 4 <>
push acc 4 <4>
acc ← 5 5 <4>
push acc 5 <5,4>
acc ← 1 1 <5,4>
acc ← acc + head 6 <5,4>
pop 6 <4>
acc ← acc * head 24 <4>
pop 24 <>
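
The trace for 4 * (5+1) can be reproduced with a small interpreter. This is a sketch in which the instruction names "load", "push", "add", "mul" and "pop" are chosen for illustration; they mirror the statements acc ← c, push acc, acc ← acc + head, acc ← acc * head and pop from the table.

```python
def run(program):
    """Execute the program and return the final accumulator and stack."""
    acc, stack = None, []
    for instr in program:
        op = instr[0]
        if op == "load":    # acc <- constant
            acc = instr[1]
        elif op == "push":  # push acc
            stack.append(acc)
        elif op == "add":   # acc <- acc + head
            acc = acc + stack[-1]
        elif op == "mul":   # acc <- acc * head
            acc = acc * stack[-1]
        elif op == "pop":   # drop the head of the stack
            stack.pop()
    return acc, stack


# 4 * (5 + 1), instruction by instruction as in the table
PROGRAM = [
    ("load", 4), ("push",),
    ("load", 5), ("push",),
    ("load", 1), ("add",), ("pop",),
    ("mul",), ("pop",),
]
```

Running `run(PROGRAM)` leaves 24 in the accumulator and an empty stack, as in the last row of the table.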
Computer with registers
• Registers +
• Memory

• Instructions:
• LOAD v,R – load value v in register R
• STORE R,v – store the value from register R in memory at v
• ADD R1,R2 – add the value from register R2 to the value from register R1 and
store the result in R1 (initial value is lost!)

2 aspects:
• Register allocation – way in which variables are stored and
manipulated;

• Instruction selection – way and order in which the intermediary code
statements are mapped to machine instructions

Remarks:
1. A register can be available or occupied =>
VAR(R) = set of variables whose values are stored in register R

2. For every variable, the place (register, stack or memory) in which the
current value of the variable exists =>
MEM(x) = set of locations in which the value of variable x exists (will
be stored in Symbol Table)

Example: F := A ∗ B − (C + B) ∗ (A * B)
Intermediary code     Object code    VAR               MEM
                                     VAR(R0) = {}
                                     VAR(R1) = {}
(1) T1 = A * B        LOAD A, R0     VAR(R0) = {T1}    MEM(T1) = {R0}
                      MUL R0, B
(2) T2 = C + B        LOAD C, R1     VAR(R1) = {T2}    MEM(T2) = {R1}
                      ADD R1, B
(3) T3 = T2 * T1      MUL R1, R0     VAR(R1) = {T3}    MEM(T2) = {}
                                                       MEM(T3) = {R1}
(4) F := T1 – T3      SUB R0, R1     VAR(R0) = {F}     MEM(T1) = {}
                      STORE R0, F    VAR(R1) = {}      MEM(F) = {R0, F}
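
The generated object code can be checked by executing it on a toy register machine. The sketch below assumes that MUL and SUB behave like ADD (the result goes into the first operand), that the second operand may be a register or a memory variable, and that the two registers are named R0 and R1; `run` and the tuple encoding of instructions are illustrative.

```python
REGISTERS = {"R0", "R1"}

# Object code for F := A * B - (C + B) * (A * B)
PROGRAM = [
    ("LOAD", "A", "R0"),
    ("MUL", "R0", "B"),
    ("LOAD", "C", "R1"),
    ("ADD", "R1", "B"),
    ("MUL", "R1", "R0"),
    ("SUB", "R0", "R1"),
    ("STORE", "R0", "F"),
]


def run(program, memory):
    """Execute the program on the given memory (variable -> value) and return it."""
    regs = {}
    for op, x, y in program:
        if op == "LOAD":     # LOAD v, R
            regs[y] = memory[x]
        elif op == "STORE":  # STORE R, v
            memory[y] = regs[x]
        else:                # ADD / SUB / MUL R, operand
            operand = regs[y] if y in REGISTERS else memory[y]
            if op == "ADD":
                regs[x] += operand
            elif op == "SUB":
                regs[x] -= operand
            elif op == "MUL":
                regs[x] *= operand
    return memory
```

With A = 2, B = 3, C = 4 the program stores 6 − 7 * 6 = −36 in F, which matches evaluating the source expression directly.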

More about Register Allocation
• Registers – limited resource
• Registers – perform operations / computations
• Variables much more than registers

IDEA: assigning a large number of variables to a reduced number of


registers

Live variables
• Determine the number of variables that are live (used)
Example:
a = b + c
d = a + e
e = a + c

   op  op1  op2  result
1  +   b    c    a
2  +   a    e    d
3  +   a    c    e

   1  2  3
a  x  x  x
b  x
c  x  x  x
d     x
e     x  x
Graph coloring allocation (Chaitin et al., 1982)
• Graph:
• nodes = live variables that should be allocated to registers
• edges = connect variables whose live ranges overlap (simultaneously live)

Register allocation = graph coloring: colors (registers) are assigned to
the nodes such that two nodes connected by an edge do not receive
the same color
Disadvantage:
- NP complete problem
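
The idea can be illustrated on the variables a, b, c, d, e from the live variables example. The sketch below assumes each live range runs from the first to the last statement in which the variable occurs, and it uses a simple greedy coloring, which does not always find the minimum number of colors and is not Chaitin's algorithm; all names are illustrative.

```python
# Assumed live ranges (first statement, last statement) for a = b + c; d = a + e; e = a + c
RANGES = {"a": (1, 3), "b": (1, 1), "c": (1, 3), "d": (2, 2), "e": (2, 3)}


def interference_graph(ranges):
    """Connect two variables when their live ranges overlap."""
    graph = {v: set() for v in ranges}
    names = list(ranges)
    for i, u in enumerate(names):
        for w in names[i + 1:]:
            (s1, e1), (s2, e2) = ranges[u], ranges[w]
            if s1 <= e2 and s2 <= e1:
                graph[u].add(w)
                graph[w].add(u)
    return graph


def greedy_coloring(graph):
    """Color nodes in order of decreasing degree with the smallest free color."""
    colors = {}
    for node in sorted(graph, key=lambda n: -len(graph[n])):
        used = {colors[n] for n in graph[node] if n in colors}
        colors[node] = next(c for c in range(len(graph)) if c not in used)
    return colors
```

Each color stands for one register. Under the assumed ranges, a, c, d and e are pairwise live at the same time, so four registers are needed, while b can share a register with d.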

Linear scan allocation (Poletto et al., 1999)
• determine all live ranges, each represented as an interval
• intervals are traversed chronologically
• greedy algorithm

Advantage: speed – code is generated faster (speed in code
generation)
Disadvantage: generated code is slower (NO speed in code execution)
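
A simplified version of the linear scan idea fits in a few lines. The sketch below is an illustration only, not the published algorithm in full: it assumes live ranges are given as (start, end) pairs, uses the same assumed ranges for a, b, c, d, e as a first-to-last-occurrence reading of the live variables example, and spills to a location simply called "memory".

```python
RANGES = {"a": (1, 3), "b": (1, 1), "c": (1, 3), "d": (2, 2), "e": (2, 3)}


def linear_scan(ranges, num_registers):
    """Assign each variable a register name or "memory" (spilled)."""
    free = [f"R{i}" for i in range(num_registers)]
    active = []    # (end, name) of intervals that currently hold a register
    location = {}
    # traverse the intervals chronologically, by start point
    for name, (start, end) in sorted(ranges.items(), key=lambda kv: kv[1][0]):
        # expire intervals that ended before this one starts
        for old_end, old in sorted(active):
            if old_end < start:
                active.remove((old_end, old))
                free.append(location[old])
        if free:
            location[name] = free.pop(0)
            active.append((end, name))
        else:
            # greedy choice: spill whichever interval ends last
            last_end, victim = max(active)
            if last_end > end:
                location[name] = location[victim]
                location[victim] = "memory"
                active.remove((last_end, victim))
                active.append((end, name))
            else:
                location[name] = "memory"
    return location
```

With two registers, c and e are spilled; with four registers every variable gets a register. The single pass over the intervals is what makes allocation fast, at the price of spills that a graph coloring allocator might avoid.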

Instruction selection
Example: F := A ∗ B − (C + B) ∗ (A * B)
Intermediary code     Object code    Alternative     VAR               MEM
                                                     VAR(R0) = {}
                                                     VAR(R1) = {}
(1) T1 = A * B        LOAD A, R0                     VAR(R0) = {T1}    MEM(T1) = {R0}
                      MUL R0, B
(2) T2 = C + B        LOAD C, R1                     VAR(R1) = {T2}    MEM(T2) = {R1}
                      ADD R1, B      STORE R0,T1
(3) T3 = T2 * T1      MUL R1,R0      MUL R0,R1       VAR(R1) = {T3}    MEM(T2) = {}
                                                                       MEM(T3) = {R1}
(4) F:= T1 – T3                      LOAD T1,R1

Decide which register to use for an instruction


Turing Machines

Alan Turing

• Enigma (cryptography)
• Turing test
• Turing machine (1937)

Turing Machine
• Mathematical model for computation
• Abstract machine
• Can simulate any algorithm

Turing Machine
• Input band (infinite)
• Reading head
• Control Unit: states
• Transitions / moves

Turing machine – definition
7-tuple M = (Q, 𝞒,b,𝞢,𝞭,q0, F) where:
• Q – finite set of states
• 𝞒 - alphabet (finite set of band symbols)
• b ∈ 𝞒 - blank (symbol)
• 𝞢 ⊆ 𝞒 \{b} – input alphabet
• 𝞭 : (Q\F) x 𝞒 →Q x 𝞒 x {L,R} – transition function (L = left, R = right)
• q0 ∈Q – initial state
• F ⊆Q – set of final states

Example – palindrome over {0,1}
• 001100, 00100, 101101 and so on: accepted
• 00110, 1011 and so on: not accepted

Example – palindrome over {0,1}
       0           1           b
q0   (p1,b,R)   (p2,b,R)   (qf,b,R)
p1   (p1,0,R)   (p1,1,R)   (q1,b,L)
p2   (p2,0,R)   (p2,1,R)   (q2,b,L)
q1   (qr,b,L)              (qf,b,R)
q2              (qr,b,L)   (qf,b,R)
qr   (qr,0,L)   (qr,1,L)   (q0,b,R)
qf

• q0: delete 0 on the left side, then search for 0 on the right side (p1); delete 1 on the left side, then search for 1 on the right side (p2)
• p1, p2: shift right
• q1, q2: is there a 0 or a 1 on the right? Process 0 and 1 on the right
• qf: final state
Example run on input 0110:

(q0,0110) |- (p1, 110) |- (p1, 110) |- (p1, 110) |- (p1, 110b) |- (q1, 110)
|- (qr, 11) |- (qr, 11) |- (qr, b11) |- (q0, 11) |- . . .

Configurations that look identical differ only in the position of the reading head, which is not marked here.
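
The palindrome machine can be run step by step with a short simulator. This is a sketch that assumes the transition table as reconstructed from the slide (q1 is defined only on 0 and b, q2 only on 1 and b), a missing table entry meaning rejection, and the letter "b" as the blank; `accepts` and `max_steps` are illustrative names.

```python
B = "b"  # blank symbol

# (state, read symbol) -> (next state, written symbol, head move)
DELTA = {
    ("q0", "0"): ("p1", B, "R"), ("q0", "1"): ("p2", B, "R"), ("q0", B): ("qf", B, "R"),
    ("p1", "0"): ("p1", "0", "R"), ("p1", "1"): ("p1", "1", "R"), ("p1", B): ("q1", B, "L"),
    ("p2", "0"): ("p2", "0", "R"), ("p2", "1"): ("p2", "1", "R"), ("p2", B): ("q2", B, "L"),
    ("q1", "0"): ("qr", B, "L"), ("q1", B): ("qf", B, "R"),
    ("q2", "1"): ("qr", B, "L"), ("q2", B): ("qf", B, "R"),
    ("qr", "0"): ("qr", "0", "L"), ("qr", "1"): ("qr", "1", "L"), ("qr", B): ("q0", B, "R"),
}


def accepts(word, max_steps=10_000):
    """Run the machine on word; True if it reaches the final state qf."""
    tape = dict(enumerate(word))  # sparse tape: position -> symbol
    state, head = "q0", 0
    for _ in range(max_steps):
        if state == "qf":
            return True
        move = DELTA.get((state, tape.get(head, B)))
        if move is None:
            return False  # no transition defined: the word is rejected
        state, tape[head], direction = move
        head += 1 if direction == "R" else -1
    return False
```

`accepts("0110")` follows the run shown above to the end and reaches qf, while `accepts("1011")` stops in q1 on a 1, where no move is defined.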

https://turingmachinesimulator.com

Index
course details
introduction
scanning
context for grammar
grammar
finite automata
regular grammars
regular sets & expressions
transformations (RG ⇔ FA ⇔ RE ⇔ RG)
pumping lemma
cfg
eq transformations of cfg
parsing
descendent recursive parser
LL(1) Parser
LR(k) parsing
LR(0) Parser
SLR Parser
LR(1) Parser
LALR Parser
parsing recap
lex & yacc
PDA
context for attribute grammar
attribute grammar
intermediary code
3 address code
Optimize intermediary code
Generate object code
Turing machines
