ACD_UNIT_4
A compiler is a translator that converts a program written in a high-level language into
machine language.
o The high-level language is written by a developer, and the machine language can be understood
by the processor.
o The compiler also reports errors to the programmer.
The main purpose of a compiler is to translate code written in one language into another
language without changing the meaning of the program.
A program written in a high-level language (HLL) is translated in two parts before it can be
executed.
In the first part, the source program is compiled and translated into the object program (low
level language).
In the second part, the object program is translated into the target program by the
assembler.
Structure of a Compiler:
The compilation process consists of a sequence of phases. Each phase takes the source
program in one representation and produces output in another representation. Each phase takes
its input from the previous phase.
The phases of a compiler are:
o Lexical analyzer
o Syntax analyzer
o Semantic analyzer
o Intermediate code generator
o Code optimizer
o Code generator
Lexical Analysis:
The lexical analyzer phase is the first phase of the compilation process. It takes source code
as input. It reads the source program one character at a time and converts it into
meaningful lexemes.
The lexical analyzer represents these lexemes in the form of tokens.
Syntax Analysis
Syntax analysis is the second phase of the compilation process. It takes tokens as input and
generates a parse tree as output. In the syntax analysis phase, the parser checks whether the
expression formed by the tokens is syntactically correct.
Semantic Analysis
Semantic analysis is the third phase of the compilation process. It checks whether the parse tree
follows the rules of the language. The semantic analyzer keeps track of identifiers, their types and
expressions. The output of the semantic analysis phase is the annotated syntax tree.
Code Optimization
Code optimization is an optional phase. It is used to improve the intermediate code so that the
output of the program can run faster and take less space. It removes the unnecessary lines of
the code and arranges the sequence of statements in order to speed up the program execution.
Code Generation
Code generation is the final stage of the compilation process. It takes the optimized intermediate
code as input and maps it to the target machine language. The code generator translates the
intermediate code into the machine code of the specified computer.
Example:
sum := oldsum + rate * 50

Lexical analyzer
    id1 := id2 + id3 * 50

Syntax analyzer
    parse tree for id1 := id2 + (id3 * 50)

Semantic analyzer
    id1 := id2 + id3 * inttoreal(50)

Intermediate code generator
    temp1 := inttoreal(50)
    temp2 := id3 * temp1
    temp3 := id2 + temp2
    id1 := temp3

Code optimization
    temp1 := id3 * 50.0
    id1 := id2 + temp1

Code generation
    MOVF id3, R2
    MULF #50.0, R2
    MOVF id2, R1
    ADDF R2, R1
    MOVF R1, id1
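The intermediate code step of this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Python's own `ast` module stands in for the lexical and syntax analyzers, and the function name `three_address_code` and the `tempN` naming are hypothetical, not part of the notes. Type conversion (inttoreal) is not modelled.

```python
import ast

# Operator symbols used when printing three-address statements.
OPERATORS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def three_address_code(target, expression):
    """Return three-address statements for `target := expression`."""
    code = []
    counter = 0

    def emit(node):
        nonlocal counter
        if isinstance(node, ast.Name):        # identifier
            return node.id
        if isinstance(node, ast.Constant):    # numeric literal
            return repr(node.value)
        if isinstance(node, ast.BinOp):       # binary operation: operands first
            left = emit(node.left)
            right = emit(node.right)
            counter += 1
            temp = f"temp{counter}"
            code.append(f"{temp} := {left} {OPERATORS[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    result = emit(ast.parse(expression, mode="eval").body)
    code.append(f"{target} := {result}")
    return code


for line in three_address_code("sum", "oldsum + rate * 50"):
    print(line)
```

Each operator node becomes one statement with a fresh temporary, so the multiplication is emitted before the addition that depends on it.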
Symbol Table:
The symbol table is an important data structure used in a compiler. It is used to store
information about the occurrence of various entities such as objects, classes, variable names,
interfaces, function names etc. It is used by both the analysis and synthesis phases.
The symbol table is used for the following purposes:
o It is used to store the names of all entities in a structured form in one place.
o It is used to verify whether a variable has been declared.
o It is used to determine the scope of a name.
o It is used to implement type checking by verifying that assignments and expressions in the
source code are semantically correct.
A symbol table can be either a linear list or a hash table. It maintains an entry for each name
in the following format:
<symbol name, type, attribute>
Implementation
The symbol table can be implemented as an unordered list if the compiler only has to handle a
small amount of data.
A symbol table can be implemented using one of the following techniques:
o Linear (sorted or unsorted) list
o Hash table
o Binary search tree
Operations
insert()
The insert() operation is used most frequently in the analysis phase, when the tokens are
identified and names are stored in the table.
The insert() operation is used to insert information into the symbol table, such as each unique
name occurring in the source code.
The attribute of a symbol is the information associated with that symbol in the source code.
This information contains the state, value, type and scope of the symbol.
The insert() function takes the symbol and its value as arguments.
For example, the declaration
int x;
leads to the name x being inserted with the type int.
lookup()
In the symbol table, the lookup() operation is used to search for a name. It is used to determine
whether the symbol exists in the table, for example whether a variable has been declared before
it is used.
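The two operations can be sketched as follows. This is a minimal illustration that assumes the hash-table implementation (a Python dict) and the entry format <symbol name, type, attribute>; the class name `SymbolTable` and the choice to reject a duplicate declaration are assumptions made for the example, not something the notes prescribe.

```python
class SymbolTable:
    """Hash-table symbol table holding <symbol name, type, attribute> entries."""

    def __init__(self):
        self._entries = {}  # name -> (type, attribute)

    def insert(self, name, type_, attribute=None):
        """Add an entry; a name that is already present is rejected."""
        if name in self._entries:
            raise KeyError(f"'{name}' is already declared")
        self._entries[name] = (type_, attribute)

    def lookup(self, name):
        """Return (type, attribute) for a declared name, or None if it is absent."""
        return self._entries.get(name)


table = SymbolTable()
table.insert("x", "int")     # corresponds to the declaration: int x;
print(table.lookup("x"))     # entry for x
print(table.lookup("y"))     # None, so y has not been declared
```

A linear list or a binary search tree would expose the same two operations; only the cost of lookup() changes.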
Terminologies
There are three terminologies:
o Token
o Pattern
o Lexeme
Token: It is a sequence of characters that represents a unit of information in the source
code.
Pattern: The description used by the token is known as a pattern.
Lexeme: A sequence of characters in the source code that matches the pattern of a token is
known as a lexeme. It is also called an instance of a token.
The Architecture of Lexical Analyzer
The most important task of a lexical analyzer is to read the input characters of the source
code and produce tokens. The lexical analyzer goes through the entire source code and
identifies each token one by one. The scanner is responsible for producing tokens when they are
requested by the parser. The lexical analyzer skips whitespace and comments while creating
these tokens. If any error occurs, the analyzer correlates the error with the source file and
line number.
[Figure: architecture of the lexical analyzer. Labels: source code, tokens, request for tokens, symbol table.]
Buffer Pairs
A buffer pair consists of two buffers, each of which has an N-character size and is alternately
reloaded.
There are two pointers: lexemeBegin and forward.
lexemeBegin marks the start of the current lexeme, whose end has yet to be found.
forward scans ahead until it finds a match for a pattern.
When a lexeme is found, forward is set to the character at the right end of the lexeme, and
lexemeBegin is then set to the character immediately after the newly found lexeme.
The characters between the two pointers form the current lexeme.
Sentinels
In the buffer-pair scheme, a check must be performed each time the forward pointer is advanced,
to guarantee that it has not run off the end of one half of the buffer. If it has, the other
half needs to be reloaded.
As a result, each advance of the forward pointer necessitates two tests:
Test 1: Check for the end of the buffer half.
Test 2: Determine which character is being read.
Sentinels reduce the two tests to one: each buffer half is extended by one position that holds a
sentinel character at its end.
The sentinel is a special character that cannot be part of the source code. (The character eof
serves as a sentinel.)
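The sentinel scheme can be sketched as below. This is a simplified illustration, not a production scanner: it assumes the character "\0" as the eof sentinel, uses a deliberately tiny half size N = 4 so that reloads actually occur, and tracks only the forward pointer (lexemeBegin is omitted). The class name `BufferPair` is hypothetical.

```python
import io

EOF = "\0"   # sentinel; assumed never to occur in the source text
N = 4        # size of each buffer half (tiny, for demonstration only)


class BufferPair:
    def __init__(self, stream):
        self.stream = stream
        # Layout: [half 0: N chars][sentinel][half 1: N chars][sentinel]
        self.buf = [EOF] * (2 * N + 2)
        self.forward = 0
        self._load(0)

    def _load(self, half):
        """Read up to N characters into one half and mark where the data ends."""
        start = half * (N + 1)
        chunk = self.stream.read(N)
        for i, ch in enumerate(chunk):
            self.buf[start + i] = ch
        self.buf[start + len(chunk)] = EOF

    def next_char(self):
        """Return the next input character, or EOF at the real end of input."""
        ch = self.buf[self.forward]
        if ch != EOF:                    # the single test on the common path
            self.forward += 1
            return ch
        if self.forward == N:            # sentinel at the end of half 0
            self._load(1)
            self.forward = N + 1
            return self.next_char()
        if self.forward == 2 * N + 1:    # sentinel at the end of half 1
            self._load(0)
            self.forward = 0
            return self.next_char()
        return EOF                       # sentinel inside a half: input exhausted


reader = BufferPair(io.StringIO("int x;"))
chars = []
while (c := reader.next_char()) != EOF:
    chars.append(c)
print("".join(chars))
```

Only when the sentinel is seen does the code ask which end was reached, so ordinary characters cost one comparison each.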
Consider the statement:
int x = 45;
The above statement has multiple tokens, which are:
Keywords: int
Identifier: x
Constant: 45
Operators: =
Punctuators: ;
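A small tokenizer shows how lexemes are matched against patterns and reported as tokens. The sketch assumes the statement int x = 45; and a handful of illustrative patterns; the token names, the keyword list and the function name `tokenize` are choices made for this example and do not come from any particular compiler.

```python
import re

# (token name, pattern) pairs; earlier entries win, so keywords precede identifiers.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:int|float|char|if|else|while|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("CONSTANT",   r"\d+"),
    ("OPERATOR",   r"[=+\-*/]"),
    ("PUNCTUATOR", r"[;,(){}]"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token, lexeme in tokenize("int x = 45;"):
    print(token, lexeme)
```

Here `int` is the lexeme and KEYWORD the token; the regular expression is the pattern that links the two.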
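/* Lex program for decimal to hexadecimal conversion */
%{
/* Definition section */
#include <stdio.h>
#include <stdlib.h>   /* atoi */
int num, r, digit, count, i;
char a[20];
%}
DIGIT [0-9]
/* Rule Section */
%%
{DIGIT}+  { num = atoi(yytext);
            count = 0;                     /* start a fresh digit buffer for each number */
            if (num == 0)
                a[count++] = '0';
            while (num != 0) {
                r = num % 16;              /* next hex digit, least significant first */
                digit = '0' + r;
                if (digit > '9')
                    digit += 7;            /* jump from ':' to 'A' for values 10 to 15 */
                a[count++] = digit;
                num = num / 16;
            }
            for (i = count - 1; i >= 0; --i)   /* print most significant digit first */
                printf("%c", a[i]);
          }
.|\n      ECHO;
%%
/* User code section */
int yywrap(void) { return 1; }
int main(void) { yylex(); return 0; }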
Syntax Analyzer:
Introduction:
Syntax Analysis or Parsing is the second phase, i.e. after lexical analysis. It checks the
syntactical structure of the given input, i.e. whether the given input is in the correct syntax (of the
language in which the input has been written) or not. It does so by building a data structure,
called a Parse tree or Syntax tree. The parse tree is constructed by using the pre-defined
Grammar of the language and the input string. If the given input string can be produced with
the help of the syntax tree (in the derivation process), the input string is found to be in the
correct syntax. If not, the error is reported by the syntax analyzer.
Syntax analysis, also known as parsing, is a process in compiler design where the compiler
checks if the source code follows the grammatical rules of the programming language. This is
typically the second stage of the compilation process, following lexical analysis.
The main goal of syntax analysis is to create a parse tree or abstract syntax tree (AST) of the
source code, which is a hierarchical representation of the source code that reflects the
grammatical structure of the program.
There are several types of parsing algorithms used in syntax analysis, including:
LL parsing: This is a top-down parsing algorithm that starts with the root of the parse tree
and constructs the tree by successively expanding non-terminals. LL parsing is known for
its simplicity and ease of implementation.
LR parsing: This is a bottom-up parsing algorithm that starts with the leaves of the parse
tree and constructs the tree by successively reducing right-hand sides of productions to
non-terminals. LR parsing is more powerful than LL parsing and can handle a larger class of
grammars.
LR(1) parsing: This is a variant of LR parsing that uses one symbol of lookahead to decide
between parsing actions.
LALR parsing: This is a variant of LR parsing that merges LR(1) states differing only in their
lookahead symbols, which reduces the number of states in the LR parser.
Once the parse tree is constructed, the compiler can perform semantic analysis to check if
the source code makes sense and follows the semantics of the programming language.
The parse tree or AST can also be used in the code generation phase of the compiler
to generate intermediate code or machine code.
Syntax Trees: Syntax analysis creates a syntax tree, which is a hierarchical representation of
the code's structure. The tree shows the relationship between the various parts of the code,
including statements, expressions, and operators.
Context-Free Grammar: Syntax analysis uses a context-free grammar to define the syntax of
the programming language. A context-free grammar is a formal notation used to describe the
structure of programming languages.
Top-Down and Bottom-Up Parsing: Syntax analysis can be performed using two main
approaches: top-down parsing and bottom-up parsing. Top-down parsing starts from the
highest level of the syntax tree and works its way down, while bottom-up parsing starts from
the lowest level and works its way up.
Error Detection: Syntax analysis is responsible for detecting syntax errors in the code. If the
code does not conform to the rules of the programming language, the parser will report an error
and halt the compilation process.
Optimization: Syntax analysis can perform basic optimizations on the code, such as removing
redundant code and simplifying expressions.
A pushdown automaton (PDA) is used to design the syntax analysis phase.
The grammar for a language consists of production rules.
Example: Suppose the production rules for the grammar of a language are:
S -> cAd
A -> bc | a
And the input string is "cad".
Now the parser attempts to construct a syntax tree from this grammar for the given input string.
It uses the given production rules and applies them as needed to generate the string. To
generate the string "cad" it proceeds as follows:
i) S is expanded to c A d, and c matches the first input symbol.
ii) A is expanded to b c, but b does not match the next input symbol a.
iii) Backtracking is needed: the expansion of A is undone.
iv) A is expanded to a, after which a and d match the input and the parse succeeds.
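The backtracking behaviour for the grammar S -> cAd, A -> bc | a can be sketched with a small brute-force parser. This is an illustration only: the names `GRAMMAR`, `derive` and `accepts` are hypothetical, every terminal is assumed to be a single character, and the approach would loop forever on a left-recursive grammar.

```python
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["b", "c"], ["a"]],
}


def derive(symbols, text, pos, trace):
    """Yield every input position at which `symbols` can finish matching from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:                      # non-terminal: try each production in order
        for production in GRAMMAR[head]:
            trace.append(f"try {head} -> {''.join(production)}")
            yield from derive(production + rest, text, pos, trace)
            # reaching the next loop iteration is the backtracking step
    elif text.startswith(head, pos):         # terminal that matches the input
        yield from derive(rest, text, pos + 1, trace)


def accepts(text):
    """Return (accepted, list of productions tried) for the start symbol S."""
    trace = []
    ok = any(end == len(text) for end in derive(["S"], text, 0, trace))
    return ok, trace


ok, trace = accepts("cad")
print(ok)
for step in trace:
    print(step)
```

For the input cad the trace records that A -> bc is tried and abandoned before A -> a succeeds.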
Advantages :
Advantages of using syntax analysis in compiler design include:
Structural validation: Syntax analysis allows the compiler to check if the source code
follows the grammatical rules of the programming language, which helps to detect and
report errors in the source code.
Improved code generation: Syntax analysis can generate a parse tree or abstract syntax tree
(AST) of the source code, which can be used in the code generation phase of the compiler
to generate more efficient and optimized code.
Easier semantic analysis: Once the parse tree or AST is constructed, the compiler can
perform semantic analysis more easily, as it can rely on the structural information provided
by the parse tree or AST.
Disadvantages:
Disadvantages of using syntax analysis in compiler design include:
Complexity: Parsing is a complex process, and the quality of the parser can greatly impact
the performance of the resulting code. Implementing a parser for a complex programming
language can be a challenging task, especially for languages with ambiguous grammars.
Reduced performance: Syntax analysis can add overhead to the compilation process, which
can reduce the performance of the compiler.
Limited error recovery: Syntax analysis algorithms may not be able to recover from errors
in the source code, which can lead to incomplete or incorrect parse trees and make it
difficult for the compiler to continue the compilation process.
Inability to handle all languages: Not all languages have formal grammars, and some
languages may not be easily parseable.
Overall, syntax analysis is an important stage in the compiler design process, but it should
be balanced against the goals.
Parsers:
Parsing, also known as syntactic analysis, is the process of analyzing a sequence of tokens to
determine the grammatical structure of a program. It takes the stream of tokens, which are
generated by a lexical analyzer or tokenizer, and organizes them into a parse tree or syntax
tree.
The parse tree visually represents how the tokens fit together according to the rules of the
language's syntax. This tree structure is crucial for understanding the program's structure and
helps in the next stages of processing, such as code generation or execution. Additionally,
parsing ensures that the sequence of tokens follows the syntactic rules of the programming
language, making the program valid and ready for further analysis or execution.
[Figure: role of the parser. Labels: grammar, parser generator, parser, tokens, code, intermediate representation.]
What is the Role of Parser?
A parser performs syntactic and semantic analysis of source code, converting it into an
intermediate representation while detecting and handling errors.
1. Context-free syntax analysis: The parser checks if the structure of the code follows the
basic rules of the programming language (like grammar rules). It looks at how words and
symbols are arranged.
2. Guides context-sensitive analysis: It helps with deeper checks that depend on the meaning
of the code, like making sure variables are used correctly. For example, it ensures that a
variable used in a mathematical operation, like x+2, is a number and not text.
3. Constructs an intermediate representation: The parser creates a simpler version of your
code that's easier for the computer to understand and work with.
4. Produces meaningful error messages: If there's something wrong in your code, the parser
tries to explain the problem clearly so you can fix it.
5. Attempts error correction: Sometimes, the parser tries to fix small mistakes in your code
so it can keep working without breaking completely.
Types of Parsing
Parsing is divided into two types, which are as follows:
o Top-down Parsing
o Bottom-up Parsing
Top-down parsing and bottom-up parsing are two different strategies for constructing the parse
tree of an input. The most basic difference between the two is that top-down parsing starts from
the top of the parse tree, while bottom-up parsing starts from the lowest level of the
parse tree.
What is Top-Down Parsing?
The top-down parsing technique starts from the top level of the parse tree, moves downwards,
and evaluates the rules of the grammar. In other words, top-down parsing looks at the highest
level of the tree first and then works its way down the parse tree.
The top-down parsing technique tries to identify the leftmost derivation for an input. It evaluates
the rules of the grammar while parsing. Consequently, each terminal symbol is produced by
applying a sequence of production rules, starting from the start symbol.
Since top-down parsing uses leftmost derivation, the leftmost non-terminal determines which
production rule is applied next to construct the string.
Parsers
o Top-down parsers
    o Brute-force method (with backtracking)
    o Recursive descent parser
    o Non-recursive descent parser (LL(1))
o Bottom-up parsers
    o LR(0)
    o SLR(1)
    o LALR(1)
    o CLR(1)
Left recursion elimination
A context-free grammar is said to be left recursive if it has a production rule in which the
non-terminal on the left-hand side also appears as the first symbol on the right-hand side, for
example:
E -> E + T
[Handwritten derivation and worked examples; not legible in this copy.]
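Eliminating immediate left recursion follows a standard rewrite: productions A -> A α | β become A -> β A' and A' -> α A' | ε. The sketch below applies that rewrite to one non-terminal. It is an illustration under assumptions: the function name `eliminate_left_recursion` and the list-of-symbols encoding are hypothetical, and indirect left recursion is not handled.

```python
EPSILON = "ε"


def eliminate_left_recursion(nonterminal, productions):
    """Rewrite A -> A a | b as A -> b A' and A' -> a A' | ε (immediate recursion only)."""
    # Tails (the alpha parts) of the left-recursive productions A -> A alpha.
    recursive = [p[1:] for p in productions if p and p[0] == nonterminal]
    # Productions that do not start with A (the beta parts).
    others = [p for p in productions if not p or p[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}    # nothing to do
    new = nonterminal + "'"                  # fresh non-terminal A'
    return {
        nonterminal: [beta + [new] for beta in others],
        new: [alpha + [new] for alpha in recursive] + [[EPSILON]],
    }


result = eliminate_left_recursion("E", [["E", "+", "T"], ["T"]])
for head, bodies in result.items():
    print(head, "->", " | ".join(" ".join(body) for body in bodies))
```

For E -> E + T | T this yields E -> T E' and E' -> + T E' | ε, a grammar that a top-down parser can process without looping.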
Left factoring
When a grammar has multiple productions for a non-terminal that share a common prefix, the
parser cannot decide which production to choose. Left factoring rewrites these productions so
that the longest common prefix is factored out.
[Handwritten worked examples; not legible in this copy.]
[Handwritten notes, largely illegible in this copy. Legible topics: classification of parsers (recursive descent, LL(1), LR(0), SLR(1), CLR(1), LALR(1)), the recursive descent parser, and the LL(1) parser with its parsing table.]
FIRST and FOLLOW
[Handwritten notes, largely illegible in this copy. Legible fragments include the productions E' -> +TE' | ε and F -> id | (E), the set FIRST(E) = { id, ( }, and the use of FIRST and FOLLOW sets to place productions in the LL(1) parsing table.]
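FIRST sets can be computed by repeating the standard rules until nothing changes. The sketch below does this for the usual expression grammar; that this is the grammar intended in the notes is an assumption based on the legible fragments, and the names `GRAMMAR` and `first_sets` are hypothetical.

```python
EPSILON = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPSILON]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPSILON]],
    "F":  [["id"], ["(", "E", ")"]],
}


def first_sets(grammar):
    """Compute FIRST for every non-terminal by fixed-point iteration."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for production in productions:
                before = len(first[nt])
                for symbol in production:
                    if symbol == EPSILON:
                        first[nt].add(EPSILON)
                        break
                    if symbol not in grammar:          # terminal starts the string
                        first[nt].add(symbol)
                        break
                    first[nt] |= first[symbol] - {EPSILON}
                    if EPSILON not in first[symbol]:   # symbol cannot vanish: stop here
                        break
                else:
                    first[nt].add(EPSILON)             # every symbol can derive ε
                if len(first[nt]) != before:
                    changed = True
    return first


FIRST = first_sets(GRAMMAR)
for nt in GRAMMAR:
    print(f"FIRST({nt}) =", sorted(FIRST[nt]))
```

FOLLOW sets are computed with a similar fixed-point loop that uses these FIRST sets; they are left out to keep the sketch short.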
Operator precedence parser
An operator precedence grammar has no ε on the right-hand side of a production and no two
adjacent non-terminals.
Precedence relations can only be established between the terminals of the grammar. There are
three operator precedence relations; for example, a > b means that terminal a has higher
precedence than terminal b.
[Handwritten precedence table and a parse of an input built from id, + and *; not legible in this copy.]
Shift reduce parsing
[Handwritten worked examples; not legible in this copy. One legible example uses the productions E -> 2E2, E -> 3E3 and E -> 4 with the input string 32423.]
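A shift-reduce parse can be sketched with an explicit stack. The example assumes the grammar E -> 2E2 | 3E3 | 4 and the input 32423, and it uses a naive strategy (reduce whenever the top of the stack matches a right-hand side, otherwise shift) that happens to be sufficient for this grammar; real LR parsers decide between shift and reduce with a parsing table. The names `PRODUCTIONS` and `shift_reduce` are hypothetical.

```python
PRODUCTIONS = [
    ("E", ["2", "E", "2"]),
    ("E", ["3", "E", "3"]),
    ("E", ["4"]),
]


def shift_reduce(tokens):
    """Return (accepted, list of shift/reduce steps) for the given token sequence."""
    stack, steps = [], []
    remaining = list(tokens)
    while True:
        for lhs, rhs in PRODUCTIONS:
            if stack[-len(rhs):] == rhs:         # a handle is on top of the stack
                del stack[-len(rhs):]
                stack.append(lhs)
                steps.append(f"reduce {lhs} -> {''.join(rhs)}")
                break
        else:                                    # no reduction possible
            if not remaining:
                break
            stack.append(remaining.pop(0))       # shift the next input symbol
            steps.append(f"shift {stack[-1]}")
    return stack == ["E"], steps


accepted, steps = shift_reduce("32423")
print(accepted)
for step in steps:
    print(step)
```

The input is accepted when all symbols have been consumed and the stack holds only the start symbol E.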
LR parsers
[Handwritten notes, largely illegible in this copy. Legible topics: the LR parser family (LR(0), SLR(1), CLR(1), LALR(1)), augmenting the grammar, drawing the DFA of item sets, the goto and action (shift, reduce, accept) entries of the parsing table, and worked parses of sample input strings.]