Cat 1
• It also checks the program for some errors like lexical errors, grammar
errors, and syntax errors.
• The analysis part also collects information about the source program and
stores it in a data structure called a symbol table.
• The symbol table is used throughout the compilation process.
2. Synthesis (Back end of a compiler)
• It takes the output of the analysis phase (the intermediate representation
and the symbol table) and produces the target machine-level code.
Phases of compilation
• The compilation process consists of a sequence of phases.
• Each phase takes the source program in one representation and produces
output in another representation.
• Each phase takes its input from the previous phase.
• The symbol table, which stores information about the entire source
program, is used by all phases of the compiler.
a. Lexical Analysis
• The first phase of a compiler is called lexical analysis or scanning.
• The lexical analyzer reads the stream of characters making up the source
program and groups the characters into meaningful sequences called
lexemes.
• The input is scanned left to right, character by character.
• The primary functions of this phase are:
– Identify the lexical units in the source code
– Classify lexical units into classes such as constants and reserved words,
and enter them in different tables
– Ignore comments in the source program
– Identify tokens that are not part of the language
• For each lexeme, the lexical analyzer produces as output a token of the
form
(token-name, attribute-value)
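As a rough illustration of the (token-name, attribute-value) form, the sketch below scans one lexeme at a time from a C string. The names Token, TokenKind, and next_token are hypothetical and chosen only for this example. It handles identifiers, integer constants, and single-character operators, and it does not skip comments.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token classes for this sketch only. */
typedef enum { TOK_ID, TOK_NUM, TOK_OP, TOK_ERROR, TOK_EOF } TokenKind;

typedef struct {
    TokenKind kind;      /* token-name */
    char lexeme[32];     /* attribute-value: the matched text */
} Token;

/* Scan one token starting at *src, left to right, and advance *src past it. */
Token next_token(const char **src) {
    Token t = { TOK_EOF, "" };
    const char *p = *src;
    size_t n = 0;

    while (isspace((unsigned char)*p)) p++;          /* skip whitespace */

    if (*p == '\0') {
        /* end of input: t is already TOK_EOF */
    } else if (isalpha((unsigned char)*p)) {
        t.kind = TOK_ID;                             /* letter (letter | digit)* */
        while (isalnum((unsigned char)*p)) {
            if (n < sizeof t.lexeme - 1) t.lexeme[n++] = *p;
            p++;
        }
    } else if (isdigit((unsigned char)*p)) {
        t.kind = TOK_NUM;                            /* digit+ */
        while (isdigit((unsigned char)*p)) {
            if (n < sizeof t.lexeme - 1) t.lexeme[n++] = *p;
            p++;
        }
    } else {
        /* single-character operator, or a symbol outside the language */
        t.kind = strchr("+-*/=", *p) ? TOK_OP : TOK_ERROR;
        t.lexeme[n++] = *p++;
    }
    t.lexeme[n] = '\0';
    *src = p;
    return t;
}
```

Calling next_token repeatedly on the same pointer yields the token stream that would be handed to the parser. A symbol such as @ comes back as TOK_ERROR, which corresponds to identifying a token that is not part of the language.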
b. Syntax Analysis
• The second phase of the compiler is syntax analysis or parsing.
• It determines whether or not the text follows the expected format.
• The main aim of this phase is to check whether the source code written
by the programmer is syntactically correct.
• Syntax analysis applies the grammar rules of the specific programming
language, constructing the parse tree from the tokens.
• List of tasks performed in this phase
– Obtain tokens from the lexical analyzer
– Check whether the expression is syntactically correct
– Report all syntax errors
– Construct a hierarchical structure which is known as a parse tree
• Example
c. Semantic Analysis
• Semantic analysis checks the semantic consistency of the code.
• It uses the syntax tree of the previous phase along with the symbol
table to verify that the given source code is semantically consistent.
• It also checks whether the code is conveying an appropriate meaning.
• An important part of semantic analysis is type checking, where the
compiler checks that each operator has matching operands.
• Functions of the semantic analysis phase are:
– Stores the gathered type information in the symbol table or syntax
tree
– Performs type checking
– Reports a semantic error on a type mismatch for which no type
conversion rule satisfies the desired operation
– Collects type information and checks for type compatibility
– Checks whether the source language permits the operands
Example
float x = 20.2;
float y = x*30;
The semantic analyzer will convert the integer 30 to the float 30.0 before
the multiplication.
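The conversion in the example above can be sketched as a small type-checking helper. This is only an illustration: the Type tags and the function check_arith are hypothetical, and only int and float operands of an arithmetic operator are considered.

```c
/* Hypothetical type tags for this sketch only. */
typedef enum { TY_INT, TY_FLOAT, TY_ERROR } Type;

/* Result type of an arithmetic operator applied to operands of types a and b.
   When one operand is an int and the other a float, the int operand must be
   widened to float; *widen_left and *widen_right report which one. */
Type check_arith(Type a, Type b, int *widen_left, int *widen_right) {
    *widen_left = 0;
    *widen_right = 0;
    if (a == TY_ERROR || b == TY_ERROR) return TY_ERROR;  /* propagate errors */
    if (a == b) return a;                                 /* no conversion needed */
    *widen_left = (a == TY_INT);
    *widen_right = (b == TY_INT);
    return TY_FLOAT;
}
```

For x*30, the left operand is a float and the right operand an int, so the helper reports that the right operand must be widened and that the result type is float.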
d. Intermediate Code Generation
• In intermediate code generation, the compiler translates the source
code into intermediate code.
• Intermediate code lies between the high-level language and the machine
language.
• The intermediate code should be generated in such a way that you can
easily translate it into the target machine code.
• The two most important kinds of intermediate representations are:
– Trees, i.e., “parse trees” and “syntax trees”.
– A linear representation, i.e., “three-address code”.
Example
total = count + rate * 5
The intermediate code in three-address form is:
t1 := int_to_float(5)
t2 := rate * t1
t3 := count + t2
total := t3
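One way such three-address code could be produced is a post-order walk over the expression tree that creates a new temporary for every operator. The sketch below is an assumption-laden illustration: the Node type and gen_tac are hypothetical names, the output is appended to a caller-supplied buffer that must start as an empty string, and the int_to_float conversion and the final assignment to total are not modelled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical expression-tree node: a leaf holds a name or constant,
   an interior node holds an operator and two children. */
typedef struct Node {
    char op;                    /* '+', '*', ... ; 0 for a leaf */
    const char *name;           /* leaf text, e.g. "rate" */
    const struct Node *left, *right;
} Node;

/* Append three-address code for the tree rooted at n to out (capacity cap).
   The name holding the value of n is written to result (at least 16 bytes).
   *next_temp is the counter used to create t1, t2, ... */
void gen_tac(const Node *n, char *out, size_t cap, int *next_temp, char *result) {
    char l[16], r[16];
    size_t used;
    if (n->op == 0) {                            /* leaf: no code needed */
        snprintf(result, 16, "%s", n->name);
        return;
    }
    gen_tac(n->left, out, cap, next_temp, l);    /* post-order: operands first */
    gen_tac(n->right, out, cap, next_temp, r);
    snprintf(result, 16, "t%d", (*next_temp)++); /* fresh temporary */
    used = strlen(out);
    snprintf(out + used, cap - used, "%s := %s %c %s\n", result, l, n->op, r);
}
```

For the tree of count + rate * 5, the multiplication is emitted first because it is the deeper subtree, and the addition then refers to its temporary.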
e. Code Optimization
• Code optimization is an optional phase.
• It improves the intermediate code so that the resulting program runs
faster and takes less space.
• It removes unnecessary lines of code and rearranges the sequence of
statements to speed up program execution.
• Typical optimizations include:
– Removing unreachable code and unused variables
– Moving statements whose result does not change inside a loop out of
the loop
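The effect of moving an unchanging computation out of a loop can be shown at source level. The two functions below are a hand-written before/after pair with hypothetical names; a real optimizer would perform the equivalent rewrite on the intermediate code rather than on C source.

```c
/* Before: limit * scale is recomputed on every iteration although neither
   operand changes inside the loop. */
int sum_before(const int *a, int n, int limit, int scale) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        int bound = limit * scale;      /* loop-invariant */
        if (a[i] < bound) sum += a[i];
    }
    return sum;
}

/* After: the invariant computation is hoisted out of the loop. */
int sum_after(const int *a, int n, int limit, int scale) {
    int sum = 0;
    int bound = limit * scale;          /* computed once */
    for (int i = 0; i < n; i++) {
        if (a[i] < bound) sum += a[i];
    }
    return sum;
}
```

Both functions return the same result for every input; only the number of multiplications executed differs.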
f. Code Generation
• Code generation is the final phase of a compiler.
• It takes its input from the code optimization phase and produces the
target code or object code as a result.
• The objective of this phase is to allocate storage and generate
relocatable machine code.
• It also allocates memory locations for the variables.
• The instructions in the intermediate code are converted into machine
instructions.
• This phase converts the optimized intermediate code into the target
language (machine code).
Use of symbol tables in different phases
Lexical Analysis
• It is the first phase of the compiler.
• The input for lexical analysis is source code.
• It scans the pure high-level language (HLL) code line by line.
• After taking the source code as input, it breaks it into valid tokens,
removing whitespace and comments from the source code.
• If there are any invalid tokens present in the source code, it will show an
error.
• The output of the lexical analysis is a sequence of tokens, which will be
further sent to the syntax analysis as an input.
Operation on Languages
Regular Expression
Regular Definition
Strings and Languages
1. String
• An alphabet or character class is a finite set of symbols, denoted by Σ.
Example:
The set of digits (symbols) Σ = {0, 1} forms the binary alphabet.
ASCII, used in almost every computer, encodes the letter A over the binary
alphabet {0, 1} as 01000001.
Σ = {a, b, ..., z} is the set of lowercase letters.
• A language is a set of strings over an alphabet, for example:
L = {0, 01, 1, 10, 100, 010, ...}
L = {a, b, ab, ba, abb, ...}
• {ε}, the set containing only the empty string, is a language, and so is
φ, the empty set.
Operations on strings
Length of String
• The length of a string is the number of symbols in the string.
• The string is represented by the letter ‘s’ and |s| represents the length
of the string.
s = banana, |s| = 6
s= 1100 , |s| = 4
Empty string
• The empty string, i.e., the string of length 0, is represented by ‘ε’.
• It does not contain any characters.
|ε| = 0
TERM, DEFINITION, EXAMPLE
Prefix of s
• Definition: A string obtained by removing zero or more trailing symbols
of string s.
• Example: ban is a prefix of banana. For s = abcd, the prefixes are
ε, a, ab, abc, abcd.
Suffix of s
• Definition: A string formed by deleting zero or more of the leading
symbols of s.
• Example: nana is a suffix of banana. For s = abcd, the suffixes are
ε, d, cd, bcd, abcd.
Substring of s
• Definition: A string obtained by deleting a prefix and a suffix from s.
• Example: nan is a substring of banana. For s = banana, substrings include
ε, nan, na, anan.
Proper prefix, suffix, or substring of s
• Definition: Any nonempty string x that is a prefix, suffix, or substring
of s such that x ≠ s.
• Example: For s = abcd, the proper prefixes are a, ab, abc; the proper
suffixes are d, cd, bcd; proper substrings include bcd, abc, cd, ab.
Subsequence of s
• Definition: Any string formed by deleting zero or more not necessarily
contiguous symbols from s.
• Example: baaa is a subsequence of banana. For s = abcd, subsequences
include abd, bcd, bd.
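The definitions above translate directly into small predicates. The following sketch uses only the standard C string functions; the function names are hypothetical and chosen for this illustration.

```c
#include <stdbool.h>
#include <string.h>

/* x is a prefix of s: s begins with x (x may be empty or equal to s). */
bool is_prefix(const char *x, const char *s) {
    return strncmp(x, s, strlen(x)) == 0;
}

/* x is a suffix of s: s ends with x. */
bool is_suffix(const char *x, const char *s) {
    size_t lx = strlen(x), ls = strlen(s);
    return lx <= ls && strcmp(s + (ls - lx), x) == 0;
}

/* x is a substring of s: x occurs contiguously somewhere in s. */
bool is_substring(const char *x, const char *s) {
    return strstr(s, x) != NULL;
}

/* x is a subsequence of s: the symbols of x appear in s in order,
   not necessarily contiguously. */
bool is_subsequence(const char *x, const char *s) {
    while (*x != '\0' && *s != '\0') {
        if (*x == *s) x++;      /* matched one symbol of x */
        s++;
    }
    return *x == '\0';
}

/* The "proper" variants additionally require x to be nonempty and x != s;
   combine this check with one of the predicates above. */
bool is_proper(const char *x, const char *s) {
    return *x != '\0' && strcmp(x, s) != 0;
}
```

For instance, baaa is a subsequence of banana but not a substring, because its symbols do not occur contiguously.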
Operation on Languages
AXIOM                    DESCRIPTION
r|s = s|r                | is commutative
r|(s|t) = (r|s)|t        | is associative
(rs)t = r(st)            Concatenation is associative
r(s|t) = rs|rt           Concatenation distributes over |
(s|t)r = sr|tr
Letter → a|b|c|...|z|A|B|...|Z
digit → 0|1|2|...|9
id → Letter (Letter | digit)*
or
number [0-9]+
op [-+*/^=]
• C declarations are not processed by the lex tool; instead, lex copies
them to the output file lex.yy.c.
• They are bracketed with %{ and %}
(ii) Translation rules
• It contains regular expressions (patterns to be matched) and code
segments (corresponding code to be executed).
• Rule has the form
Pattern {Action}
• Pattern is a regular expression or regular definition.
– Starts from the first column
• Action refers to segments of code.
– Must begin on the same line
– Multiple statements are enclosed within braces ({})
%%
{number} {printf(" number");}
{op} {printf(" operator");}
%%

INPUT        OUTPUT
13           number
+            operator
13 + 17      number operator number
(iii) Auxiliary functions
• LEX generates C code for the rules specified in the Rules section and
places this code into a single function called yylex().
• This section holds additional functions which are used in actions. These
functions are compiled separately and loaded with the lexical analyzer.
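%%
/* ID and ER are token codes assumed to be defined in the declarations
   section and returned by the rule actions. */
int main() {
    int token = yylex();        /* run the generated scanner once */
    if (token == ID)
        printf("Accept\n");
    else if (token == ER)
        printf("Reject\n");
    return 0;
}

Input: z
Output: Reject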
BCSE307L_COMPILER DESIGN
SYNTAX ANALYSIS
Dr. B.V. Baiju,
SCOPE,Assistant Professor
VIT, Vellore
SYNTAX ANALYSIS
• In syntax analysis, the compiler checks the syntactic structure of the input
string, i.e., whether the given string follows the grammar or not.
• It uses a data structure called a parse tree or syntax tree to make
comparisons.
• It is used to check if the code is grammatically correct or not.
• It helps us to detect all types of syntax errors.
• It gives an exact description of the error.
• It rejects invalid code before actual compiling.
• Syntax Analyser Terminology
Top–Down Parsing
• The top–down parser starts by constructing the parse tree with a single
node labelled with the start symbol.
• It can then build up the complete parse tree by creating the subtrees one
by one, in a left-to-right order.
• In building a subtree, the root node of that subtree is created and then
all the sub-subtrees of that subtree are generated.
Parse Trees and the Leftmost Derivation
• Illustrate the parsing for the expression x+y*z.
1. The parser starts off by constructing a tree
containing just the starting symbol as the root
node.
2. The next step in the pre-order generation is to set up the leftmost
subnode. This is done by looking at the grammar of the language and noting
that <expr> is defined as
<expr> ::= <term> | <expr> + <term>
• Our input is x+y*z, the x is matched and the remaining input is +y*z.
6. The next step is to deal with the third node <term>. We use the below
production and the tree becomes:
7. The latest <term> is given the child <factor> from the production
<term> ::= <factor>
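A common way to implement a top-down parser by hand is recursive descent, with one function per nonterminal. The sketch below is a recognizer only (it builds no tree) and rests on several assumptions: the left-recursive rule <expr> ::= <expr> + <term> is rewritten as a loop, because a recursive-descent function cannot call itself before consuming input; the rule <term> ::= <term> * <factor> and the single-letter <factor> are assumed, since the source shows only <term> ::= <factor>; and all function names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

/* Each function consumes the part of the input it recognizes by
   advancing *p, and returns false on a syntax error. */

/* <factor> ::= identifier   (assumed: a single letter such as x, y, z) */
bool parse_factor(const char **p) {
    if (!isalpha((unsigned char)**p)) return false;
    (*p)++;
    return true;
}

/* <term> ::= <factor> { * <factor> }   (loop form of the left-recursive rule) */
bool parse_term(const char **p) {
    if (!parse_factor(p)) return false;
    while (**p == '*') {
        (*p)++;
        if (!parse_factor(p)) return false;
    }
    return true;
}

/* <expr> ::= <term> { + <term> } */
bool parse_expr(const char **p) {
    if (!parse_term(p)) return false;
    while (**p == '+') {
        (*p)++;
        if (!parse_term(p)) return false;
    }
    return true;
}

/* Accept the input only if a complete <expr> spans the whole string. */
bool accepts(const char *input) {
    const char *p = input;
    return parse_expr(&p) && *p == '\0';
}
```

On the input x+y*z, parse_expr first calls parse_term, which matches x; it then sees +, and the second parse_term matches y*z, mirroring the order in which the subtrees are created in the walkthrough.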