Compiler Design

EXPERIMENT NO. 1
DEVELOPING A LEXICAL ANALYZER TO IDENTIFY SOME PATTERNS

AIM:

To develop a lexical analyzer that identifies identifiers, constants, comments, operators, etc., using a C
program.

ALGORITHM:

Step 1: Start the program.
Step 2: Declare all the variables and file pointers.
Step 3: Display the input program.
Step 4: Separate the keywords in the program and display them.
Step 5: Display the header files of the input program.
Step 6: Separate the operators of the input program and display them.
Step 7: Print the punctuation marks.
Step 8: Print the constants that are present in the input program.
Step 9: Print the identifiers of the input program.

PROGRAM CODE:

// Develop a lexical analyzer to recognize a few patterns in C.

#include <string.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Print whether the word collected in str is a keyword or an identifier. */
void keyword(char str[10])
{
    if (strcmp("for", str) == 0 || strcmp("while", str) == 0 || strcmp("do", str) == 0 ||
        strcmp("int", str) == 0 || strcmp("float", str) == 0 || strcmp("char", str) == 0 ||
        strcmp("double", str) == 0 || strcmp("printf", str) == 0 || strcmp("switch", str) == 0 ||
        strcmp("case", str) == 0)
        printf("\n%s is a keyword", str);
    else
        printf("\n%s is an identifier", str);
}

int main()
{
    FILE *f1, *f2, *f3;
    int c;                      /* int, so that EOF can be detected reliably */
    char str[10];
    int num[100], lineno = 0, tokenvalue = 0, i = 0, j = 0, k = 0;

    f1 = fopen("input", "r");
    f2 = fopen("identifier", "w");
    f3 = fopen("specialchar", "w");

    while ((c = getc(f1)) != EOF)
    {
        if (isdigit(c))
        {
            /* Collect the whole number, digit by digit. */
            tokenvalue = c - '0';
            c = getc(f1);
            while (isdigit(c))
            {
                tokenvalue = tokenvalue * 10 + c - '0';
                c = getc(f1);
            }
            num[i++] = tokenvalue;
            ungetc(c, f1);
        }
        else if (isalpha(c))
        {
            /* Collect a word (keyword or identifier) into the identifier file. */
            putc(c, f2);
            c = getc(f1);
            while (isdigit(c) || isalpha(c) || c == '_' || c == '$')
            {
                putc(c, f2);
                c = getc(f1);
            }
            putc(' ', f2);
            ungetc(c, f1);
        }
        else if (c == ' ' || c == '\t')
            printf(" ");
        else if (c == '\n')
            lineno++;
        else
            putc(c, f3);        /* everything else goes to the special character file */
    }
    fclose(f2);
    fclose(f3);
    fclose(f1);

    printf("\nThe numbers in the program are:");
    for (j = 0; j < i; j++)
        printf("\t%d", num[j]);
    printf("\n");

    /* Re-read the identifier file and classify each word. */
    f2 = fopen("identifier", "r");
    k = 0;
    printf("The keywords and identifiers are:");
    while ((c = getc(f2)) != EOF)
        if (c != ' ')
            str[k++] = c;
        else
        {
            str[k] = '\0';
            keyword(str);
            k = 0;
        }
    fclose(f2);

    f3 = fopen("specialchar", "r");
    printf("\nThe special characters are:");
    while ((c = getc(f3)) != EOF)
        printf("\t%c", c);
    printf("\n");
    fclose(f3);

    printf("Total number of lines: %d\n", lineno);
    return 0;
}

RESULT:

Thus the program for developing a lexical analyzer to recognize a few patterns in C has been executed
successfully.
EXPERIMENT NO. 2
Write a program to parse using the brute force technique of top-down parsing

#include <iostream>
using namespace std;

int main()
{
    int a[30];
    int minsum = 10000, temp = 0, i, n, noofc, z;

    // Read the tree into the array a[] in level order (index 0 is the root).
    cout << "Please enter how many numbers: ";
    cin >> n;
    for (i = 0; i < n; i++)
        a[i] = 0;
    cout << "Enter value of root: ";
    cin >> a[0];
    for (i = 1; i <= n / 2; i++)
    {
        cout << "Please enter number of children of parent with value " << a[i - 1] << ": ";
        cin >> noofc;
        for (int j = 1; j <= noofc; j++)
        {
            z = i * 2 + j - 2;
            cout << "Please enter value of child: ";
            cin >> a[z];
        }
    }

    // For every leaf, add up the values on the path back to the root and
    // keep the smallest such sum.
    for (i = n - 1; i >= n / 2; i--)
    {
        temp = 0;
        for (int j = i + 1; j >= 1; j = j / 2)
            temp = temp + a[j - 1];
        if (temp < minsum)
            minsum = temp;
        cout << "temp min is " << temp << "\n";
    }
    cout << "min is " << minsum << "\n";
    return 0;
}
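The program above reads a tree stored level by level in an array and reports the smallest root-to-leaf sum. As a separate, minimal illustration of the brute force (backtracking) idea named in the title, the sketch below parses the classic grammar S -> c A d, A -> a b | a: it tries the alternatives of A one by one and backs the input pointer up when an alternative fails. The grammar and the input string "cad" are illustrative choices, not taken from the program above.

/* Brute force (backtracking) top down parsing sketch for
       S -> c A d
       A -> a b | a                                                      */
#include <stdio.h>

static const char *input;   /* string being parsed */
static int pos;             /* current position in the input */

static int A(void)
{
    int save = pos;                                   /* remember where A started */
    if (input[pos] == 'a' && input[pos + 1] == 'b') { /* try A -> a b first */
        pos += 2;
        return 1;
    }
    pos = save;                                       /* backtrack */
    if (input[pos] == 'a') {                          /* then try A -> a */
        pos += 1;
        return 1;
    }
    pos = save;                                       /* both alternatives failed */
    return 0;
}

static int S(void)                                    /* S -> c A d */
{
    if (input[pos] == 'c') {
        pos++;
        if (A() && input[pos] == 'd') {
            pos++;
            return 1;
        }
    }
    return 0;
}

int main(void)
{
    input = "cad";
    pos = 0;
    if (S() && input[pos] == '\0')
        printf("String accepted\n");
    else
        printf("String rejected\n");
    return 0;
}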
EXPERIMENT NO. 3
Develop LL(1) parser (Construct parse table also)

Prerequisite – Classification of top down parsers, FIRST Set, FOLLOW Set


A top-down parser builds the parse tree from the top down, starting with the start non-terminal. There
are two types of Top-Down Parsers:

Top-Down Parsers with Backtracking

Top-Down Parsers without Backtracking

Top-Down Parsers without Backtracking can further be divided into two kinds: Recursive Descent parsers
and Non-Recursive Descent parsers. Here we are going to discuss the Non-Recursive Descent parser, which
is also known as the LL(1) Parser.

LL(1) Parsing:
Here the first L means that the input is scanned from left to right, the second L means that the parser
uses a leftmost derivation, and the 1 is the number of lookahead symbols, i.e. how many input symbols
are examined before making a decision.

Construction of LL(1) Parsing Table:


To construct the Parsing table, we have two functions:

1: First(): for a variable, the set of terminal symbols that can appear at the beginning of the strings
derived from that variable.

2: Follow(): the set of terminal symbols that can follow a variable in the process of derivation.

Now, after computing the First and Follow set for each non-terminal symbol we have to construct the
Parsing table. In the table, the rows contain the Non-Terminals and the columns contain the Terminal
Symbols.
All the null (epsilon) productions of the grammar go under the Follow elements of their left-hand side,
and the remaining productions go under the elements of the First set of their right-hand side.

Now, let’s understand with an example.

Example-1:
Consider the Grammar:

E --> TE'
E' --> +TE' | e
T --> FT'
T' --> *FT' | e
F --> id | (E)

**e denotes epsilon


Find their first and follow sets:

Production             FIRST           FOLLOW
E  -> T E'             { id, ( }       { $, ) }
E' -> + T E' | e       { +, e }        { $, ) }
T  -> F T'             { id, ( }       { +, $, ) }
T' -> * F T' | e       { *, e }        { +, $, ) }
F  -> id | ( E )       { id, ( }       { *, +, $, ) }
Now, the LL(1) Parsing Table is:

         id           +             *              (            )            $
E     E -> TE'                                  E -> TE'
E'                 E' -> +TE'                                 E' -> e      E' -> e
T     T -> FT'                                  T -> FT'
T'                 T' -> e       T' -> *FT'                   T' -> e      T' -> e
F     F -> id                                   F -> (E)
As you can see, all the null productions are put under the Follow set of their left-hand side symbol and
all the remaining productions lie under its First set.
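The table above can be used to drive a stack-based parser directly. Below is a minimal illustrative sketch in C, assuming that the token id is abbreviated to the single character 'i' and that E' and T' are written as A and B so every grammar symbol fits in one character; it parses the sample string i+i*i$ with the Example-1 table.

/* Table-driven LL(1) parser sketch for the Example-1 grammar.            */
#include <stdio.h>
#include <string.h>

/* Non-terminals: E, A (= E'), T, B (= T'), F.  Terminals: i + * ( ) $.   */
static int nt_index(char c)
{
    switch (c) {
    case 'E': return 0; case 'A': return 1; case 'T': return 2;
    case 'B': return 3; case 'F': return 4; default:  return -1;
    }
}

static int t_index(char c)
{
    switch (c) {
    case 'i': return 0; case '+': return 1; case '*': return 2;
    case '(': return 3; case ')': return 4; case '$': return 5;
    default:  return -1;
    }
}

/* M[non-terminal][terminal] is the production to expand with; "" marks an
   error entry and "e" marks an epsilon production, as in the table above. */
static const char *M[5][6] = {
    /*            i       +       *       (       )      $    */
    /* E */   { "TA",   "",     "",     "TA",   "",    ""   },
    /* A */   { "",     "+TA",  "",     "",     "e",   "e"  },
    /* T */   { "FB",   "",     "",     "FB",   "",    ""   },
    /* B */   { "",     "e",    "*FB",  "",     "e",   "e"  },
    /* F */   { "i",    "",     "",     "(E)",  "",    ""   },
};

int main(void)
{
    const char *input = "i+i*i$";   /* sample input, id written as 'i' */
    char stack[100] = "$E";         /* '$' at the bottom, start symbol on top */
    int top = 1, ip = 0;

    while (top >= 0) {
        char X = stack[top], a = input[ip];
        if (X == '$' && a == '$') { printf("Parsing Successful\n"); return 0; }
        if (nt_index(X) < 0) {                 /* X is a terminal: match it */
            if (X == a) { top--; ip++; }
            else break;
        } else {                               /* X is a non-terminal: expand it */
            int col = t_index(a);
            const char *rhs = (col < 0) ? "" : M[nt_index(X)][col];
            if (rhs[0] == '\0') break;         /* empty cell means error */
            top--;                             /* pop X */
            if (strcmp(rhs, "e") != 0)         /* push the RHS in reverse */
                for (int k = (int)strlen(rhs) - 1; k >= 0; k--)
                    stack[++top] = rhs[k];
        }
    }
    printf("Error\n");
    return 1;
}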

Note: Not every grammar is suitable for an LL(1) parsing table; a cell may end up containing more than
one production.

Let’s see with an example.

Example-2:
Consider the Grammar

S --> A | a
A --> a
Find their first and follow sets:
Production       FIRST      FOLLOW
S -> A | a       { a }      { $ }
A -> a           { a }      { $ }
Parsing Table:

         a                      $
S     S -> A, S -> a
A     A -> a
Here we can see that two productions fall into the same cell. Hence, this grammar is not suitable for an
LL(1) parser.
EXPERIMENT NO. 4
Develop an operator precedence parser (Construct parse table also)

Operator Precedence Grammar-

A grammar that satisfies the following two conditions is called an Operator Precedence Grammar:
There exists no production rule which contains ε on its RHS.
There exists no production rule which contains two non-terminals adjacent to each other on its RHS.

It represents a small class of grammars, but an important one because of its widespread applications.

Operator Precedence Parser-


A parser that reads and understands an operator precedence grammar
is called an Operator Precedence Parser.

Designing Operator Precedence Parser-


In operator precedence parsing,
Firstly, we define precedence relations between every pair of terminal symbols.
Secondly, we construct an operator precedence table.

Defining Precedence Relations-


The precedence relations are defined using the following rules-

Rule-01:

If the precedence of b is higher than the precedence of a, then we define a < b.


If the precedence of b is the same as the precedence of a, then we define a = b.
If the precedence of b is lower than the precedence of a, then we define a > b.

Rule-02:

An identifier is always given higher precedence than any other symbol.
$ symbol is always given the lowest precedence.

Rule-03:

If two operators have the same precedence, then we go by checking their associativity.
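As a small illustration of these rules, the sketch below encodes a possible precedence relation table for the terminals id, +, * and $ (id abbreviated to 'i') for the usual expression grammar, and looks up a pair of relations in C. The table values are an assumed example, not taken from the text above.

/* Operator precedence relation table sketch: * binds tighter than +, both
   are left associative, id has the highest precedence and $ the lowest.   */
#include <stdio.h>

static const char TERMS[] = "i+*$";

/* rel[a][b] is the relation between the terminal a on the stack and the
   incoming terminal b: '<' means a yields precedence to b, '>' means a
   takes precedence over b, and 'e' marks an error entry.                  */
static const char rel[4][4] = {
    /*          i    +    *    $  */
    /* i */  { 'e', '>', '>', '>' },
    /* + */  { '<', '>', '<', '>' },
    /* * */  { '<', '>', '>', '>' },
    /* $ */  { '<', '<', '<', 'e' },
};

static int idx(char c)
{
    for (int k = 0; TERMS[k] != '\0'; k++)
        if (TERMS[k] == c)
            return k;
    return -1;
}

int main(void)
{
    /* Example lookups: + yields to *, while * takes precedence over +. */
    printf("+ %c *\n", rel[idx('+')][idx('*')]);   /* prints: + < * */
    printf("* %c +\n", rel[idx('*')][idx('+')]);   /* prints: * > + */
    return 0;
}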
EXPERIMENT NO. 5
Develop a recursive descent parser

Compiler Design | Recursive Descent Parser

Prerequisite – Construction of LL(1) Parsing Table, Classification of top down parsers


Parsing is the process of determining whether the start symbol can derive the program or not. If parsing
succeeds, the program is a valid program; otherwise the program is invalid.
There are generally two types of Parsers:

Top-Down Parsers:
In this parsing technique we expand the start symbol to the whole program.
Recursive Descent and LL parsers are Top-Down parsers.
Bottom-Up Parsers:
In this parsing technique we reduce the whole program to the start symbol.
Operator Precedence Parser, LR(0) Parser, SLR Parser, LALR Parser and CLR Parser are the Bottom-Up
parsers.
Recursive Descent Parser:
It is a kind of Top-Down Parser. A top-down parser builds the parse tree from the top to down, starting
with the start non-terminal. A Predictive Parser is a special case of Recursive Descent Parser, where
no Back Tracking is required.
By carefully writing a grammar means eliminating left recursion and left factoring from it, the
resulting grammar will be a grammar that can be parsed by a recursive descent parser.

Example:

BEFORE REMOVING LEFT RECURSION          AFTER REMOVING LEFT RECURSION

E -> E + T | T                          E  -> T E'
T -> T * F | F                          E' -> + T E' | e
F -> ( E ) | id                         T  -> F T'
                                        T' -> * F T' | e
                                        F  -> ( E ) | id

**Here e is Epsilon
For a Recursive Descent Parser, we write one procedure for every variable (non-terminal).

Example:
Grammar: E --> i E'
E' --> + i E' | e
// Recursive descent parser for the grammar:
//   E  -> i E'
//   E' -> + i E' | e
// The non-terminal E' is written as Eprime, l is the lookahead symbol,
// and the input string is expected to end with '$'.
#include <stdio.h>

char l;                 // lookahead symbol

void E(void);
void Eprime(void);
void match(char t);

int main()
{
    l = getchar();      // read the first lookahead symbol

    // E is the start symbol.
    E();

    // If lookahead = $, it represents the end of the string.
    if (l == '$')
        printf("Parsing Successful\n");
    else
        printf("Error\n");
    return 0;
}

// Definition of E, as per the production E -> i E'
void E(void)
{
    if (l == 'i') {
        match('i');
        Eprime();
    }
}

// Definition of E', as per the production E' -> + i E' | e
void Eprime(void)
{
    if (l == '+') {
        match('+');
        match('i');
        Eprime();
    }
    else
        return;         // E' -> e (epsilon), so simply return
}

// Match the expected terminal t and advance the lookahead
void match(char t)
{
    if (l == t)
        l = getchar();
    else
        printf("Error\n");
}
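For example, with the parser above the input i+i$ prints Parsing Successful, while an input such as +i$ prints Error, because E expects the string to begin with i.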
EXPERIMENT NO. 6
Write a program for generating various intermediate code forms
i) Three address code ii) Polish notation

Three address code in Compiler

Prerequisite – Intermediate Code Generation

Three address code is a type of intermediate code which is easy to generate and can easily be converted
to machine code. It uses at most three addresses and one operator to represent an expression, and the
value computed at each instruction is stored in a temporary variable generated by the compiler. The
compiler decides the order of the operations given by the three address code.

General representation –

a = b op c
where a, b and c represent operands such as names, constants or compiler-generated temporaries, and op
represents the operator.
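For example (using compiler-generated temporaries t1 and t2, which are illustrative names), the statement
a = b * c + d would be translated into the three address instructions:

t1 = b * c
t2 = t1 + d
a  = t2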

Definition - What does Polish Notation (PN) mean?

Polish notation is a notation form for expressing arithmetic, logic and algebraic expressions. Its most
basic distinguishing feature is that operators are placed to the left of their operands. If each operator
has a fixed number of operands, the syntax does not require brackets or parentheses to avoid
ambiguity.

Polish notation is also known as prefix notation, prefix Polish notation, normal Polish notation,
Warsaw notation and Lukasiewicz notation.
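As a short worked example (the expression is an illustrative one), moving each operator to the left of its
operands turns the infix expression (a + b) * c into its prefix (Polish) form:

Infix  : (a + b) * c
Prefix : * + a b c

No parentheses are needed in the prefix form, because each operator takes a fixed number of operands.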
EXPERIMENT NO. 8
Flex (Fast Lexical Analyzer Generator)

FLEX (fast lexical analyzer generator) is a tool for generating lexical analyzers (scanners or lexers),
written by Vern Paxson in C around 1987. It is used together with the Berkeley Yacc parser generator or
the GNU Bison parser generator. Flex and Bison are both more flexible than Lex and Yacc and produce
faster code.
Bison produces a parser from the input file provided by the user. The function yylex() is automatically
generated by Flex when it is provided with a .l file, and the parser calls this yylex() function to
retrieve tokens from the token stream.

Note: The function yylex() is the main Flex function which runs the Rules Section, and .l is the
extension used to save Flex programs.

Installing Flex on Ubuntu:

sudo apt-get update


sudo apt-get install flex
Note: If the update command has not been run on the machine for a while, it is better to run it first so
that a newer version is installed; an older version might not work with the other installed packages or
might no longer be available.

The following steps describe how Flex is used:

Step 1: An input file named lex.l, written in the Lex language, describes the lexical analyzer to be
generated. The Lex compiler transforms lex.l into a C program, in a file that is always named lex.yy.c.
Step 2: The C compiler compiles the lex.yy.c file into an executable file called a.out.
Step 3: The executable a.out takes a stream of input characters and produces a stream of tokens.

Program Structure:
In the input file, there are 3 sections:
1. Definition Section: The definition section contains the declaration of variables, regular definitions
and manifest constants. In the definition section, text is enclosed in “%{ %}” brackets. Anything written
inside these brackets is copied directly to the file lex.yy.c.

Syntax:

%{
// Definitions
%}
2. Rules Section: The rules section contains a series of rules of the form pattern action; the pattern
must be unindented and the action must begin on the same line inside {} brackets. The rules section is
enclosed in “%% %%”.
Syntax:

%%
pattern action
%%
Examples: The table below shows some of the pattern matches.

PATTERN      IT CAN MATCH WITH

[0-9]        any one of the digits between 0 and 9
[0+9]        either 0, + or 9
[0, 9]       either 0, ',' , ' ' (space) or 9
[0 9]        either 0, ' ' (space) or 9
[-09]        either -, 0 or 9
[-0-9]       either - or any digit between 0 and 9
[0-9]+       one or more digits between 0 and 9
[^a]         any character except a
[^A-Z]       any character except the uppercase letters
a{2,4}       either aa, aaa or aaaa
a{2,}        two or more occurrences of a
a{4}         exactly 4 a's, i.e. aaaa
.            any character except newline
a*           0 or more occurrences of a
a+           1 or more occurrences of a
[a-z]        any lowercase letter
[a-zA-Z]     any alphabetic letter
w(x|y)z      wxz or wyz
3. User Code Section: This section contains C statements and additional functions. We can also compile
these functions separately and load them with the lexical analyzer.

Basic Program Structure:

%{
// Definitions
%}

%%
Rules
%%

User code section

How to run the program:


To run the program, it should first be saved with the extension .l or .lex. Run the commands below in a
terminal in order to run the program file.
Step 1: lex filename.l or lex filename.lex, depending on the extension the file is saved with
Step 2: gcc lex.yy.c
Step 3: ./a.out
Step 4: Provide the input to the program in case it is required

Note: Press Ctrl+D, or use a rule that returns on a particular input, to stop taking input from the user.


Example 1: Count the number of characters in a string

/*** Definition Section has one variable
which can be accessed inside yylex()
and main() ***/
%{
int count = 0;
%}

/*** Rule Section has three rules: the first rule
matches capital letters, the second rule
matches any other character except newline, and the
third rule stops taking input after the enter key ***/
%%
[A-Z] {printf("%s capital letter\n", yytext);
count++;}
. {printf("%s not a capital letter\n", yytext);}
\n {return 0;}
%%

/*** Code Section prints the number of
capital letters present in the given input ***/
int yywrap() { return 1; }
int main(){

// Explanation:
// yywrap() - wraps the above rule section
/* yyin - takes the file pointer
which contains the input*/
/* yylex() - this is the main flex function
which runs the Rule Section*/
// yytext is the text in the buffer

// Uncomment the lines below


// to take input from file
// FILE *fp;
// char filename[50];
// printf("Enter the filename: \n");
// scanf("%s",filename);
// fp = fopen(filename,"r");
// yyin = fp;

yylex();
printf("\nNumber of Captial letters "
"in the given input - %d\n", count);

return 0;
}
