Compiler Design File
LAB FILE
Index
Compiler Design
3. Write a program to implement a DFA that recognizes identifiers, constants and operators of the mini-language. 20-2-24
4. Write a program to check whether a given string follows the given pattern (0+1)1*. 5-3-24
5. Write a program to implement given DFA (00 + 1)*. 5-3-24
12. WAP to generate machine code from the abstract Syntax Tree generated by the parser. 23-4-24
Experiment-01
Description:
2. Components of an Expression:
3. Operators: Operators are symbols that represent specific operations such as addition
(+), subtraction (-), multiplication (*), division (/), logical operations (AND, OR),
comparison operations (>, <, =), etc.
4. Types of Operators:
• Arithmetic Operators: Perform arithmetic operations.
• Logical Operators: Perform logical operations.
• Comparison Operators: Perform comparison operations.
• Assignment Operators: Assign values.
Algorithm:
i. Increment `operatorCount` by 1.
3. Return `operatorCount`.
Program:
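#include <stdio.h>
#include <string.h>
int main()
{
    int a, count = 0;
    /* a is the number of characters in the expression */
    if (scanf("%d", &a) != 1 || a <= 0)
        return 1;
    char c[a];
    for (int i = 0; i < a; i++) {
        /* the leading space skips whitespace, including the newline after a */
        if (scanf(" %c", &c[i]) != 1)
            break;
        /* Assumption: the operator test is missing from this listing;
           here only single-character operators are counted */
        if (strchr("+-*/%=<>&|!", c[i]) != NULL)
            count++;
    }
    printf("Number of operators: %d\n", count);
    return 0;
}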
Output:
Experiment-02
Description:
Tokens:
In programming languages, tokens are the smallest units of a program. These units can be
identifiers, keywords, constants, operators, punctuation symbols, etc.
Types of Tokens:
Special Symbols: Specific characters with special meaning, such as the dot operator or arrow
operator in C.
Token Recognition:
Token recognition involves scanning the source code character by character and identifying
the tokens based on predefined rules and patterns.
Lexical analysis is the process of converting the sequence of characters into meaningful
tokens.
Lexical analysis is the first phase of the compiler, and the component that performs it is also known as a scanner. It converts the
high-level input program into a sequence of tokens.
The output is a sequence of tokens that is sent to the parser for syntax analysis.
Algorithm:
Algorithm: TokenRecognition
b. Check if the current character matches any of the defined token patterns.
c. If a match is found:
d. Move to the next character and repeat steps b-c until the end of the source code.
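The scanning loop described in these steps can be sketched as a single function that returns one token per call. This is only an illustrative sketch, not the lab's program: the names `TokenKind` and `next_token`, the token categories, and the operator set "+-*/%=<>!&|" are assumptions chosen for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token categories for this sketch. */
typedef enum { TOK_IDENTIFIER, TOK_CONSTANT, TOK_OPERATOR, TOK_SYMBOL, TOK_END } TokenKind;

/*
 * Scan one token starting at src[*pos], copy its text into buf and
 * advance *pos past it. Returns TOK_END at the end of the input.
 * Assumes bufsize >= 2; a lexeme longer than the buffer is split.
 */
TokenKind next_token(const char *src, size_t *pos, char *buf, size_t bufsize)
{
    size_t n = 0;
    TokenKind kind;

    while (isspace((unsigned char)src[*pos]))       /* skip whitespace */
        (*pos)++;

    if (src[*pos] == '\0') {
        buf[0] = '\0';
        return TOK_END;
    }

    unsigned char ch = (unsigned char)src[*pos];

    if (isalpha(ch) || ch == '_') {                 /* identifier or keyword */
        while ((isalnum((unsigned char)src[*pos]) || src[*pos] == '_') && n + 1 < bufsize)
            buf[n++] = src[(*pos)++];
        kind = TOK_IDENTIFIER;
    } else if (isdigit(ch)) {                       /* integer constant */
        while (isdigit((unsigned char)src[*pos]) && n + 1 < bufsize)
            buf[n++] = src[(*pos)++];
        kind = TOK_CONSTANT;
    } else {                                        /* one-character operator or symbol */
        buf[n++] = src[(*pos)++];
        kind = strchr("+-*/%=<>!&|", ch) ? TOK_OPERATOR : TOK_SYMBOL;
    }
    buf[n] = '\0';
    return kind;
}
```

Calling `next_token` repeatedly on "x1 = 42;" with the same `pos` variable would yield an identifier, an operator, a constant, a symbol, and finally `TOK_END`. Keywords are not separated from identifiers here; a lookup in a keyword list after scanning would be one way to add that.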
Program:
#include <stdio.h>
#include <ctype.h>
#include <string.h>
int main() {
char input[100];
token[j] = '\0';
token[j] = '\0';
} else {
return 0;
Output:
Experiment-03
Description:
The DFA serves as a mathematical model to recognize patterns in strings. It consists of states,
input symbols, transitions, a start state, and one or more accepting states. Each state
represents a specific stage of recognition for identifiers, constants, or operators. Transitions
guide the DFA's traversal through the input string based on the current input character and
state. Accepting states indicate successful recognition of a token.
Implementation involves defining states for recognizing identifiers, constants, and operators,
along with transitions between these states based on input characters. A transition function or
table guides the DFA's traversal through the input string.
Algorithm:
Input: inputString (string)
Define states for recognizing identifiers, constants, and operators. For example, "Start",
"Identifier", "Constant", "Operator", etc.
Define the alphabet of input symbols, including letters, digits, and specific symbols
representing operators.
Designate specific states as accepting states where recognition for identifiers, constants, or
operators is complete.
Define a transition table that maps the current state and input symbol to the next state.
Define transitions for each state and input symbol combination based on the rules of the mini-
language. For example:
If the current state is "Start" and the input symbol is a letter, transition to the "Identifier"
state.
If the current state is "Identifier" and the input symbol is a digit, remain in the "Identifier"
state.
If the current state is "Identifier" and the input symbol is not a valid character for identifiers,
transition to the "Error" state.
5. Initialize Variables:
a. Determine the input symbol category (letter, digit, operator, etc.) for char.
b. Look up the transition from the currentState and input symbol category in the transition
table.
If the currentState is an accepting state, return tokenType and tokenValue as the recognized
token.
If the currentState is not an accepting state, return an error indicating an unrecognized token.
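The transition-table approach described above can be sketched as follows. The state names, the symbol categories, and the operator set "+-*/=<>" are assumptions made for this example; the mini-language's actual rules may differ.

```c
#include <ctype.h>
#include <string.h>

/* States of the sketch DFA (illustrative names). */
enum { S_START, S_IDENT, S_CONST, S_OPER, S_ERROR, NUM_STATES };
/* Input symbol categories. */
enum { C_LETTER, C_DIGIT, C_OPERATOR, C_OTHER, NUM_CLASSES };

/* transition[state][category] gives the next state. */
static const int transition[NUM_STATES][NUM_CLASSES] = {
    /*            letter    digit     operator  other   */
    /* START */ { S_IDENT,  S_CONST,  S_OPER,   S_ERROR },
    /* IDENT */ { S_IDENT,  S_IDENT,  S_ERROR,  S_ERROR },
    /* CONST */ { S_ERROR,  S_CONST,  S_ERROR,  S_ERROR },
    /* OPER  */ { S_ERROR,  S_ERROR,  S_ERROR,  S_ERROR },
    /* ERROR */ { S_ERROR,  S_ERROR,  S_ERROR,  S_ERROR },
};

/* Map a character to its input symbol category. */
static int classify(char ch)
{
    if (isalpha((unsigned char)ch) || ch == '_') return C_LETTER;
    if (isdigit((unsigned char)ch))              return C_DIGIT;
    if (ch != '\0' && strchr("+-*/=<>", ch))     return C_OPERATOR;
    return C_OTHER;
}

/* Run the DFA over the whole string and name the token it recognizes. */
const char *recognize(const char *input)
{
    int state = S_START;
    for (size_t i = 0; input[i] != '\0'; i++)
        state = transition[state][classify(input[i])];

    switch (state) {
    case S_IDENT: return "identifier";
    case S_CONST: return "constant";
    case S_OPER:  return "operator";
    default:      return "error";   /* START (empty input) or ERROR */
    }
}
```

With this table, `recognize("count1")` reports an identifier, `recognize("123")` a constant, `recognize("+")` an operator, and `recognize("1abc")` an error. Keeping the rules in a table means that adding a token class only changes data, not the driver loop.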
Program:
#include <stdio.h>
#include <ctype.h>
int main() {
char input[100];
scanf("%s", input);
char ch = input[i];
switch (state) {
if (is_identifier_start(ch)) {
} else if (is_digit(ch)) {
} else if (is_operator(ch)) {
} else {
return 1; // Error
break;
if (is_identifier_char(ch)) {
} else {
break;
if (is_digit(ch)) {
} else {
break;
default:
return 1; // Error
if (state == 1) {
} else if (state == 2) {
return 0;
return isdigit(ch);
Output:
Experiment-04
Aim: - Write a program to check whether a given string follows the given
pattern (0+1)1*.
Description:
A string follows the pattern (0+1)1* if it starts with either "0" or "1", followed by zero
or more occurrences of "1".
• 0 and 1: These are the literal characters representing the digits "0" and "1"
respectively.
Algorithm:
function followsPattern(s):
return false
if c is not '1':
return false
return true
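The check in this pseudocode can be written compactly in C. This is a minimal sketch; the function name `follows_pattern` is chosen to mirror the pseudocode and is not taken from the lab program.

```c
#include <stdbool.h>

/* Returns true when s matches (0+1)1*: one leading '0' or '1',
 * then zero or more '1' characters and nothing else. */
bool follows_pattern(const char *s)
{
    if (s[0] != '0' && s[0] != '1')     /* also rejects the empty string */
        return false;
    for (int i = 1; s[i] != '\0'; i++) {
        if (s[i] != '1')
            return false;
    }
    return true;
}
```

For instance, "0", "1" and "0111" are accepted, while "10", "011a" and the empty string are rejected.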
Program:
#include <iostream>
#include<string>
bool dfa(string s )
int n = s.length() ;
return false ;
for(int i = 1 ;i<n;i++)
if(s[i]=='1') continue ;
else
return false;
return true ;
int main()
int n = 4 ;
for(auto it : s)
if(dfa(it))
else
return 0;
Output:
Experiment-05
Description:
(00+1)* represents the language of strings formed by concatenating any number of blocks, each
either "00" or "1", including the empty string. Equivalently, every maximal run of 0s has even length.
The DFA involves states representing different stages of pattern recognition, transitions
guiding the DFA through the input based on encountered characters, and accepting states
indicating successful recognition.
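A two-state DFA is enough for this language: one accepting state for "all 0s so far are paired" and one non-accepting state for "a single 0 is waiting for its partner". The sketch below assumes this state numbering and a hypothetical function name; any other character leads to rejection.

```c
#include <stdbool.h>

/* Returns true when s belongs to (00+1)*. */
bool accepts_00_or_1_star(const char *s)
{
    int state = 0;   /* 0: accepting, 1: one unmatched '0' seen */

    for (; *s != '\0'; s++) {
        if (state == 0) {
            if (*s == '0')
                state = 1;          /* first half of a "00" block */
            else if (*s != '1')
                return false;       /* symbol outside {0, 1} */
            /* on '1' the DFA stays in state 0 */
        } else {
            if (*s != '0')
                return false;       /* "01..." can never be completed */
            state = 0;              /* "00" block finished */
        }
    }
    return state == 0;
}
```

The empty string, "1", "00" and "1001" are accepted; "0", "010" and "000" are rejected because some run of 0s has odd length.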
Algorithm:
1. Start
o Check if the first character of it is either '0' or '1'. If not, return false.
5. Return true if all patterns pass the checks, otherwise return false.
6. Stop
Program:
#include <iostream>
#include<string>
return true;
while(s[i]=='0')
count++; i++;
if(count % 2 != 0 )
return false ;
return true ;
int main()
int n = 5;
if(dfa(it))
else
return 0;
Output:
Experiment-06
Description:
The language L consists of strings that start with the pattern "01" repeated at least once (i >=
1), followed by exactly two "1"s, and then end with the same pattern "01" repeated the same
number of times as it appeared initially (2i).
Algorithm:
Program:
#include <iostream>
#include <string>
void checkstatea(string);
void stateb(string);
void statec(string);
void stated(string);
void statee(string);
void checkstatea(string n)
else
int i = 0;
if (n[i] == '0')
stateb(n.substr(1));
else
void stateb(string n)
int i = 0;
if (n[i] == '0')
else
statec(n.substr(1));
void statec(string n)
int i = 0;
if (n[i] == '1')
stated(n.substr(1));
else
stateb(n.substr(1));
void stated(string n)
int i = 0;
if (n.length() == 1)
if (n[i] == '1')
else
else
if (n[i] == '1')
statee(n.substr(1));
else
void statee(string n)
int i = 0;
if (n.length() == 1)
if (n[i] == '0')
else
else
if (n[i] == '0')
else
stated(n.substr(1));
int main()
int n = 4;
for(auto it :s)
cout<<it<<" ";
checkstatea(it);
return 0;
Output:
Experiment-07
Description:
A recursive descent parser is a top-down parser that analyses input based on grammar
rules using recursive procedures. Starting with the grammar's topmost rule, the parser calls
a subroutine for each non-terminal symbol, recursing until it reaches terminal symbols.
A parse tree representing the input's structure according to the grammar is the parser's output.
Algorithm:
• Main Function:
• Parsing Functions:
• E function:
o Calls T to parse a term.
o Calls EP to parse the optional +TE' part.
o Returns 1 if both T and EP succeed (entire E parsed).
o Returns 0 otherwise (failure).
• EP function:
o Checks if the current input character is '+'.
o If yes, increments i and calls T and EP recursively.
o Returns 1 if '+' is found and further parsing succeeds.
o Returns 1 even if no '+' is found (epsilon case, empty string).
• T and TP functions: (similar structure to E and EP)
o Call F to parse a factor.
o Call TP to parse the optional *FT' part.
o Return 1 if both F and TP succeed, 0 otherwise.
• F function:
o Handles two cases:
▪ '(' followed by a parsed expression E and ')' (parenthesized expression).
▪ An identifier character (a-z or A-Z).
o Increments i after successful parsing in each case.
o Returns 1 if parsing succeeds, 0 otherwise.
Program:
#include<stdio.h>
#include<string.h>
char input[100];
int i;
int E();
int F();
int EP();
int T();
int TP();
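int main()
{
    printf("\nE -> TE'\nE' -> +TE'/ε\nT -> FT'\nT' -> *FT'/ε\nF -> (E)/ID\n");
    printf("\nEnter the string to be checked: ");
    if (scanf("%99s", input) != 1)
        return 1;
    i = 0;
    if (E()) {
        /* Accept only if the whole input was consumed. */
        if (input[i] == '\0') {
            printf("\nString is accepted\n");
        } else {
            printf("\nString is not accepted\n");
        }
    } else {
        printf("\nString is not accepted\n");
    }
    return 0;
}

/* E -> T E' */
int E() {
    if (T()) {
        if (EP()) {
            return 1;
        }
    }
    return 0;
}

/* E' -> + T E' | epsilon */
int EP() {
    if (input[i] == '+') {
        i++;
        if (T()) {
            if (EP()) {
                return 1;
            }
        }
        return 0;
    }
    return 1;   /* epsilon case: nothing consumed */
}

/* T -> F T' */
int T() {
    if (F()) {
        if (TP()) {
            return 1;
        }
    }
    return 0;
}

/* T' -> * F T' | epsilon */
int TP() {
    if (input[i] == '*') {
        i++;
        if (F()) {
            if (TP()) {
                return 1;
            }
        }
        return 0;
    }
    return 1;   /* epsilon case: nothing consumed */
}

/* F -> ( E ) | ID, where ID is a single letter */
int F() {
    if (input[i] == '(') {
        i++;
        if (E()) {
            if (input[i] == ')') {
                i++;
                return 1;
            }
        }
        return 0;
    } else if ((input[i] >= 'a' && input[i] <= 'z') || (input[i] >= 'A' && input[i] <= 'Z')) {
        i++;
        return 1;
    } else {
        return 0;
    }
}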
Output:
Experiment-08
Description:
FIRST(α) is a function that gives the set of terminals that begin the strings derived from α,
the right-hand side of a production rule.
A symbol c is in FIRST(α) if and only if α ⇒* cβ for some sequence β of grammar symbols.
Computation of FIRST
FIRST (α) is defined as the collection of terminal symbols which are the first letters of strings
derived from α.
For a production X → Y1 Y2 Y3:
• If Y1 does not derive the empty string, i.e., FIRST(Y1) does not contain ε, then
FIRST(X) = FIRST(Y1 Y2 Y3) = FIRST(Y1).
• If FIRST(Y1) contains ε, then FIRST(X) = FIRST(Y1 Y2 Y3) = (FIRST(Y1) − {ε}) ∪ FIRST(Y2 Y3).
• FIRST(Y2 Y3) = FIRST(Y2), if FIRST(Y2) does not contain ε.
The same method is repeated for further grammar symbols, i.e., for Y4, Y5, Y6, …, Yk.
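The rules above can be applied repeatedly until no FIRST set grows any further. The sketch below does this for productions written as strings such as "E=TR", using upper-case letters for non-terminals and '$' for ε as in the lab program's input convention. The function names and the table layout are assumptions made for this example.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define EPS '$'   /* epsilon marker, as in productions like "R=$" */

/* first[X - 'A'][c] is true when terminal (or EPS) c is in FIRST(X). */
static bool first[26][128];

/* Add c to FIRST(nt); report whether the set changed. */
static bool add_first(char nt, char c)
{
    bool *slot = &first[nt - 'A'][(unsigned char)c];
    if (*slot)
        return false;
    *slot = true;
    return true;
}

/* Compute FIRST for every non-terminal of the given productions.
 * Each production has the form "X=Y1Y2...Yk" with ASCII symbols. */
void compute_first(const char *prods[], int count)
{
    bool changed = true;
    memset(first, 0, sizeof first);

    while (changed) {                       /* iterate to a fixed point */
        changed = false;
        for (int p = 0; p < count; p++) {
            char lhs = prods[p][0];
            const char *rhs = prods[p] + 2; /* skip "X=" */
            bool all_nullable = true;       /* every symbol so far derives epsilon */

            for (int k = 0; rhs[k] != '\0' && all_nullable; k++) {
                char y = rhs[k];
                if (y == EPS)
                    break;                  /* X -> epsilon, handled after the loop */
                if (!isupper((unsigned char)y)) {
                    if (add_first(lhs, y))  /* terminal starts the string */
                        changed = true;
                    all_nullable = false;
                } else {
                    /* FIRST(X) gains FIRST(Y) - {epsilon} */
                    for (int c = 0; c < 128; c++)
                        if (c != EPS && first[y - 'A'][c] && add_first(lhs, (char)c))
                            changed = true;
                    all_nullable = first[y - 'A'][EPS];
                }
            }
            if (all_nullable && add_first(lhs, EPS))
                changed = true;
        }
    }
}

/* Query helper: is c in FIRST(nt)? */
bool in_first(char nt, char c)
{
    return first[nt - 'A'][(unsigned char)c];
}
```

For the grammar E=TR, R=+TR, R=$, T=FY, Y=*FY, Y=$, F=(E), F=i, this gives FIRST(E) = FIRST(T) = FIRST(F) = { (, i }, FIRST(R) = { +, ε } and FIRST(Y) = { *, ε }. Unlike a directly recursive version, the fixed-point loop also terminates on left-recursive grammars.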
Algorithm:
function compute_FIRST_sets(grammar):
repeat
Program:
#include <stdio.h>
#include <ctype.h>
void FIRSTfunc(char);
int count, n = 0;
char prodn[10][10],
firstTerms[10];
int main(){
int i, choice;
char c, ch;
scanf("%d", &count);
do {
n = 0;
printf("Element :");
scanf("%c", &c);
FIRSTfunc(c);
printf("}\n");
void FIRSTfunc(char c)
int j;
if (!(isupper(c)))
firstTerms[n++] = c;
if (prodn[j][0] == c)
if (prodn[j][2] == '$')
firstTerms[n++] = '$';
else FIRSTfunc(prodn[j][2]);
Output:
Experiment-09
Description:
A terminal symbol a is in FOLLOW(N) if and only if there is a derivation from the start symbol
S of the grammar such that S ⇒* αNaβ, where α and β are (possibly empty) sequences of
grammar symbols. In other words, a terminal a is in FOLLOW(N) if a can follow N at some
point in a derivation.
Computation of FOLLOW
Follow (A) is defined as the collection of terminal symbols that occur directly to the right
of A.
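For a production of the form A → αBβ with β ≠ ε:
(a) If FIRST(β) does not contain ε, then FOLLOW(B) includes FIRST(β).
Or
(b) If FIRST(β) contains ε, then FOLLOW(B) includes (FIRST(β) − {ε}) ∪ FOLLOW(A).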
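As a worked illustration of these rules, consider the expression grammar from Experiment-07 (E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id). The grammar choice is an assumption made for the example; the FOLLOW sets then work out as:

```latex
\begin{align*}
\mathrm{FOLLOW}(E)  &= \{\,\$,\ )\,\} \\
\mathrm{FOLLOW}(E') &= \mathrm{FOLLOW}(E) = \{\,\$,\ )\,\} \\
\mathrm{FOLLOW}(T)  &= (\mathrm{FIRST}(E') - \{\varepsilon\}) \cup \mathrm{FOLLOW}(E) = \{\,+,\ \$,\ )\,\} \\
\mathrm{FOLLOW}(T') &= \mathrm{FOLLOW}(T) = \{\,+,\ \$,\ )\,\} \\
\mathrm{FOLLOW}(F)  &= (\mathrm{FIRST}(T') - \{\varepsilon\}) \cup \mathrm{FOLLOW}(T) = \{\,*,\ +,\ \$,\ )\,\}
\end{align*}
```

FOLLOW(E) contains $ because E is the start symbol and ) because of F → (E). The sets for T and F use the ε case, since both E' and T' can derive the empty string.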
Algorithm:
function compute_FOLLOW_sets(grammar):
Set the FOLLOW set of the start symbol to contain the end-of-string marker ($)
repeat
add FOLLOW set of left-hand side non-terminal symbol to its FOLLOW set
else:
Program:
#include <stdio.h>
#include <ctype.h>   /* isupper, islower */
#include <string.h>
int n, m = 0, p, i = 0, j = 0;
char a[10][10],
followResult[10];
void addToResult(char);
int main()
int i;
int choice;
char c, ch;
scanf("%d", &n);
// gets(a[i]); do
m = 0;
follow(c);
printf("FOLLOW(%c) = { ", c);
printf(" }\n");
void follow(char c)
if (a[0][0] == c)
addToResult('$');
if (a[i][j] == c)
if (a[i][j + 1] != '\0')
first(a[i][j + 1]);
follow(a[i][0]);
void first(char c)
int k;
if (!(isupper(c)))
// f[m++]=c; addToResult(c);
if (a[k][0] == c)
if (a[k][2] == '$')
follow(a[i][0]);
else if (islower(a[k][2]))
// f[m++]=a[k][2];
addToResult(a[k][2]);
else
first(a[k][2]);
void addToResult(char c)
int i;
if (followResult[i] == c)
return;
followResult[m++] = c;
Output:
Experiment-10
Description:
A predictive parser is a recursive descent parser that does not require backtracking or backup.
It operates in a top-down manner, meaning it starts from the root of the parse tree and works
its way down to the leaves.
Unlike some other parsing techniques, predictive parsing avoids backtracking, making it
efficient and deterministic.
The predictive parser predicts which production rule to apply based on the current input symbol
(terminal).
At each step, the choice of the rule to be expanded is made solely based on the next terminal
symbol.
For example, consider the grammar rule: A -> A1 | A2 | ... | An. If the non-terminal A is to be
further expanded, the rule is selected based on the current input symbol a.
E -> E + T | T
T -> T * F | F
F -> (E) | id
If the transition diagram reaches an accept state after consuming the entire input, the string is
successfully parsed.
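A table-driven version of this idea can be sketched with an explicit stack. Because predictive parsing needs a grammar without left recursion, the sketch uses the equivalent grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id, with R standing for E', Y for T' and the single character i for id. These encodings, and the names `table_entry` and `ll1_parse`, are assumptions made for the example.

```c
#include <stdbool.h>
#include <string.h>

/* Parsing table: right-hand side to use for non-terminal nt on lookahead a.
 * "" means epsilon; NULL marks an empty (error) entry. */
static const char *table_entry(char nt, char a)
{
    switch (nt) {
    case 'E': return (a == 'i' || a == '(') ? "TR" : NULL;
    case 'R': return a == '+' ? "+TR" : (a == ')' || a == '$') ? "" : NULL;
    case 'T': return (a == 'i' || a == '(') ? "FY" : NULL;
    case 'Y': return a == '*' ? "*FY"
                   : (a == '+' || a == ')' || a == '$') ? "" : NULL;
    case 'F': return a == 'i' ? "i" : a == '(' ? "(E)" : NULL;
    default:  return NULL;
    }
}

/* Parse input, which must end with the end marker '$'. */
bool ll1_parse(const char *input)
{
    char stack[256];
    size_t top = 0;
    size_t pos = 0;

    stack[top++] = '$';                 /* bottom-of-stack marker */
    stack[top++] = 'E';                 /* start symbol */

    while (top > 0) {
        char x = stack[--top];          /* symbol to match or expand */
        char a = input[pos];            /* current lookahead */

        if (a == '\0')
            return false;               /* input ended before '$' */
        if (x == '$')
            return a == '$' && input[pos + 1] == '\0';

        if (strchr("ERTYF", x) != NULL) {
            const char *rhs = table_entry(x, a);
            if (rhs == NULL)
                return false;           /* empty table entry: syntax error */
            size_t len = strlen(rhs);
            if (top + len > sizeof stack)
                return false;           /* sketch limit: stack too small */
            while (len > 0)             /* push in reverse, leftmost on top */
                stack[top++] = rhs[--len];
        } else if (x == a) {
            pos++;                      /* terminal matched, consume it */
        } else {
            return false;
        }
    }
    return false;
}
```

With these entries, "i+i*i$" and "(i+i)*i$" are accepted, while "i+$" is rejected because the entry for T on lookahead $ is empty. Each step looks only at the top of the stack and the next input symbol, so no backtracking is needed.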
Program:
#include <stdio.h>
#include <string.h>
char table[5][6][10];
int numr(char c)
switch (c)
case 'S':
return 0;
case 'A':
return 1;
case 'B':
return 2;
case 'C':
return 3;
case 'a':
return 0;
case 'b':
return 1;
case 'c':
return 2;
case 'd':
return 3;
case '$':
return 4;
return (2);
int main()
int i, j, k;
printf("%s\n", prod[i]);
fflush(stdin);
k = strlen(first[i]);
if (first[i][j] != '@')
if (strlen(pror[i]) == 1)
if (pror[i][0] == '@')
k = strlen(follow[i]);
strcpy(table[0][1], "a");
strcpy(table[0][2], "b");
strcpy(table[0][3], "c");
strcpy(table[0][4], "d");
strcpy(table[0][5], "$");
strcpy(table[1][0], "S");
strcpy(table[2][0], "A");
strcpy(table[3][0], "B");
strcpy(table[4][0], "C");
printf("\n--------------------------------------------------------\n");
printf("%-10s", table[i][j]);
if (j == 5)
printf("\n--------------------------------------------------------\n");
Output:
Experiment-11
Description:
An LALR parser is a Look-Ahead LR parser. It is intermediate in power between the SLR and CLR
parsers. It is a compacted form of the CLR parser, so the tables obtained are smaller
than the CLR parsing table.
To build it, we first construct the LR(1) items. Next, we look for sets of items having the
same first component (the core) and merge them to form a single set of items. In other words,
states that have the same first component but different second components (lookaheads) can be
combined into a single state.
For example, suppose:
I4: C → d ∙ , c | d
I7: C → d ∙ , $
Both items or states (I4 and I7) have the same first component, i.e., d ∙ , but a different second
component, i.e., c | d in I4 and $ in I7. They can therefore be merged into a single state:
I47: C → d ∙ , c | d | $
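The merge step for a single item can be sketched as a union of lookahead sets guarded by a comparison of cores. The `Item` structure and the string encoding of cores and lookaheads are assumptions made for this example; a real LALR generator would compare whole item sets rather than individual items.

```c
#include <stdbool.h>
#include <string.h>

/* One LR(1) item: a core (production with dot) plus its lookaheads. */
typedef struct {
    char core[16];         /* e.g. "C->d." */
    char lookaheads[8];    /* set of terminals stored as a string, e.g. "cd" */
} Item;

/* If a and b share the same core, add b's lookaheads to a and return
 * true; otherwise leave a unchanged and return false. */
bool merge_items(Item *a, const Item *b)
{
    if (strcmp(a->core, b->core) != 0)
        return false;

    for (const char *p = b->lookaheads; *p != '\0'; p++) {
        if (strchr(a->lookaheads, *p) == NULL) {      /* not yet in the set */
            size_t n = strlen(a->lookaheads);
            if (n + 1 < sizeof a->lookaheads) {       /* room for char + NUL */
                a->lookaheads[n] = *p;
                a->lookaheads[n + 1] = '\0';
            }
        }
    }
    return true;
}
```

Merging an item with core "C->d." and lookaheads "cd" with one that has the same core and lookahead "$" leaves the first with lookaheads "cd$", which corresponds to I47 in the example.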
Algorithm
Method
• Select the similar states having the same core, or first component and merge them into
one.
Let C′ = {J0, J1, J2 … . . Jm} be the resulting set.
• Construct Parsing Action for state J1 similar to CLR construction. If there is a conflict
in the Parsing Table, the algorithm can be considered to fail to produce an LALR parser.
• Construct goto actions as below.
Program:
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>
#include <string.h>
int ister(char);
int isnter(char);
int isstate(char);
void error();
struct action
char row[6][5];
};
};
struct gotol
char r[3][4];
};
};
char states[12] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'm', 'j', 'k', 'l'};
char stack[100];
struct grammar
char left;
char right[5];
};
{'E', "e+T"},
{'E', "T"},
{'T', "T*F"},
{'T', "F"},
{'F', "(E)"},
{'F', "i"},
};
int main()
int i = 0, j, k, l, n, m, c, len;
scanf("%s", inp);
inp[len + 1] = '\0';
do
x = inp[i];
p = stacktop(stack);
isproduct(x, p);
if (strcmp(temp, "emp") == 0)
error();
if (strcmp(temp, "acc") == 0)
break;
else
if (temp[0] == 's')
i++;
else
if (temp[0] == 'r')
j = isstate(temp[1]);
dl[1] = '\0';
n = strlen(temp);
pop(stack, &top);
l = top;
y = stack[l - 1];
isreduce(y, dl[0]);
if (strcmp(temp, "acc") == 0)
else
getch();
if (*sp == 100)
else
*sp = *sp + 1;
s[*sp] = item;
char i;
i = s[top];
return i;
int k, l;
k = ister(x);
l = isstate(p);
int ister(char x)
int i;
if (x == ter[i])
return i + 1;
return 0;
int isnter(char x)
int i;
if (x == nter[i])
return i + 1;
return 0;
int isstate(char p)
int i;
if (p == states[i])
return i + 1;
return 0;
void error()
exit(0);
int k, l;
k = isstate(x);
l = isnter(p);
char item;
if (*sp == -1)
else
item = s[*sp];
*sp = *sp - 1;
return item;
int r; printf("\n");
rep(t, r);
printf("\t\t\t");
printf("%c", inp[r]);
char c; c = t[r];
switch (c)
case 'a':
printf("0");
break;
case 'b':
printf("1");
break;
case 'c':
printf("2");
break;
case 'd':
printf("3");
break;
case 'e':
printf("4");
break;
case 'f':
printf("5");
break;
case 'g':
printf("6");
break;
case 'h':
printf("7");
break;
case 'm':
printf("8");
break;
case 'j':
printf("9");
break;
case 'k':
printf("10");
break;
case 'l':
printf("11");
break;
default:
printf("%c", t[r]);
break;
Output:
Experiment-12
Aim: - WAP to generate machine code from the abstract Syntax Tree
generated by the parser.
Description:
A parse tree is generated by the parser, which is a component of the compiler that processes
the source code and checks it for syntactic correctness. The parse tree is then used by other
components of the compiler, such as the code generator, to generate machine code or
intermediate code that can be executed by the target machine.
Parse trees can be represented in different ways, such as a tree structure with nodes
representing the different elements in the source code and edges representing the relationships
between them, or as a graph with nodes and edges representing the same information. Parse
trees are typically used as an intermediate representation in the compilation process, and are
not usually intended to be read by humans.
A syntax tree is a tree-like representation of the syntactic structure of a piece of source code.
It is typically used in the process of compiler design, to represent the structure of the code in
a way that is easier to analyze and manipulate.
Syntax trees are constructed by parsing the source code, which involves analyzing the code
and breaking it down into its individual components, such as tokens, variables, and statements.
The resulting tree is made up of nodes that correspond to these various components, with the
structure of the tree reflecting the grammatical structure of the source code.
Syntax trees are useful for a variety of tasks in compiler design, such as type checking,
optimization, and code generation. They can also be used to represent the structure of other
types of linguistic or logical structures, such as natural language sentences or logical
expressions.
Algorithm:
function generateMachineCode(node):
if node is null:
return
switch (node.type):
case ExpressionNode:
generateExpressionCode(node)
case StatementNode:
generateStatementCode(node)
case FunctionCallNode:
generateFunctionCallCode(node)
function generateExpressionCode(expressionNode):
switch (expressionNode.operator):
case ADD:
generateExpressionCode(expressionNode.leftOperand)
generateExpressionCode(expressionNode.rightOperand)
emitMachineCode(ADD_INSTRUCTION)
function generateStatementCode(statementNode):
switch (statementNode.type):
case AssignmentStatement:
generateExpressionCode(statementNode.expression)
storeResultInMemory(statementNode.variable)
case IfStatement:
generateExpressionCode(statementNode.condition)
emitMachineCode(BRANCH_IF_FALSE_INSTRUCTION, label_for_else)
generateStatementCode(statementNode.ifBranch)
emitMachineCode(BRANCH_INSTRUCTION, label_for_end)
emitLabel(label_for_else)
generateStatementCode(statementNode.elseBranch)
emitLabel(label_for_end)
function generateFunctionCallCode(callNode):
generateExpressionCode(argument)
pushArgumentToStack(argument.value)
emitMachineCode(CALL_FUNCTION_INSTRUCTION, callNode.functionName)
handleReturnValue(callNode.functionName)
Program:
#include <stdio.h>
#include <stdlib.h>
typedef enum {
NODE_ADD,
NODE_SUBTRACT,
NODE_MULTIPLY,
NODE_DIVIDE,
NODE_NUMBER
} NodeType;
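typedef struct ASTNode {
    NodeType type;
    int value;                  /* used only by NODE_NUMBER */
    struct ASTNode *left;
    struct ASTNode *right;
} ASTNode;

/* Post-order traversal: emit code for both operands, then the operator. */
void generate_machine_code(ASTNode *node) {
    if (node == NULL)
        return;
    switch (node->type) {
    case NODE_ADD:
        generate_machine_code(node->left);
        generate_machine_code(node->right);
        printf("ADD\n");
        break;
    case NODE_SUBTRACT:
        generate_machine_code(node->left);
        generate_machine_code(node->right);
        printf("SUBTRACT\n");
        break;
    case NODE_MULTIPLY:
        generate_machine_code(node->left);
        generate_machine_code(node->right);
        printf("MULTIPLY\n");
        break;
    case NODE_DIVIDE:
        generate_machine_code(node->left);
        generate_machine_code(node->right);
        printf("DIVIDE\n");
        break;
    case NODE_NUMBER:
        /* Assumption: the instruction for a constant is missing from this
           listing; PUSH is used here as an illustrative mnemonic. */
        printf("PUSH %d\n", node->value);
        break;
    default:
        printf("Unknown node type\n");
        break;
    }
}

// Example usage
int main() {
    // Example AST: 5 + 4 * 3
    ASTNode *root = malloc(sizeof(ASTNode));
    root->type = NODE_ADD;
    root->left = malloc(sizeof(ASTNode));
    root->left->type = NODE_NUMBER;
    root->left->value = 5;
    root->left->left = NULL;
    root->left->right = NULL;
    root->right = malloc(sizeof(ASTNode));
    root->right->type = NODE_MULTIPLY;
    root->right->left = malloc(sizeof(ASTNode));
    root->right->left->type = NODE_NUMBER;
    root->right->left->value = 4;
    root->right->left->left = NULL;
    root->right->left->right = NULL;
    root->right->right = malloc(sizeof(ASTNode));
    root->right->right->type = NODE_NUMBER;
    root->right->right->value = 3;
    root->right->right->left = NULL;
    root->right->right->right = NULL;
    generate_machine_code(root);
    // Clean up memory
    free(root->right->right);
    free(root->right->left);
    free(root->right);
    free(root->left);
    free(root);
    return 0;
}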
Output: