cc-lab2

The document outlines the objectives and specifications for a lab focused on building a lexer for a toy programming language. It includes details on the language's alphabet, operators, constants, identifiers, keywords, and tasks for implementing DFA/NFA in C for recognizing various tokens. Additionally, it describes a bigger task of creating a complete lexer that processes source code and handles comments and string literals.

Uploaded by

f20220155

CS F363 Compiler Construction

Second Semester, 2024-25

Lab-2: Building Lexer for a Toy Language

1 Objectives

The objectives of this lab are the following.

1. To build a lexical analyzer (lexer) for a simplified toy programming language based on
the given specifications.

2. To understand and implement Deterministic Finite Automata (DFA) and
Non-deterministic Finite Automata (NFA) for recognizing various tokens such as
operators, constants, identifiers, and keywords.

3. To write C programs capable of recognizing and classifying tokens according to the
language rules and reporting lexical errors where applicable.

4. To integrate all components into a complete lexer that can process an entire program
and generate a sequence of tokens with their types and lexemes.

2 Language Specifications

The toy language is made up of the following alphabet.

Alphabet

• Lowercase alphabets:
a, b, c, d, ..., z

• Digits :
0, 1, 2, ..., 9

• Special symbols:
+ - % / * < > = _ ( ) ; , (comma) : { }

Operators

• Arithmetic Operators:
+, -, *, /, %

• Relational Operators:
= (equal to), >, <, >=, <=, <> (not equal to)

• Assignment Operators:
:=, +=, -=, *=, /=, %=

• Separators:
( ) , ; { } "

Constants

Constants in the toy language represent fixed values that do not change during the execution of a program.
Types of constants include the following.

• Integer Constants: Whole numbers without a decimal point, for example, 42, -17, 0, etc.

• Floating-Point Constants: Numbers with a decimal point, for example, 3.14, -0.001,
12.4565, etc.

Variables and Identifiers

An identifier in the toy language must begin with a lowercase letter (a − z) and contain only
lowercase letters, digits (0 − 9), and underscores (_). However, at most one underscore (_)
is allowed.
Example:
Valid variable names: age, count, tax_12, net_income, is_ready;

Invalid variable names: _sum, sum_curr_total, 1sum;
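
The identifier rule above can be checked directly on a string. The following is a minimal sketch under stated assumptions: the function name is_valid_identifier is hypothetical (it does not appear in the lab handout), and it validates a whole string instead of finding the longest valid prefix, which is what the later tasks ask for. It also does not reject keywords.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if s is a valid toy-language identifier. */
bool is_valid_identifier(const char *s) {
    int underscores = 0;

    /* Must begin with a lowercase letter (this also rejects the empty string). */
    if (!islower((unsigned char)s[0])) return false;

    for (const char *p = s + 1; *p != '\0'; p++) {
        unsigned char ch = (unsigned char)*p;
        if (ch == '_') {
            if (++underscores > 1) return false;   /* at most one underscore */
        } else if (!islower(ch) && !isdigit(ch)) {
            return false;                          /* character outside the allowed set */
        }
    }
    return true;
}
```

With this sketch, the valid names listed above (for example tax_12) are accepted, while _sum, sum_curr_total, and 1sum are rejected.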

Keywords

The toy language provides the following keywords:

int, char, float, if, else, while, for, main

Keywords cannot be used as variable names.
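
One simple way to enforce this rule is to compare a scanned lexeme against the keyword list before classifying it as an identifier. The sketch below uses a plain table lookup, which is not the DFA construction that the lab asks for later; the names KEYWORDS and is_keyword are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The eight reserved words listed in the specification. */
static const char *const KEYWORDS[] = {
    "int", "char", "float", "if", "else", "while", "for", "main"
};

/* Hypothetical helper: returns true if lexeme is exactly one of the keywords. */
bool is_keyword(const char *lexeme) {
    size_t n = sizeof(KEYWORDS) / sizeof(KEYWORDS[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(lexeme, KEYWORDS[i]) == 0) return true;
    }
    return false;
}
```

Because the comparison is on the whole lexeme, a name such as float_num is not treated as a keyword even though it starts with float.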

3 DFA/NFA to recognize tokens and C implementation

3.1 Relational operator

Figure 1: NFA to recognize the relational operators

A snippet of the implementation is below (you must complete the code).

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

#define YES 1
#define NO 0

// Token structure definition
typedef struct {
    char type[16];   // long enough for "Identifier" plus the terminating '\0'
    char value[3];   // two characters plus the terminating '\0'
} token;

// Build a token; snprintf truncates any text that does not fit.
token newToken(const char *type, const char *value) {
    token t;
    snprintf(t.type, sizeof(t.type), "%s", type);
    snprintf(t.value, sizeof(t.value), "%s", value);
    return t;
}

// Push the last character read (the caller's variable c) back onto stdin.
#define retract() ungetc(c, stdin)

void fail() {
    printf("Lexical error: invalid relational operator.\n");
    exit(1);
}

token getRelop() {
    int state = 0;
    int c = EOF;   // int rather than char, so that EOF can be represented

    while (YES) {
        switch (state) {
        case 0:
            c = getchar();
            if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else fail();
            break;
        case 1:
        .
        .
        .
        case 2:
            return newToken("relop", "LE");

        case 4:
            retract();
            return newToken("relop", "LT");

        default:
            state = 0;
            break;
        }
    }
}

int main() {
    token result = getRelop();
    printf("Token Type: %s, Token Value: %s\n", result.type, result.value);
    return 0;
}

Sample input / output:

$ ./a.out
<=
Token Type: relop, Token Value: LE
$ ./a.out
=>
Token Type: relop, Token Value: EQ // the longest prefix matched is =
$ ./a.out
<>
Token Type: relop, Token Value: NE
$ ./a.out
_<
Lexical error: invalid relational operator. // _ is not a relational operator

Task: 1

(a) Complete the above code that recognizes the relational operators at the prefix of the
input string.

(b) Give a DFA-based C implementation to identify the arithmetic operators (the list is given
above) at the prefix of the input string.

(c) Give a DFA-based C implementation to identify the assignment operators (the list is
given above) at the prefix of the input string.

(d) Give a DFA-based C implementation to identify all three types of operators: relational
operators, arithmetic operators, and assignment operators. In addition, your code
should also identify the separators at the prefix of the input string.
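
The longest-prefix rule used in these tasks (for example, <= must be reported as one token and not as < followed by =) can be illustrated on a string with a small lookup table. This is only a sketch of the matching rule, not the DFA-based implementation the task asks for; the name match_operator and the class strings are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the length of the longest operator that is a
   prefix of s (0 if there is none) and stores its class name in *kind. */
size_t match_operator(const char *s, const char **kind) {
    /* Two-character operators come first so the longest match is tried first. */
    static const struct { const char *text; const char *kind; } OPS[] = {
        {">=", "Relational Operator"}, {"<=", "Relational Operator"},
        {"<>", "Relational Operator"},
        {":=", "Assignment Operator"}, {"+=", "Assignment Operator"},
        {"-=", "Assignment Operator"}, {"*=", "Assignment Operator"},
        {"/=", "Assignment Operator"}, {"%=", "Assignment Operator"},
        {"=", "Relational Operator"},  {">", "Relational Operator"},
        {"<", "Relational Operator"},
        {"+", "Arithmetic Operator"},  {"-", "Arithmetic Operator"},
        {"*", "Arithmetic Operator"},  {"/", "Arithmetic Operator"},
        {"%", "Arithmetic Operator"}
    };
    for (size_t i = 0; i < sizeof(OPS) / sizeof(OPS[0]); i++) {
        size_t len = strlen(OPS[i].text);
        if (strncmp(s, OPS[i].text, len) == 0) {
            *kind = OPS[i].kind;
            return len;
        }
    }
    *kind = NULL;   /* no operator at the start of s */
    return 0;
}
```

For the input => the sketch matches only = (length 1), which agrees with the sample output above; a DFA encodes the same preference by moving to a further state only when the second character extends the operator.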

3.2 Constants

[Figure: DFA to recognize signed integer and real constants]

State   Input    Next state
0       + | -    1
0       digit    2
1       digit    2
2       digit    2
2       other    3
2       .        4
4       digit    5
5       digit    5
5       other    6

Here, digit is any numeric character of the set {0, 1, . . . , 9} and other is a character other
than a digit or a dot (.). Furthermore, nodes 3 and 6 are retracted states.
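
As an illustration of how such an automaton can be simulated, here is a sketch that runs the states on a string instead of on stdin. It assumes the transitions 0 to 1 on a sign, 0 and 1 to 2 on a digit, 2 to 4 on a dot, 4 to 5 on a digit, and the two retracting states 3 and 6 as described above. The names num_kind and scan_number are hypothetical, and a dot seen in state 5 is treated like other, which is an assumption.

```c
#include <ctype.h>
#include <stddef.h>

enum num_kind { NUM_NONE, NUM_INTEGER, NUM_REAL };

/* Hypothetical helper: runs the DFA on s and returns the kind of constant
   found at its start; *len receives the length of the matched prefix. */
enum num_kind scan_number(const char *s, size_t *len) {
    int state = 0;
    size_t i = 0;

    for (;;) {
        unsigned char ch = (unsigned char)s[i];
        switch (state) {
        case 0:                                  /* start: sign or first digit */
            if (ch == '+' || ch == '-') state = 1;
            else if (isdigit(ch)) state = 2;
            else { *len = 0; return NUM_NONE; }
            break;
        case 1:                                  /* after the sign: need a digit */
            if (isdigit(ch)) state = 2;
            else { *len = 0; return NUM_NONE; }
            break;
        case 2:                                  /* integer part */
            if (isdigit(ch)) state = 2;
            else if (ch == '.') state = 4;
            else { *len = i; return NUM_INTEGER; }   /* state 3: retract */
            break;
        case 4:                                  /* after the dot: need a digit */
            if (isdigit(ch)) state = 5;
            else { *len = 0; return NUM_NONE; }
            break;
        case 5:                                  /* fractional part */
            if (isdigit(ch)) state = 5;
            else { *len = i; return NUM_REAL; }      /* state 6: retract */
            break;
        }
        i++;   /* consume the character and continue */
    }
}
```

Returning the prefix length without counting the character that ended the number plays the role of retract(): the caller resumes scanning at that character.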

Task: 2 Complete the following code so that it recognizes the signed integer or real number
that forms the (longest) prefix of the given string.

Sample input / output:

$ ./a.out
12
Token Type: Integer, Token Value: IN
$ ./a.out
-11
Token Type: Integer, Token Value: IN
$ ./a.out
-56.78q
Token Type: Real num, Token Value: FL // the longest prefix is -56.78
$ ./a.out
--12
Lexical error: invalid constant. // the first - is not part of the number

token getNum() {
    int state = 0;
    int c = EOF;   // int rather than char, so that EOF can be represented

    while (YES) {
        switch (state) {

        case 0:
            c = getchar();
            if (c == '+' || c == '-') state = 1;
            else if (isdigit(c)) state = 2;
            else
            /* ... to be completed ... */

        case 3:
            retract();
            return newToken("Integer", "INT");
        case 4:
            /* ... to be completed ... */

        case 6:
            retract();
            return newToken("Real num", "FLOAT");

        default:
            fail();
        }
    }
}

3.3 Identifiers

[Figure: DFA to recognize identifiers]

State   Input            Next state
0       alpha            1
1       alpha | digit    1
1       _                2
1       other            3
2       alpha | digit    2
2       _ | other        3

Here, alpha is the set of all lowercase alphabets, digit is the set {0, 1, . . . , 9}. Further, other
is the set of all characters other than lowercase letters and digits. Here, 3 is a retracted
state.

Task: 3 Complete the following code so that it recognizes the identifier that forms the (longest)
prefix of the given input string.

void fail() {
    printf("Lexical error: Not started with lowercase alphabet\n");
    exit(1);
}

token getId() {
    int state = 0;
    int c = EOF;   // int rather than char, so that EOF can be represented

    while (YES) {
        switch (state) {

        case 0:
            c = getchar();
            if (islower(c)) state = 1;
            else fail();
            break;

        case 1:
            /* ... to be completed ... */

        case 2:
            /* ... to be completed ... */

        case 3:
            retract();
            return newToken("Identifier", "ID");

        default:
            fail();

        }
    }
}

Sample input / output:

$ ./a.out
sum123
Token Type: Identifier, Token Value: ID
$ ./a.out
sum_123
Token Type: Identifier, Token Value: ID
$ ./a.out
sum12_45
Token Type: Identifier, Token Value: ID
$ ./a.out
sum_12_12
Token Type: Identifier, Token Value: ID // The valid lexeme is sum_12
$ ./a.out
_123
Lexical error: Not started with lowercase alphabet
$ ./a.out
12sum
Lexical error: Not started with lowercase alphabet

3.4 Keywords

In this toy language, only the following words are reserved as keywords:

int, char, float, if, else, while, for, main

Task: 4 Construct a DFA that recognizes keywords from the above list and write a C
implementation to identify a keyword that is the (longest) prefix of the given input string.

4 Bigger Task: Lexer for the toy language

In the preceding sections and tasks, you developed various lexers capable of recognizing
operators, constants, identifiers, and keywords. Your objective is to write a C program that
implements a lexer that reads a source code file named input.txt and divides the program
into a sequence of valid tokens.
A sample program can be seen below:
main( )
{
int sum, float_num;
float cgpa_sem1_1;
for(int i:=-5; i<=10; )
sum *= i ;
if(sum <> 0)
}

The output of your code for the above program must be:

Lexeme        Token type
------------------------------------
main          Keyword
(             Separator
)             Separator
{             Separator
int           Keyword
sum           Identifier
,             Separator
float_num     Identifier
;             Separator
float         Keyword
cgpa_sem1     Identifier
_             Invalid operator
1             Integer
;             Separator
for           Keyword
(             Separator
int           Keyword
i             Identifier
:=            Assignment Operator
-5            Integer
;             Separator
i             Identifier
<=            Relational Operator
10            Integer
;             Separator
)             Separator
sum           Identifier
*=            Assignment Operator
i             Identifier
;             Separator
if            Keyword
(             Separator
sum           Identifier
<>            Relational Operator
0             Integer
)             Separator
}             Separator

4.1 Adding more patterns

Modify your above lexer/scanner by adding logic to detect and process multi-line comments,
single-line comments, and string literals. Ensure that unclosed comments or strings produce
an error message. To handle this, you must consider that the toy language has a new
constant type String constant (a string between “ and ”).
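
A possible shape for this logic is sketched below on a string buffer. The enum scan_result and the function scan_comment_or_string are hypothetical names. The sketch assumes that a string constant may not span lines and has no escape sequences, since the handout does not specify either point.

```c
#include <stddef.h>

enum scan_result {
    SCAN_NONE, SCAN_LINE_COMMENT, SCAN_BLOCK_COMMENT, SCAN_STRING,
    SCAN_UNCLOSED_COMMENT, SCAN_UNCLOSED_STRING
};

/* Hypothetical helper: classifies what starts at s; for a complete comment
   or string, *len receives the number of characters it spans. */
enum scan_result scan_comment_or_string(const char *s, size_t *len) {
    size_t i;
    *len = 0;

    if (s[0] == '/' && s[1] == '/') {
        /* Single-line comment: runs up to, but not including, the newline. */
        for (i = 2; s[i] != '\0' && s[i] != '\n'; i++) { }
        *len = i;
        return SCAN_LINE_COMMENT;
    }
    if (s[0] == '/' && s[1] == '*') {
        /* Multi-line comment: search for the closing star-slash pair. */
        for (i = 2; s[i] != '\0'; i++) {
            if (s[i] == '*' && s[i + 1] == '/') {
                *len = i + 2;
                return SCAN_BLOCK_COMMENT;
            }
        }
        return SCAN_UNCLOSED_COMMENT;   /* reached end of input first */
    }
    if (s[0] == '"') {
        /* String constant: everything up to the next double quote.
           Assumption: a newline before the closing quote is an error. */
        for (i = 1; s[i] != '\0' && s[i] != '\n'; i++) {
            if (s[i] == '"') {
                *len = i + 1;
                return SCAN_STRING;
            }
        }
        return SCAN_UNCLOSED_STRING;
    }
    return SCAN_NONE;   /* not a comment or string; try the other recognizers */
}
```

A lexer could call such a function before the operator recognizer, so that // and /* are not split into two / tokens, and report an error when one of the two unclosed results is returned.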
Sample program:
int a := 10; // This is a single-line comment
char str;
/* This is a
multi-line comment */
if (a < b) {
str := "Value of a is less than b" // String literal
} else {
/* Unclosed multi-line comment
return 0;
}

The output of the above program must be:

Lexeme Token type
------------------------------------------
int Keyword
a Identifier
:= Assignment Operator
10 Integer
; Separator
char Keyword
str Identifier
; Separator
if Keyword
( Separator
a Identifier
< Relational Operator
b Identifier
) Separator
{ Separator
str Identifier
:= Assignment Operator
"Value of a is less than b" String constant
} Separator
else Keyword
{ Separator
ERROR: Unclosed multi-line comment
