0% found this document useful (0 votes)
3 views

assignment1

This document outlines the requirements for Assignment 1 of CS 335 for the semester 2023-2024-II, due by January 28, 2024. It specifies that students must complete the assignment individually using Flex for creating lexical analyzers for two programming languages: Kanpur and Fortran 2008, detailing submission guidelines, evaluation criteria, and the structure of the required files. The document also includes specific lexical specifications, keywords, operators, and rules for both languages that must be adhered to in the assignment.

Uploaded by

Aman Arya
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views

assignment1

This document outlines the requirements for Assignment 1 of CS 335 for the semester 2023-2024-II, due by January 28, 2024. It specifies that students must complete the assignment individually using Flex for creating lexical analyzers for two programming languages: Kanpur and Fortran 2008, detailing submission guidelines, evaluation criteria, and the structure of the required files. The document also includes specific lexical specifications, keywords, operators, and rules for both languages that must be adhered to in the assignment.

Uploaded by

Aman Arya
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

CS 335 Semester 2023–2024-II: Assignment 1

14th January 2024

Due Your assignment is due by Jan 28 2024 11:59 PM IST.


General Policies

• You should do this assignment alone.

• Do not plagiarize or turn in solutions from other sources. You will be penalized if caught.

Submission

• Your solution should use Flex. We recommend using flex and not flex++. You can include
C++ STL code in the flex specification, and use g++ to compile the generated C file.
Do NOT use alternate software like RE/flex, JFlex, and PLY for this assignment.

• Submission will be through Canvas.

• Upload a zip file named “hrolli-assign1.zip”. The zipped file should contain a directory
hrolli-assign1, two sub-directories problem1 and problem2, and the following files:

(i) The Flex specification files,


(ii) A PDF file that includes compilation and execution instructions. You may optionally
include other relevant details (e.g., corner cases).

• We encourage you to use the LATEX typesetting system for generating the PDF file.

• You will get up to two late days to submit your assignment, with a 25% penalty for each
day.

Evaluation

• Write your code such that the exact output format (if any) is respected.

• We will evaluate your implementations on a Unix-like system, for example, a recent Debian-
based distribution. Check for compilation and linking errors.

• We will evaluate the implementations with our own inputs and test cases (where applicable),
so remember to test thoroughly.

• We will deduct marks if you disregard the listed rules.

1
Problem 1 [40 marks]

The following is the lexical specification for a new language “Kanpur”. The file extension for
Kanpur programs is .knp. Write a lexical analyzer (i.e., scanner) for Kanpur programs using Flex.
Name the flex specification prob1.l. Any illegal character encountered should be reported along
with the line number in the source code.

Keywords ARRAY BEGIN BOOLEAN COMMENT CONTINUE DO DOUBLE ELSE END FALSE FOR IF INTEGER
LABEL LIST LONG OWN PROCEDURE STEP SWITCH THEN TRUE UNTIL VALUE WHILE
Operators AND OR LEQ LT GEQ GT NOT EQL NEQ := + − * / % ^ | &
« »
Identifier A valid identifier begins with a letter and is followed by any number of occurrences of
digits and letters.
Strings A string literal is a collection of one or more ASCII characters encapsulated either in
single quotes or double quotes. A string literal with double quotes cannot have single quotes
included in the literal, and vice versa. Strings can be multiline.
Delimiters ; : , ' " [ ] { } ( )
The character ' is a single quote and " is a double quote.
Numeric literals Numeric literals can be decimal (base 10) or hexadecimal (base 16), and are
whole numbers. Leading 0s are not allowed. For example, 0 is valid but numbers of the form
0001 and 00 are invalid. Hex literals start with 0x or 0X followed by one or more hex digits.
As before, 0x0 is valid, but 0x00 is invalid.
A floating point number is formed by one or more digits before the decimal point followed
by at least one and up to six digits after the decimal. For example, floating point numbers
of the form .01, 1., and 123.4567890 are ill-formed. Floating point numbers are only in base
10.
Comments Comments are any text between { and }. The first occurrence of the terminating
character } ends the comment; comments can be multiline but cannot be nested.
White Space White space is defined as the ASCII space character, horizontal tab character, form
feed character, and line terminator characters.
Blanks, tabs, ends of lines, and comments are considered to be delimiters. The Lexical
Analyzer consumes these but does not return anything.
Other rules • Keywords and operators are case-insensitive (i.e., UNTIL and until have the
same meaning).
• All the numbers are unsigned (i.e., whole numbers).

The full set of tokens are KEYWORD, OPERATOR, IDENTIFIER, STRING, DELIMITER, IN-
TEGER, FLOATING_POINT, and HEXADECIMAL. Check the public test cases to learn about the
usage. The meaning of the programs is not important for this assignment.

• The output should list and classify all unique lexemes into proper syntactic categories (as
enumerated above).

2
• The output should be sorted by LEXEME (do not change case while printing). Follow the
conventions in the sample output. Not abiding by the output rules will attract a minimum
of 20% marks as a penalty.

• Your scanner can terminate after reporting the first error. You may optionally continue
beyond the first error.

Example. The output for test case public1.knp is as follows.

TOKEN COUNT LEXEME


STRING 1 "x is greater than y"
STRING 1 "y is greater than x"
DELIMITER 2 (
DELIMITER 2 )
DELIMITER 1 ,
INTEGER 1 10
INTEGER 1 20
OPERATOR 2 :=
DELIMITER 5 ;
KEYWORD 1 BEGIN
KEYWORD 1 ELSE
KEYWORD 1 END
OPERATOR 1 GT
KEYWORD 1 IF
KEYWORD 1 INTEGER
IDENTIFIER 2 PRINT
KEYWORD 1 THEN
IDENTIFIER 3 x
IDENTIFIER 3 y

Problem 2 [60 marks]

Write a lexer for Fortran 2008. Name the flex specification prob2.l. Letters, digits, underscore,
and special characters (Table 3.1) make up the Fortran character set. Any illegal character en-
countered should be reported along with the line number in the source code.
We list the constraints on the specification in the following. Refer to Chapter 3 in the linked
PDF for more details.

Keywords allocatable allocate assign associate asynchronous backspace bind call case
class close common contains continue cycle codimension contiguous critical data
deallocate deferred dimension do else elsewhere end endfile endif entry equivalence
exit? external enum enumerator extends forall final flush format function goto
generic import if implicit inquire intrinsic include interface intent lock module
nopass namelist nullify open only operator optional parameter pause print program
pointer private pass protected procedure public read return recursive result rewind
rewrite save stop subroutine select sequence submodule target then use value unlock
volatile while write.

3
Names A valid name begins with a letter and is followed by more letters, digits, or underscore.
The maximum length of a name is 63.

Literals Consider only the rules for int-literal-constant, real-literal-constant, and logical-literal-
constant (Section 3.2.3). A string literal is a collection of one or more characters encapsulated
in double quotes. A string literal with double quotes cannot have single quotes included in
the literal. Strings can be multiline.

Operator = + − ∗ / < > % &


In addition, consider only the rules defined in intrinsic-operator (Section 3.2.4).

Delimiters ( ) / [ ] (/ /)

Comments If the first non-blank character on a line is an “!”, the line is a comment (Section
3.3.2.3). The character “!” initiates a comment except where it appears within a string. The
comment extends to the end of the line.

Special Characters All other characters that are listed in Table 3.1 but not mentioned above.

Other rules • Assume that keywords are reserved, and cannot be used as identifiers.
• Extend the operators to include && « » ||
• Fortran is case-insensitive, i.e., keywords and variables written in any case combination
are equivalent. Only strings are case-sensitive.
• Ignore labels (Section 3.2.5).
• Ignore nuances based on fixed-format layout.

Your lexer should tokenize input Fortran programs into the given token categories, and print
the count of each token along with the category. The full set of tokens are KEYWORD, NAME,
INT_LITERAL, REAL_LITERAL, LOGICAL_LITERAL, CHAR_LITERAL, OPERATOR, LABEL, DE-
LIMITER, and SPECIAL_CHAR.
The output requirements are the same as in Problem 1.

You might also like