Lab Assgn4b

The document outlines the CS 302 Compiler Design Laboratory assignment scheduled for February 2025, which focuses on writing a lexical analyzer for X86-64 bit assembly language. Students are required to identify tokens in assembly programs, write lex scripts, and generate statistical data from assembly code. The lab includes practice problems on tokenizing optimized and unoptimized assembly code, along with detailed instructions and attachments for reference.

CS 302 : COMPILER DESIGN LABORATORY
Scheduled : Feb 07 & 10, 2025

Premises : Lab 4 [VIA], Lab 1 [VIB] and Lab 7 [VIC]

Faculty : Bikas Kanti Sarkar, Indrajit Mukherjee, Prashant Pranab, Supratim Biswas
Teaching Assistant : Swarna Aishwarya Twinkle (Research Scholar)

Lab Assignment 4 :
Objective : To write a lexical analyzer (or tokenizer) for X86-64 bit assembly language programs. The purpose
is to acquire skills in processing the assembly code of a contemporary real machine, such as the X86-64 bit
architecture, which is probably the architecture on which you are performing your experiments.

General Instructions :
1. You will need to identify tokens in an assembly program for the X86-64 bit architecture. Read the handout to
understand the token structures so that you can write regexes for them in Lex.
2. Test your scanner with the test cases given, and then with test cases of your own.

Practice Problems

P1. You have to do the following activities : a) examine the assembly code generated by gcc, b) identify patterns
of interest in the assembly code, c) write a lex script that identifies these patterns, and d) finally produce some
statistical data about the various elements in the assembly code.
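
Before committing patterns to a lex script, it can help to prototype them quickly. The following is a minimal Python sketch, not the expected lex solution: the three regexes, the category names and the hand-written sample are assumptions made for illustration, and the handout remains the authority on token structure.

```python
import re
from collections import Counter

# Hypothetical line-level patterns; the real token set comes from the handout.
LABEL = re.compile(r"^\s*([.\w$]+):")                 # e.g. main:  or  .L3:
DIRECTIVE = re.compile(r"^\s*(\.[A-Za-z_]\w*)")       # e.g. .file, .string
INSTRUCTION = re.compile(r"^\s*([a-z][a-z0-9]*)\b")   # e.g. movl, endbr64


def assembly_stats(text):
    """Count labels, directives and instruction mnemonics, one line at a time."""
    stats = Counter()
    for line in text.splitlines():
        m = LABEL.match(line)
        if m:
            stats["label"] += 1
            line = line[m.end():]      # a statement may follow the label
        if DIRECTIVE.match(line):
            stats["directive"] += 1
            continue
        m = INSTRUCTION.match(line)
        if m:
            stats["instruction"] += 1
            stats["op:" + m.group(1)] += 1
    return stats


# Hand-written sample in the style of gcc output, not the real testprog1_opt.s.
SAMPLE = """\
\t.file\t"testprog1.c"
\t.text
main:
.LFB0:
\tendbr64
\tsubq\t$8, %rsp
\tmovl\t$20, %edx
\txorl\t%eax, %eax
\tret
"""

if __name__ == "__main__":
    for name, count in sorted(assembly_stats(SAMPLE).items()):
        print(name, count)
```

The label pattern is tried first so that a line such as .LFB0: is counted as a label and not as a directive. A lex script would need the same care in how its rules are written and ordered.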

You can start with a C program of your own or use the program, “testprog1.c”, given to you. Let the generated
assembly code be named “testprog1_opt.s”. Check the assembly code generated on your server, as it may differ
slightly from the code provided, which was produced on the instructor’s laptop.

Source Program

    // C program named as testprog1.c
    #include <stdio.h>
    int main()
    { int a[100], i, j;
      a[0] = 10; a[1] = 20;
      for (i = 2; i < 100; i++)
      { a[i] = a[i-1]*i + a[i-2]*(i-1);}
      printf(" a[1] : %d \n", a[1]);
      return 0;
    }

Tasks to be performed

1. $ gcc testprog1.c -O2 -S
   to generate assembly code using level 2 optimization.
2. $ mv testprog1.s testprog1_opt.s
   to rename the assembly code
3. Write a lex script, “assembly-scanner1.l” and generate an executable, say, scanass1
4. Run the scanner over the assembly to generate its output.
   $ ./scanass1 < testprog1_opt.s > test1out
5. Check that there are no unrecognized characters in “test1out”. Compare your “test1out” with the output file
   provided to you as “desired-output”. Make changes to the lex script till your output is same as the
   “desired-output”.
6. Note that desired-output gives the count of assembler directives but not the directives themselves. Modify
   the lex script to print the directives also.

Attachments : 1. testprog1.c 2. testprog1_opt.s 3. desired-output 4. desired-output-directives
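
The comparison in task 5 can be automated. The sketch below uses Python's standard difflib module to report the lines on which a scanner's output differs from the reference. The function name diff_lines and the "Unrecognized" line in the demo are hypothetical; the actual output format is whatever “desired-output” contains.

```python
import difflib


def diff_lines(actual, desired, actual_name="test1out", desired_name="desired-output"):
    """Return unified-diff lines; an empty list means the two outputs agree."""
    return list(difflib.unified_diff(desired, actual,
                                     fromfile=desired_name, tofile=actual_name,
                                     lineterm=""))


if __name__ == "__main__":
    # Hypothetical token lines, used only to show what a mismatch looks like.
    desired = ["Quad Word Instruction : movq", "DELIMITER ,"]
    actual = ["Quad Word Instruction : movq", "Unrecognized : ,"]
    for line in diff_lines(actual, desired):
        print(line)
```

Lines prefixed with "-" appear only in the reference and lines prefixed with "+" appear only in your output, so each pair points at a rule in the lex script that still needs attention.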

P2. Repeat the experiment stated in P1 above by generating unoptimized assembly code for the source,
testprog1.c, and use “scanass1” to tokenize the unoptimized assembly. Determine whether some assembly
instructions are reported as unrecognized, and fix the lex script to remove these anomalies.
Attachments : 1. testprog1.s 2. desired-output-unopt
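
One way to find the anomalies quickly is to list every mnemonic that the current rules do not cover. The Python sketch below does this under stated assumptions: the set of known op codes is copied from Table 1 of the appendix, the helper names are hypothetical, and the sample is hand-written (cltq and leave are instructions that may appear in unoptimized gcc output but are not listed in Table 1).

```python
import re

# Base op codes taken from Table 1 of the appendix; extend as new ones turn up.
KNOWN_BASES = {
    "sub", "mov", "xor", "lea", "add", "cmp", "imul", "sal", "shr", "and", "or", "not",
    "endbr64", "call", "ret", "push", "pop",
    "jmp", "je", "jne", "js", "jns", "jg", "jge", "jl", "jle",
}
SIZE_SUFFIXES = "bwlq"

LABEL_PREFIX = re.compile(r"^\s*[.\w$]+:")
MNEMONIC = re.compile(r"\s*([a-z][a-z0-9]*)\b")


def is_known(mnemonic):
    """True for a known op code, with or without a b/w/l/q length suffix."""
    if mnemonic in KNOWN_BASES:
        return True
    return mnemonic[-1] in SIZE_SUFFIXES and mnemonic[:-1] in KNOWN_BASES


def unrecognized_mnemonics(text):
    """Return the sorted mnemonics that the known set does not cover."""
    missing = set()
    for line in text.splitlines():
        line = LABEL_PREFIX.sub("", line, count=1)   # drop a leading label
        m = MNEMONIC.match(line)                     # directives start with '.'
        if m and not is_known(m.group(1)):
            missing.add(m.group(1))
    return sorted(missing)


# Hand-written sample in the style of unoptimized gcc output.
SAMPLE_UNOPT = """\
main:
\tpushq\t%rbp
\tmovq\t%rsp, %rbp
\tmovl\t$10, -416(%rbp)
\tcltq
\tleave
\tret
"""

if __name__ == "__main__":
    print(unrecognized_mnemonics(SAMPLE_UNOPT))
```

Each mnemonic reported here is a candidate for a new rule, or for an extension of an existing pattern, in the lex script.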

SP2025 / Compiler Design Lab / Lab 4 : Tokenizer for X86 Assembly / BKS+IM+PP+SB / 1


P3. The lex scripts of P1 and P2 are primitive in the sense that, for assembly instructions of the forms given
below, the scanner possibly generates the tokens listed under “Tokenization”. However, we wish the tokenizer
to recognize each memory operand as a single token and report its operand type, as listed under “Desired
Tokenization”.

Assembly instruction : movq %rax, -8(%rbp)

Tokenization                            Desired Tokenization
Quad Word Instruction : movq            Quad Word Instruction : movq
64 bit Register operand : %rax          64 bit Register operand : %rax
DELIMITER ,                             DELIMITER ,
Number : -8                             Register indirect displacement Operand : -8(%rbp)
LEFT PAR : (
64 bit Register operand : %rbp
RIGHT PAR : )

Assembly instruction : movl $10, -416(%rbp)

Tokenization                            Desired Tokenization
Long Word Instruction : movl            Long Word Instruction : movl
Immediate operand : $10                 Immediate operand : $10
DELIMITER ,                             DELIMITER ,
Number : -416                           Register indirect displacement Operand : -416(%rbp)
LEFT PAR : (
64 bit Register operand : %rbp
RIGHT PAR : )

Assembly instruction : movl -416(%rbp,%rax,4), %eax

Tokenization                            Desired Tokenization
Long Word Instruction : movl            Long Word Instruction : movl
Number : -416                           register indirect scaled index operand : -416(%rbp,%rax,4)
LEFT PAR : (                            DELIMITER ,
64 bit Register operand : %rbp          32 bit Register operand : %eax
DELIMITER ,
64 bit Register operand : %rax
DELIMITER ,
Number : 4
RIGHT PAR : )
DELIMITER ,
32 bit Register operand : %eax
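
The desired tokenization treats a whole memory operand as one token. A minimal Python sketch of that idea follows, assuming gcc's spacing (no blanks inside the parentheses) and numeric displacements only. The pattern list and the helper names classify_operand and split_operands are hypothetical, and the category names follow the operand formats in Table 1.

```python
import re

REG = r"%[a-z][a-z0-9]*"
NUM = r"-?\d+"

# One pattern per operand format; each must match the whole operand text.
OPERAND_PATTERNS = [
    ("register indirect scaled index operand",
     re.compile(rf"(?:{NUM})?\({REG},{REG},[1248]\)")),
    ("register indirect displacement operand", re.compile(rf"{NUM}\({REG}\)")),
    ("register indirect operand", re.compile(rf"\({REG}\)")),
    ("immediate operand", re.compile(rf"\${NUM}")),
    ("register operand", re.compile(REG)),
]


def classify_operand(text):
    """Return the operand type, or 'unrecognized' if no pattern fits."""
    for name, pattern in OPERAND_PATTERNS:
        if pattern.fullmatch(text):
            return name
    return "unrecognized"


def split_operands(operands):
    """Split an operand list on commas that are not inside parentheses."""
    parts, depth, current = [], 0, ""
    for ch in operands:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        parts.append(current.strip())
    return parts


if __name__ == "__main__":
    for op in split_operands("-416(%rbp,%rax,4), %eax"):
        print(classify_operand(op), ":", op)
```

In Lex, no explicit splitting is needed: the scanner always takes the longest match, so a single rule covering the whole of -416(%rbp,%rax,4) wins over the separate rules for numbers, parentheses and registers. Operands outside these five patterns, such as an immediate that names a label, would still come out as unrecognized here.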

End of Lab Experiment 4

APPENDIX
Table 1 : Summary of X86-64 bit Assembly Instructions

X86-64 bit Architecture : Clarification

64 bit registers : 16 general purpose registers, namely %rsp (stack pointer), %rbp (base pointer on stack), %rax,
%rdi, %rsi, %rdx, %rbx, %rcx, %r8, %r9, %r10, %r11, %r12, %r13, %r14, %r15, together with %rip (instruction
pointer)
32 bit registers : %esp (stack pointer), %ebp (base pointer on stack), %eax, %edi, %esi, %edx, %ebx (replace the
prefix ‘r’ by ‘e’)
Common instruction op codes : sub, mov, xor, lea, add, cmp, imul, sal, shr, and, or, not
Other assembly instructions : endbr64, call, ret
Jump instructions : jmp, je, jne, js, jns, jg, jge, jl, jle
Stack instructions : push, pop, call, ret
Operand length varies from 1 to 8 bytes : We add a suffix {‘b’, ‘w’, ‘l’ or ‘q’} to the op code to denote the
length {1, 2, 4, 8} in bytes. For instance, movb, movw, movl and movq are instances of the move instruction.
Assembler directives (assembly code starting with a dot (.) followed by some standard keyword, such as “file”,
“string”, and many others) : These have to be recognized, but no action is taken except to count their
occurrences.
Labels of an assembly code : .LX, where X is a string of one or more alphanumeric characters, denotes a label
Operand formats (a few are given here; identify operands that are not captured by these patterns) :
1. $number : report as “immediate operand”
2. register : report as “register operand”, with special mention for %rsp, %rbp and %rip
3. (register) : report as “register indirect operand”
4. number(register) : report as “register indirect displacement operand”
5. number(register, register, number) : report as “register indirect scaled index operand”
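
The length suffix and the register prefix described in Table 1 can both be decoded mechanically. The Python sketch below is one possible reading of those two rules, with hypothetical helper names. It includes %ecx by the ‘r’ to ‘e’ rule even though Table 1 does not list it, and it does not cover other register spellings such as %r8d.

```python
import re

SUFFIX_BYTES = {"b": 1, "w": 2, "l": 4, "q": 8}

# Op codes from Table 1 that take a length suffix (an assumption for this sketch).
BASE_OPCODES = {"sub", "mov", "xor", "lea", "add", "cmp", "imul",
                "sal", "shr", "and", "or", "not", "push", "pop"}


def split_mnemonic(mnemonic):
    """Return (base, size_in_bytes); size is None when there is no length suffix."""
    base, suffix = mnemonic[:-1], mnemonic[-1:]
    if base in BASE_OPCODES and suffix in SUFFIX_BYTES:
        return base, SUFFIX_BYTES[suffix]
    return mnemonic, None


def register_width(reg):
    """Return 64 or 32 for the register spellings covered here, else None."""
    if re.fullmatch(r"%r(?:ip|sp|bp|ax|di|si|dx|bx|cx|[89]|1[0-5])", reg):
        return 64
    if re.fullmatch(r"%e(?:sp|bp|ax|di|si|dx|bx|cx)", reg):
        return 32
    return None


if __name__ == "__main__":
    for m in ["movq", "movl", "call", "jl"]:
        print(m, split_mnemonic(m))
    for r in ["%rbp", "%eax", "%r15"]:
        print(r, register_width(r))
```

Checking the base against a known set before stripping the last letter matters, because mnemonics such as call and jl end in a suffix letter without carrying a length suffix.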
