Chap 2 - Assemblers

The document discusses the basics of assemblers, including: 1) An assembler translates assembly language code into machine language by replacing mnemonics with opcodes and symbols with addresses. 2) Assembly language programs use mnemonics, symbols, literals, and directives to make code more readable and modular. 3) Assemblers use symbol tables, literal tables, and opcode tables to translate code during a one-pass or two-pass assembly process.

Chapter 2:- Assemblers

Topics in this Chapter:

Definition of an Assembler
Basic functions of Assembler
Features and Elements of Assembly Language Programming
Forward Reference Problem and Solutions to it
Multi-pass (2-Pass) Assembler Design
1-Pass Assembler Design
SPARC Assembler
Possible Errors during assembly of a program

2.1 What is an Assembler?

An Assembler is a language translator that accepts an assembly language program as input and
produces its machine language equivalent as output, along with information required by the
Loader.

[Figure: Assembly Language Program -> Assembler (using its Databases) -> Machine Language
Equivalent, along with information required by Loader]

Fig. 2.1: An Assembler

2.2 Elements of Assembly Language Programming

2.2.1 Structure of an ALP Statement

An ALP Statement has following structure:

[label:] mnemonic [operands] [;comments]

Label is used to assign a symbolic name to a memory location; it is optional.

Mnemonic is a human-readable keyword for the binary opcode of an instruction.

Operands are the data that the instruction operates upon. Operands may be direct values (literals)
or symbolic references to stored data (symbols). Some instructions do not have operands, as the
opcode implicitly specifies the operands.

The primary function of assembler is to replace the mnemonic with its equivalent binary opcode
and replace the operand symbols and literals with their storage addresses.

Comments are for programmer to understand the meaning of the ALP statement in context with
the remaining code. The Assembler must ignore them as they are not for actual processing.
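
The statement structure above can be sketched in code. The following is a minimal Python illustration, not part of any real assembler: the names Statement and parse_statement are hypothetical, and it assumes that a label always ends with a colon and that operands are separated by commas.

```python
from typing import List, NamedTuple, Optional


class Statement(NamedTuple):
    label: Optional[str]
    mnemonic: Optional[str]
    operands: List[str]


def parse_statement(line: str) -> Statement:
    """Split one ALP line into label, mnemonic and operands; comments are dropped."""
    code = line.split(";", 1)[0].strip()      # everything after ';' is a comment
    label = None
    if ":" in code:                           # optional "label:" prefix
        label, code = code.split(":", 1)
        label = label.strip()
        code = code.strip()
    if not code:                              # blank line or comment-only line
        return Statement(label, None, [])
    parts = code.split(None, 1)               # mnemonic, then the rest
    mnemonic = parts[0].upper()
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []
    return Statement(label, mnemonic, operands)


print(parse_statement("up1: SUB B ; subtract B from A"))
print(parse_statement("MOV B,A"))
```

Declarations written without a colon (such as A DB 04) would need an extra rule; they are left out here to keep the sketch short.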

2.2.2 Literals, Symbols and Procedures

Literals are plain values that are not associated with any variable name. They are often used by
programmers as ad-hoc values that are not needed throughout the program, but only for a few
statements. An Assembler handles these literals by using a separate data structure called Literal
Table (LT).

For example, if an integer abc is to be increased by 4 units, we write the statement as abc=abc+4;
Here, it is assumed that 4 is not needed again in the program. So, the value 4 is used as an integer
literal here. (Similarly, if we consider strName = "John", here "John" is a string literal.)

Symbols are the variable names that hold values needed throughout the program. The Assembler
stores these symbols and their definition addresses in a data structure called the Symbol Table
(ST).

Symbols are also referred to as symbolic references, because they are practically implemented as
references to memory locations where their values are stored.

Symbols can be used for Labels and Operands. The function of Assembler is to search the ST when
a symbol is found in the program and replace it with its definition address, where the value is
stored.

For example, suppose an integer abc is increased repeatedly inside a loop, where the increment
starts at 4 and itself grows by 1 on every iteration. Here, the value 4 is written only once, but it is
referred to from inside the loop again and again. So, we define a variable (i.e. a symbol) incr = 4,
add incr to abc inside the loop, and update the increment using incr = incr + 1.

Procedures are nothing but functions in ALP; they are used when a relatively large number of
statements are to be used repeatedly in the program.

When a procedure is called, current execution state of the program is pushed onto stack. When the
procedure returns, the execution state is popped out of the stack and the calling program is
resumed.

Procedure names are also handled by Assembler as symbols.

2.2.3 Types of ALP Statements

ALP statements can be broadly categorized into the following 3 types:

1. Assembler Directive statements
2. Declarative Statements
3. Imperative Statements

1. Assembler Directive Statements

Exam Question:
Q) What are Assembler Directives? Explain with examples (May
06 [Comps] 6M, Dec 08 [Comps] 5M)

These statements direct the assembler to take the action associated with them. They are not a part
of the executable instructions. So, they are not included in the final object file generated by the
assembler as output.

Following are some examples of assembler directives:


i. START is a statement that directs the assembler to store the program at the given address
in the memory. If its operand field is blank or 0, the assembler can arbitrarily choose the
location of program in the memory.

ii. ORG has a similar purpose as START, but ORG is included in (and is implicitly associated
with) the program, whereas START is used independently of a program. So, START
statement must also include the program name in its label field.

E.g:
PG1 START 2000 ; Start storing the program PG1 from address 2000

ORG 3000 ; Start storing the current program (in which ORG is used) from address 3000

iii. USING is an assembler directive to indicate which register is to be used as base register
and what will be its value for the base.

E.g: USING 1500,15 ; Use register 15 as base register and 1500 as its value for the base

iv. DROP is an assembler directive used to drop a base register that has been allocated by a
USING statement

E.g: DROP 15 ; Drop register 15 as the base register.

v. EQU is an assembler directive used to equate a symbolic name to a value and make program
more readable. Whenever a symbol is defined using EQU directive, no memory is allocated
to it; only an entry is made in symbol table (ST).

E.g:
SUNDAY EQU 1
MONDAY EQU SUNDAY + 1

Now, wherever in the program we use the symbols SUNDAY and MONDAY, assembler will
replace them by 1 and 2 i.e. their respective equated values.
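
As a rough illustration of how EQU only adds entries to the symbol table, here is a small Python sketch. The function name define_equ is hypothetical, the symbol table is simplified to a plain dictionary of values, and only sums of numbers and earlier symbols are supported (an assumption made to keep the example short).

```python
def define_equ(symbol_table, name, expression):
    """Record `name EQU expression` in the symbol table; no memory is allocated."""
    total = 0
    for term in expression.split("+"):
        term = term.strip()
        # A term is either an already-equated symbol or a decimal number.
        total += symbol_table[term] if term in symbol_table else int(term)
    symbol_table[name] = total


st = {}
define_equ(st, "SUNDAY", "1")
define_equ(st, "MONDAY", "SUNDAY + 1")
print(st)  # {'SUNDAY': 1, 'MONDAY': 2}
```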

2. Declarative Statements

These are special cases of assembler directives; they are used to declare symbols and associate
them with values. For example, var1 DW A43B is used to declare a symbol var1 with hex value
A43B; DW tells the assembler that var1 uses a 16-bit word in the memory.

3. Imperative Statements

They are simply instructions to be executed by the machine. The Assembler translates them into
equivalent binary opcodes (along with operands, if any) and puts them into an output object file.
For example, ADD A,B is an imperative statement. The assembler will replace ADD with its binary
opcode; A & B are symbols which will be replaced with their definition addresses.

2.3 Features of ALP

Assembly Language (often referred to as middle-level language) provides the following features to
overcome problems faced with pure machine-language code:

1. Use of Mnemonics to specify Opcodes makes the assembly language program much more
readable and debugging is also easier.

2. Use of Symbols to specify Operands means that program can be modified with no
overhead. That is, if definition address of a symbol changes, the change in ALP is done
only at the place where the symbol is declared; all the places where the symbol has been
used need not be updated.

3. Separation of Code and Data Segments allows the programmer to keep aside some
portion of memory for the data to be used by the program.

2.4 Basic Data Structures of an Assembler

Assembler needs following basic data structures (also called as databases) as input:

1. ALP Source Code ALP statements are contained in the source file. For example,
A sample ALP code for adding 2 numbers can be given as follows:

START 4000 ; Start storing the program from location 4000


LOAD 1000 ; Load data at location 1000 into register A
MOV B,A ; Copy contents of Register A into register B
LOAD 2000 ; Load data at location 2000 into register A
ADD B ; Add content of register A with that of B, store the sum in A
STORE 3000 ; Store the sum from register A at location 3000.
END ; End of Program

2. Mnemonic Opcode Table (MOT) All different machine architectures have their own set
of mnemonics to be used for their respective assembly language (So, ALP of one machine
cannot run on another machine).

MOT is a fixed length table defined by programmer as per assembly language of the
underlying machine. During translation, the assembler searches for the mnemonic in MOT
and replaces it with the associated binary opcode given in MOT.

The structure of MOT is as follows:

Mnemonic Opcode   Binary Opcode   Number of Operands   Length of Instruction (LOI)
.                 .               .                    .

So, the MOT for the above example can be as follows:

Mnemonic Opcode   Binary Opcode   Number of Operands   Length of Instruction (LOI) (in bytes)
ADD               01              1                    2
SUB               02              1                    2
MULT              03              1                    2
LOAD              04              1                    2
STORE             05              1                    2
MOV               06              2                    2
JMP               07              1                    2
STOP              08              0                    1
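
One possible in-memory form of this table is a dictionary keyed by mnemonic. The Python sketch below copies the rows of the hypothetical MOT above; the names MotEntry and lookup are illustrative only, and the opcode is kept as a two-digit string exactly as printed in the table.

```python
from typing import NamedTuple


class MotEntry(NamedTuple):
    opcode: str      # binary opcode, written as in the table
    operands: int    # number of operands
    length: int      # length of instruction (LOI) in bytes


MOT = {
    "ADD": MotEntry("01", 1, 2),
    "SUB": MotEntry("02", 1, 2),
    "MULT": MotEntry("03", 1, 2),
    "LOAD": MotEntry("04", 1, 2),
    "STORE": MotEntry("05", 1, 2),
    "MOV": MotEntry("06", 2, 2),
    "JMP": MotEntry("07", 1, 2),
    "STOP": MotEntry("08", 0, 1),
}


def lookup(mnemonic: str) -> MotEntry:
    """Return the MOT entry for a mnemonic, or raise if the mnemonic is unknown."""
    entry = MOT.get(mnemonic.upper())
    if entry is None:
        raise KeyError(f"Instruction not found: {mnemonic}")
    return entry


print(lookup("LOAD"))
```

A dictionary gives constant-time search; a real assembler might equally use a sorted array with binary search, since the table is fixed.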

3. Pseudo-Opcode Table (POT) POT is also a fixed length table defined by programmer as
per assembly language of the underlying machine. During translation, the assembler
searches for the pseudo-opcode in POT and takes the necessary action associated with it.

Structure of POT is as follows:

Pseudo-Opcode   Number of Operands
.               .

So, the POT for the ALP in the above example can be as follows:

Pseudo-Opcode   Number of Operands
DB              1
DW              1
DD              1
CONST           1
START           1
LTORG           1
ENDP            0
END             0

[Note that these MOT and POT are only for a Hypothetical machine (and not for any specific
machine like 8085/8086); the ALP given in above example uses only a few of these mnemonics
and pseudo-ops]

4. Symbol Table (ST) ST is used to keep a track of symbols assigned to variables being
used in the program.

When a symbol gets defined, Assembler makes its entry into the Symbol Table (ST) along
with its definition address. When a symbol is used in an instruction, the Assembler first
verifies validity of the symbol using ST and if validation is successful, definition address of
symbol is written into output file.

(Note that, ST is not language-specific like MOT / POT; ST is program specific i.e. every
program has its own ST).
Structure of ST is as follows:

Symbol Name   Type   Address
.             .      .

For example, consider the sample ALP code as given below:

(memory location)
1000          LOAD A
1002 up1:     SUB B
1004          JNZ up1   ; Jump if not zero to location of label up1
1006          STORE A
1008          STOP      ; program execution ends
1009          ENDP      ; code segment ends
1010 A        DB 04     ; Define a Byte for variable A with value 04
1011 B        DB 04     ; Define a Byte for variable B with value 04
1012          END

So, ST for this example can be as follows:

Symbol Name   Type    Address
up1           LABEL   1002
A             VAR     1010
B             VAR     1011

5. Literal Table Assembler tracks the usage of literals in a program through a Literal Table
(LT).

When a literal is first encountered in a program, its entry is made into LT along with its
usage address, but the definition address is not updated. Now, with each different usage of
same literal, assembler inserts the usage address into the field accordingly. But these literals
do not get defined until EOF is reached.

Once EOF is reached, all the literals in LT get defined in the area after the end of program
and their definition address is updated in LT. This area at the end of program where literals
get defined is often called as a Literal Pool.

Structure of LT is given as follows:

Literal Notation   Value   Usage Address   Definition Address
.                  .       .               .

For example, consider the sample ALP code as given below


(memory location)
1000 LOAD A
1002 ADD =4
1004 STORE A
1006 LOAD B
1008 ADD =4
1010 SUB =5
1012 STORE B
1014 STOP ; program execution ends
1015 ENDP ; code segment ends
1016 A DB 04 ; Define a Byte for variable A with value 04
1017 B DB 04 ; Define a Byte for variable B with value 04
1018 C DB 06 ; Define a Byte for variable C with value 06
1019 END ; End of program

So, the LT for above example is given as follows:

Literal Notation   Value   Usage Address   Definition Address
=4                 4       1003, 1009      1020
=5                 5       1011            1021

(Note: the definition addresses are literal pool addresses)
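
The table above can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: the function name build_literal_table is hypothetical, the usage address is taken as LC + 1 (the operand byte after a one-byte opcode, which is what the example addresses suggest), each literal is assumed to occupy one byte, and the literal pool is assumed to start right after END at 1019.

```python
# (LC, mnemonic, operand) for the code segment of the example above
program = [
    (1000, "LOAD", "A"), (1002, "ADD", "=4"), (1004, "STORE", "A"),
    (1006, "LOAD", "B"), (1008, "ADD", "=4"), (1010, "SUB", "=5"),
    (1012, "STORE", "B"),
]
END_ADDRESS = 1019


def build_literal_table(program, pool_start):
    """Collect literal usages, then define every literal in the pool after EOF."""
    lt = {}  # literal notation -> entry; dicts keep first-seen order
    for lc, _mnemonic, operand in program:
        if operand.startswith("="):
            entry = lt.setdefault(
                operand, {"value": int(operand[1:]), "usage": [], "definition": None})
            entry["usage"].append(lc + 1)   # assumed: operand byte follows the opcode
    address = pool_start
    for entry in lt.values():               # EOF reached: assign pool addresses
        entry["definition"] = address
        address += 1                        # assumed: one byte per literal
    return lt


lt = build_literal_table(program, END_ADDRESS + 1)
for notation, entry in lt.items():
    print(notation, entry)
```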

2.5 Forward Reference & Forward Reference Problem (FRP)

Exam Questions:
Q) State the reason for assembler to be a multi-pass program. (May
05 [Comps] 4M, May 04 [IT] 4M, Dec 06 [Comps] 4M)

Q) Explain Forward Reference Problem in Assembler. (May 05 [IT]
4M, Dec 05 [IT] 4M)

Q) What is Assembly Language? What feature of assembly language
required us to build a two-pass assembler? (Dec 05 [Comps] 10M)

Q) What is Forward Reference Problem? How is it handled in a two-pass
assembler? Explain with the help of database. (June 07 [IT]
10M)
(Hint: For this question, explain FRP and also explain how ST is used by a
two-pass assembler. Give a sample ALP program, construct its ST stating
that it is the output of pass 1 and use it to assemble the program in Pass 2.
Give final output file only, all the tables not needed)

When a symbol is used before it is defined, such a reference is called as Forward Reference. Since
the symbol has not yet been defined, its address is not known. So, the Assembler cannot assemble
the entire instruction at once. This is called as Forward Reference Problem (FRP).
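
To make the idea concrete, the following Python sketch scans a program once and reports every operand that is used before its definition has been seen. The function name find_forward_references and the simplified (label, operand) representation of a statement are assumptions made for illustration.

```python
def find_forward_references(statements):
    """Return (line number, symbol) for every symbol used before it is defined."""
    defined = set()
    forward = []
    for line_no, (label, operand) in enumerate(statements, start=1):
        if label:
            defined.add(label)             # the label on this line is now known
        if operand and operand not in defined:
            forward.append((line_no, operand))
    return forward


statements = [
    (None, "A"),     # LOAD A      -> A is defined only in the data segment
    ("up1", "B"),    # up1: SUB B  -> B is defined only in the data segment
    (None, "up1"),   # JNZ up1     -> backward reference, address already known
    (None, "A"),     # STORE A
    ("A", None),     # A DB 04
    ("B", None),     # B DB 04
]
print(find_forward_references(statements))  # [(1, 'A'), (2, 'B'), (4, 'A')]
```

Every entry in the result is an instruction whose operand address is unknown at the moment the instruction is read.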

[NOTE: These tables and algorithms in this book are for a hypothetical machine. So some major
changes might be required, for actual implementation of Assembler (or even simulation of these
algorithms)]

2.6 Solutions to FRP

FRP can be solved in the following 2 ways:

1. Multi-pass Approach
2. Single-Pass approach

In a Multi-pass approach, the assembler scans the mnemonics and constructs ST in Pass 1. In
Pass 2, the input file is read again along with ST and an assembled object code file is generated as
output. This is a 2-Pass Approach, but some complex programs may need more than just 2 passes
for completing the assembly of the program. Such an approach is called Multi-pass Approach.

In a Single-pass approach, the Assembler makes use of a special data structure called
Forward-Reference Table (FRT) to keep track of only the forward-referenced symbols. At the end
of the pass, this new data structure is used to update definition addresses of forward-referenced
symbols in ST and in the program.

2.7 Multi-Pass Assembler Design

Exam Questions:
Q) Explain with neat flowchart and database working of two-pass
assembler. (Dec 04 [IT] 12M, Dec 07 [IT] 10M, June 08 [Comps]
10M, Dec 08 [Comps] 10M, June 08 [IT] 10M) [Often asked as a
compulsory question]

Q) Explain working of each of the two passes of a two-pass
assembler, with the help of databases. (Dec 04 [Comps] 10M, Dec
05 [IT] 10M)

Q) Give the analysis and design of a two-pass assembler w.r.t

When an Assembler needs more than one pass through the input program to complete the
assembly of ALP, it is called a Multi-pass Assembler.

In general, not more than 2 passes are actually required for complete assembly of an ALP.
[Complex programs requiring more than 2 passes for assembly are beyond the scope of our
syllabus (and this book). So we are essentially looking at a Two-Pass Assembler Design]

[Figure: Pass 1 -> Symbol Table (ST) -> Pass 2]

Fig. 2.2: Basic Schematic of a Two-pass Assembler

Pass 1 is used to collect all symbol definitions (for variables and labels) in ST. In Pass 2, the
source file is read again and, since addresses of all symbols are now known through ST, all
instructions get fully assembled and a final object file is generated as output.

2.7.1 Pass 1 of a 2P Assembler

2.7.1.1 Pass 1 Databases

Following data structures (or databases) are needed in Pass 1:


1. ALP Source Code File (Input)
2. Machine Opcode Table (MOT)
3. Pseudo-Opcode Table (POT)
4. Symbol Table (ST)
5. Literal Table (LT)

Additionally, following pointers are also used:


6. MOT_PTR (to keep track of location being read from MOT)
7. POT_PTR (to keep track of location being read from POT)
8. ST_PTR (to keep track of location being read from/written into ST)
9. LC (Location Counter is needed to keep track of which location from source file is being
read)
10. CODE_FLAG (to indicate where code segment ends and data segment starts)

A Symbol Table (ST) containing all symbol definitions used in programs, is generated at the end of
Pass 1.

2.7.1.2 Overall Pass 1 Flowchart:

Actions performed by a 2-Pass Assembler in Pass 1 can be summarized in the following flowchart:

(Pass 1 flowchart fig. 2.3 here)

[For the purpose of simplicity, we break down algorithm based on how each different element of
an ALP statement is processed by the assembler, in each pass]

(PLEASE NOTE:
1. From here onwards in this book, NEXT STEP refers to the next LOGICAL step. So,
after processing Label, if there is a mnemonic, next step will be processing that
mnemonic. If there is no mnemonic, next line will be read. Similarly after mnemonic,
if the instruction has an operand, then next step is processing those operands. And
obviously, after operands if there are any comments they are ignored and next step is
reading the next line.
2. In exam, you need not draw these smaller flowcharts; they are meant only for
understanding. The main overall flowcharts of each pass are enough. But make
sure you explain clearly how each element is processed).

2.7.1.3 Initializations in Pass 1:

Before Pass 1 begins, the following initializations are performed:

1. Source file is opened in Read mode.
2. All tables (MOT, POT, ST, and LT) get initialized.
3. Respective pointers are initialized to point to their first entries.
4. Location Counter (LC) is initialized to 0.
5. Error flag is set off, Code_flag is set to 1.

2.7.1.4 Handling Labels using ST in Pass 1:

When a label definition is encountered in the program, the assembler searches for it in ST. If the
label already exists in ST, then it is a case of duplicate label definition. So the error flag is
turned on and a Duplicate Label error is displayed. If the label is not found in ST, then its entry is
made into ST and its definition address gets updated. (Label references or Label calls are treated
as symbol references by the Assembler.)

[Flowchart: Label Definition -> Search Label in ST -> Label Found? YES: set Error flag ON and
display "Duplicate Label" error; NO: insert Label in ST with Type = LABEL and Definition
Address = LC -> NEXT STEP]

Fig. 2.4: Processing Labels in Pass 1
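
The label handling described in this section might look as follows in Python. This is a sketch only: define_label is a hypothetical name, and the check for a missing address is an added assumption so that a label which was entered into ST earlier by a forward reference (with no definition address yet) is not reported as a duplicate.

```python
def define_label(symbol_table, name, lc, errors):
    """Insert a label definition at the current LC, or flag a duplicate definition."""
    entry = symbol_table.get(name)
    if entry is not None and entry["address"] is not None:
        errors.append(f"Duplicate Label: {name}")     # error flag turned ON
        return False
    # New label, or a forward-referenced one that is only now being defined.
    symbol_table[name] = {"type": "LABEL", "address": lc}
    return True


st, errors = {}, []
define_label(st, "up1", 1002, errors)
define_label(st, "up1", 1008, errors)   # second definition of the same label
print(st)      # {'up1': {'type': 'LABEL', 'address': 1002}}
print(errors)  # ['Duplicate Label: up1']
```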

2.7.1.5 Handling Mnemonics in Pass 1 :

A mnemonic is searched in MOT. If found, only LC is incremented by length-of-instruction
(LOI), since purpose of pass 1 is only to prepare ST. (If not found, error flag is set ON and
Instruction not found error can be displayed)

[Flowchart: Mnemonic -> Search Mnemonic in MOT -> LC = LC + LOI -> NEXT STEP]

Fig. 2.5: Processing Mnemonics in Pass 1

2.7.1.6 Handling Operands in Pass 1 :

If an instruction has operands, the assembler first checks if the operand is a Literal or not.

If operand is a Literal, it is searched in LT. If found, its usage address is updated to current LC. If
not found, first its entry is made in LT and then the usage address is updated.

If operand is a symbol, it is searched in ST. If found, it indicates that it has already been defined.
If not found, its entry is made in ST, but neither type nor definition address are updated.

[Flowchart: Operand -> Operand is a Literal? YES: search Literal in LT; if not found, insert
Literal in LT; then set Usage Address = LC. NO: search Symbol in ST; if found, do nothing; if
not found, insert Symbol in ST -> LC = LC + LOI -> NEXT STEP]

Fig. 2.6: Processing Operands in Pass 1

2.7.1.7 Processing Pseudo-opcodes in Pass 1:

1. START: The START pseudo-op is used to indicate the starting location of program in the
memory. If its operand field is 0 or blank, it means program can be moved anywhere in the
memory (i.e. it is re-locatable). Otherwise, the operand gives a fixed starting address in
memory (i.e. program is static).

In Pass 1, this START pseudo-op is ignored, since it will be processed by Pass 2.

2. ENDP: The ENDP pseudo-op (without any operand) is used to indicate end of code segment
and start of data segment. So, to process this, the assembler simply resets the code_flag to 0.

3. DB / DW: These two pseudo-ops are used to define variables. The label field is the variable
name and the operand field is the variable value.

So, when DB or DW is encountered, its label is searched in ST. If found, its type is updated
to VAR and definition address is updated to current LC. If not found, first its entry is made in
ST and then its type and definition address are updated.

4. CONST: The CONST pseudo-op is used to define a constant variable whose value is fixed at
the time of definition and cannot be changed during program execution. Its label field is the
variable name and operand field is the variable's fixed value.

When CONST is encountered, its label is searched in ST. If found, its type is updated to
CONST and definition address is updated to current LC. If not found, first its entry is made in
ST and then its type and definition address are updated.

[Flowchart: Pseudo-Opcode -> Type of Pseudo-Opcode? START: search POT for pseudo-op, then
ignore. ENDP: search POT for pseudo-op, then reset Code Flag = 0. DB / DW and CONST: search
POT for pseudo-op, search Label in ST; if not found, make entry into ST; set Definition
Address = LC; for DB / DW set Type = VAR, for CONST set Type = CONST -> Read Next Line]

Fig. 2.7: Processing Pseudo-opcodes in Pass 1
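
A compact way to express the pseudo-op handling of this section is sketched below in Python. The names process_pseudo_op and DATA_SIZES are hypothetical; the sizes (1 byte for DB and CONST, 2 bytes for DW) follow the LC updates shown for Pass 2, and LC handling for START, ENDP and END is deliberately left out.

```python
DATA_SIZES = {"DB": 1, "DW": 2, "CONST": 1}   # bytes reserved per definition


def process_pseudo_op(op, label, lc, symbol_table, flags):
    """Handle one pseudo-op in Pass 1 and return the (possibly advanced) LC."""
    if op == "ENDP":
        flags["code_flag"] = 0                 # code segment ends, data segment starts
    elif op in DATA_SIZES:
        # Reuse the entry made by an earlier reference, or create a new one.
        entry = symbol_table.setdefault(label, {"type": None, "address": None})
        entry["type"] = "CONST" if op == "CONST" else "VAR"
        entry["address"] = lc
        return lc + DATA_SIZES[op]
    return lc                                  # START is ignored in Pass 1


st = {"N1": {"type": None, "address": None}}   # N1 was referenced earlier
flags = {"code_flag": 1}
lc = process_pseudo_op("ENDP", None, 1018, st, flags)
lc = process_pseudo_op("DB", "N1", lc, st, flags)
print(lc, st, flags)
```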

2.7.1.8 Handling Procedure Definitions and Calls:

1. PROC Mnemonic: The PROC mnemonic indicates start of a procedure definition. Its
operand field gives the procedure name. This procedure name is searched in ST. If it
already exists in ST, then a Duplicate Procedure error is displayed. If not found, its entry
is made in ST and type is set to PROC and definition address is set to current LC.

Also, the assembler creates a flag with name procname_flag and sets it ON, indicating the
procedure definition is on. (When ENDP with same procname as its operand is found,
procname_flag is turned off).

[Flowchart: PROC -> Search Operand in ST -> Operand found in ST? YES: set error flag = ON and
display "Duplicate Procedure" error; NO: make entry into ST with Type = PROC and Definition
Address = LC, set procname = operand, create procname_flag and set it ON -> NEXT STEP]

Fig. 2.8: Handling Procedure Definition in Pass 1

2. ENDP procname: If ENDP is followed by an operand, it indicates end of a procedure with
name procname. So, the assembler sets procname_flag to OFF.

3. CALL Mnemonic: The CALL mnemonic indicates a procedure call; its operand is the
name of the procedure to be called. When CALL is encountered, it is first searched in MOT
and its operand is searched in ST for type=PROC. If found, it means the procedure is
well-defined. So, only LC is incremented by LOI. If not found, its entry is made into ST,
but neither the type nor definition address are updated.

[Flowchart: CALL -> Search Mnemonic in MOT -> Search Operand in ST for type=PROC -> Operand
found in ST? (branch boxes: make entry into ST; set error flag = ON and display "Call to
Undefined Procedure" error) -> LC = LC + LOI -> NEXT STEP]

Fig. 2.9: Handling Procedure Calls in Pass 1

2.7.1.9 Defining Literals:

At the end of Pass 1, all the literals in LT get defined in the area after the end of program and
their definition address is updated in LT. This area at the end of program where literals get
defined is often called as a Literal Pool.

2.7.2 Pass 2 of a 2P Assembler

After Pass 1 is completed successfully, assembler re-scans the source file, assembles the
mnemonics in it and writes their equivalent binary code into output file. Symbols and Literals are
resolved using ST and LT respectively.

2.7.2.1 Pass 2 Databases

Following data structures (or databases) are needed in Pass 2:


1. ALP Source Code File (Input)
2. Machine Opcode Table (MOT)
3. Pseudo-Opcode Table (POT)
4. Symbol Table (ST)
5. Literal Table (LT)

Additionally, following pointers are also used:


6. MOT_PTR (to keep track of location being read from MOT)
7. POT_PTR (to keep track of location being read from POT)
8. ST_PTR (to keep track of location being read from ST)
9. LC (Location Counter is needed to keep track of which location of output file is being
written)
10. CODE_FLAG (to indicate where code segment ends and data segment starts)

A final object file is generated at the end of Pass 2.

2.7.2.2 Overall Pass 2 Flowchart:

Actions performed by a 2-Pass Assembler in Pass 2 can be summarized in the following flowchart:

(pass 2 flowchart fig. 2.10 here)

2.7.2.3 Initializations in Pass 2:

Before Pass 2 begins, the following initializations are performed:

1. Source file is opened in Read mode and Output file is opened in Write mode.
2. All tables (MOT, POT, ST, and LT) get initialized.
3. Respective pointers are initialized to point to their first entries.
4. Location Counter (LC) is initialized to 0.
5. Error flag is set off, Code_flag is set to 1.

2.7.2.4 Labels in Pass 2:


Label definitions have been handled in Pass 1. So, they are ignored in Pass 2.

2.7.2.5 Mnemonics in Pass 2:


The mnemonic is searched in MOT; its equivalent binary opcode is read from MOT and
written into output file. LC is incremented by LOI.

[Flowchart: Mnemonic -> Search Mnemonic in MOT -> Write binary opcode in o/p file ->
LC = LC + LOI -> NEXT STEP]

Fig. 2.11: Processing Mnemonics in Pass 2

2.7.2.6 Operands in Pass 2:

If the operand is a literal, its definition address is taken from LT and written into output file.
If the operand is a symbol, its definition address is taken from ST and written into output file.
Finally, LC is incremented by LOI.

[Flowchart: Operand -> Operand is a Literal? YES: search Literal in LT and get definition
address; NO: search Symbol in ST and get definition address -> Write definition address in
o/p file -> LC = LC + LOI -> NEXT STEP]

Fig. 2.12: Processing Operands in Pass 2
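
The Pass 2 handling of one instruction (mnemonic plus operand) can be sketched as below. This is an illustrative Python fragment, not the book's algorithm verbatim: assemble_instruction is a hypothetical name, the tables are simplified to plain dictionaries that map names to definition addresses, and the sample values are taken from the worked example in section 2.8.

```python
def assemble_instruction(mnemonic, operand, mot, symbol_table, literal_table):
    """Return (binary opcode, operand address or None) for one instruction."""
    opcode, _length = mot[mnemonic]
    if operand is None:                       # e.g. STOP has no operand
        return opcode, None
    # Literals (written as =value) are resolved through LT, symbols through ST.
    table = literal_table if operand.startswith("=") else symbol_table
    address = table.get(operand)
    if address is None:
        raise ValueError(f"Undefined operand: {operand}")
    return opcode, address


mot = {"LOAD": ("03", 2), "ADD": ("01", 2), "STOP": ("09", 1)}
symbol_table = {"N1": 1018}
literal_table = {"=0": 1024}

print(assemble_instruction("LOAD", "=0", mot, symbol_table, literal_table))  # ('03', 1024)
print(assemble_instruction("ADD", "N1", mot, symbol_table, literal_table))   # ('01', 1018)
```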

2.7.2.7 Pseudo-opcodes in Pass 2:

1. START: The pseudo-op is searched in POT; its equivalent binary opcode is written
into output file along with its operand.

2. ENDP: It is used to indicate end of code segment and start of data segment. So, the
assembler simply resets the code_flag to 0.

3. DB / DW / CONST: They are used to define variables. In Pass 2, their labels (i.e.
variable name) are searched in ST and their definition address is retrieved. Their
operands (i.e. value of variable) are now written into this definition address. For
CONST, the processing is same, except that values once written cannot be modified
by the program.

[Flowchart: Pseudo-Opcode -> Type of Pseudo-Opcode? START: search POT for pseudo-op, write
binary opcode and operand in o/p file. ENDP: search POT for pseudo-op, reset Code Flag = 0.
DB / DW and CONST: search POT for pseudo-op, search Label in ST and get definition address,
insert operand into definition address; for DB / CONST set LC = LC + 1, for DW set
LC = LC + 2 -> NEXT STEP]

Fig. 2.13: Processing Pseudo-opcodes in Pass 2

2.7.2.8 Handling Procedure Definitions and Calls:

1. PROC Mnemonic: The PROC mnemonic is searched in MOT; its binary opcode is
written into output file along with its operand. Operand is read into procname and
procname_flag is set ON. LC is then incremented by LOI.

[Flowchart: PROC -> Search Mnemonic in MOT -> Write binary opcode into o/p file -> Get
procname Operand -> Set procname_flag = ON -> LC = LC + LOI -> NEXT STEP]

Fig. 2.14: Handling Procedure Definition in Pass 2

2. ENDP procname: It indicates end of a procedure with name procname. So, the
assembler sets procname_flag to OFF.

3. CALL Mnemonic: The CALL mnemonic is searched in MOT; its operand symbol is
searched in ST. Binary opcode of mnemonic and definition address of operand symbol
are written into output file.

[Flowchart: CALL -> Search Mnemonic in MOT -> Write binary opcode into o/p file -> Search
Operand in ST -> Write definition address into o/p file -> LC = LC + LOI -> NEXT STEP]

Fig. 2.15: Handling Procedure Calls in Pass 2

At the end of Pass 2, a final object file is generated.

2.8 Sample Program Assembly using Two-Pass Assembler

Exam Questions:
Q) Using any assembler language, write a sample assembly
program and w.r.t that program, describe how a two-pass
assembler will translate it. (May 04 [IT] 10M).
(Hint: Give a brief explanation of the steps as well)

Sample Program: To multiply N1 * N2 by successive addition method

Input ALP Code:


(memory location)
1000 START
1002 LOAD N2 ; Load value of N2 into register A
1004 STORE COUNT ; Store value of N2 into COUNT variable
1006 LOAD =0 ; Load literal value 0 into register A
1008 repeat: ADD N1 ; Add value of N1 to register A
1010 DEC COUNT ; Decrement COUNT variable by 1
1012 JNZ repeat ; Jump if-not-zero to repeat
1014 STORE SUM ; Store sum value into SUM variable
1016 STOP ; program execution ends
1017 ENDP ; code segment ends
1018 N1 DB 07 ; Define a Byte for variable N1 with value 07
1019 N2 DB 05 ; Define a Byte for variable N2 with value 05
1020 COUNT DB ? ; Define a Byte for variable COUNT with null value
1021 SUM DW ?? ; Define a word for variable SUM with null value
1023 END ; End of program

Machine Opcode Table (MOT) (Remains static for both passes)

Mnemonic Opcode   Binary Opcode   Number of Operands   Length of Instruction (LOI) (in bytes)
ADD               01              1                    2
SUB               02              1                    2
LOAD              03              1                    2
STORE             04              1                    2
INC               05              1                    2
DEC               06              1                    2
JMP               07              1                    2
JNZ               08              1                    2
STOP              09              0                    1

Pseudo-Opcode Table (POT) (Remains static for both passes)

Pseudo-Opcode   Number of Operands
DB              1
DW              1
DD              1
CONST           1
START           1
LTORG           1
ENDP            0
END             0

(NOTE: If an exam question asks you to assemble an entire program, keep MOT / POT limited to
only mnemonics and pseudo-ops used in your program, as it is only for a hypothetical machine)

Symbol Table (ST) (After Pass 1)

Symbol Name   Type    Address
N2            -       -
COUNT         -       -
N1            -       -
repeat        LABEL   1008
SUM           -       -

Literal Table (LT) (After Pass 1)

Literal Notation   Value   Usage Address   Definition Address
=0                 0       1007            -

Symbol Table (ST) (After Pass 2)

Symbol Name   Type    Address
N2            VAR     1019
COUNT         VAR     1020
N1            VAR     1018
repeat        LABEL   1008
SUM           VAR     1021

Literal Table (LT) (After Pass 2)

Literal Notation   Value   Usage Address   Definition Address
=0                 0       1007            1024

Output File: (After Pass 2)

Address   Opcode   Operand
1002      03       1019
1004      04       1020
1006      03       1024
1008      01       1018
1010      06       1020
1012      08       1008
1014      04       1021
1016      09       -

Stored Variables: (Stored in Data Segment)

Address      Value
1018         07
1019         05
1020         NULL
1021,1022    NULL

Stored Literals: (Literal Pool)

Address   Value
1024      00
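
The whole worked example can be cross-checked with a small two-pass sketch in Python. It is a simplified illustration under stated assumptions, not a full implementation of the flowcharts: the helper names (parse, pass1, pass2) are hypothetical, the sizes in PSEUDO_SIZES for START, ENDP and END are chosen only so that LC matches the addresses printed in the listing, and in this sketch Pass 1 fills in every definition address (as sections 2.7.1.7 and 2.7.1.9 describe) so that Pass 2 only has to look them up.

```python
MOT = {  # mnemonic -> (binary opcode, length of instruction in bytes)
    "ADD": ("01", 2), "SUB": ("02", 2), "LOAD": ("03", 2), "STORE": ("04", 2),
    "INC": ("05", 2), "DEC": ("06", 2), "JMP": ("07", 2), "JNZ": ("08", 2),
    "STOP": ("09", 1),
}
# Assumed sizes, picked to reproduce the addresses of the listing above.
PSEUDO_SIZES = {"START": 2, "ENDP": 1, "END": 1, "DB": 1, "DW": 2}

SOURCE = """
START
LOAD N2
STORE COUNT
LOAD =0
repeat: ADD N1
DEC COUNT
JNZ repeat
STORE SUM
STOP
ENDP
N1 DB 07
N2 DB 05
COUNT DB ?
SUM DW ??
END
"""


def parse(line):
    """Return (label, operation, operand) for one source line."""
    label = None
    if ":" in line:
        label, line = line.split(":", 1)
        label = label.strip()
    fields = line.split()
    if len(fields) > 1 and fields[1] in ("DB", "DW"):   # declaration: name DB value
        return fields[0], fields[1], fields[2]
    return label, fields[0], (fields[1] if len(fields) > 1 else None)


def pass1(lines, origin):
    """Collect definition addresses of symbols and literals."""
    st, lt, lc = {}, {}, origin
    for line in lines:
        label, op, operand = parse(line)
        if label:
            st[label] = lc                      # label or variable defined at LC
        if op in MOT and operand and operand.startswith("="):
            lt.setdefault(operand, None)        # literal seen; defined after END
        lc += MOT[op][1] if op in MOT else PSEUDO_SIZES[op]
    for literal in lt:                          # literal pool follows the program
        lt[literal] = lc
        lc += 1
    return st, lt


def pass2(lines, origin, st, lt):
    """Emit (address, opcode, operand address) for every machine instruction."""
    output, lc = [], origin
    for line in lines:
        _label, op, operand = parse(line)
        if op in MOT:
            opcode, length = MOT[op]
            if operand is None:
                address = None
            elif operand.startswith("="):
                address = lt[operand]
            else:
                address = st[operand]
            output.append((lc, opcode, address))
            lc += length
        else:
            lc += PSEUDO_SIZES[op]
    return output


lines = [ln.strip() for ln in SOURCE.splitlines() if ln.strip()]
st, lt = pass1(lines, 1000)
for row in pass2(lines, 1000, st, lt):
    print(row)
```

Under these assumptions the printed rows agree with the Output File table above, for instance (1002, '03', 1019) for LOAD N2 and (1006, '03', 1024) for LOAD =0.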

2.9 Single-Pass Assembler Design

Exam Questions:
Q) Explain with the help of flowchart and data structures, the
working of a single-pass assembler. (May 04 [Comps] 10M, May 05
[Comps] 10M, May 05 [IT] 10M, May 06 [IT] 10M, Dec 07 [IT] 10M,
May 07 [Comps] 10M).

Forward Reference Problem (FRP) occurs when a symbol is referenced before it gets defined. Due
to this, the assembler cannot assemble the instruction right at the time when it gets encountered.

To resolve FRP, we used Multi-pass approach which collects all symbol definitions in Pass 1 and
then assembles all instructions in Pass 2. But it requires an extra pass to handle forward-referenced
symbols, which is inefficient.

A Single-pass Assembler solves FRP efficiently in one pass. It uses a special data structure called
the Forward Reference Table (FRT) and assembles all the instructions completely, right at the time
when they are encountered.

If the instruction involves a forward-referenced symbol (that is not yet defined), such symbols are
entered into FRT. At the end of pass, when all symbol definitions have been collected in ST, the
assembler uses ST to update definition addresses of forward-referenced symbols in FRT.

It then uses FRT to update the usage locations of forward-referenced symbols in output file. Thus,
a single-pass assembler handles forward reference symbols efficiently.

2.9.1 Forward Reference Table (FRT)

FRT is sometimes also called the Table of Incomplete Instructions (TII).

When a One-Pass Assembler encounters a forward-referenced symbol, it enters the symbol in FRT
and records its usage address. For further references to a symbol already in FRT, the assembler
appends each new usage address to the corresponding field.

When the symbol gets defined, its definition address is updated in FRT. At the end of the pass,
the assembler copies the definition address from FRT into the usage addresses of every
forward-referenced symbol.

Structure of Forward Reference Table (FRT)

Symbol Name Type Usage Address Definition Address

. . . .
. . . .
. . . .

For example, consider the following ALP:


(memory location)

1000 LOAD X
1002 UP1: SUB Y
1004 JZ DOWN1 ; Forward-referenced Symbol encountered
1006 ADD Y
1008 JNZ DOWN1 ; Forward-referenced Symbol encountered
1010 JMP UP1
1012 DOWN1: STORE Y
1014 ENDP
1015 X DB 02
1016 Y DB 04
1017 STOP
1018 END

At the end of assembly of this program using One-Pass Assembler, FRT will be as follows:

Symbol Name   Type    Usage Address   Definition Address
DOWN1         LABEL   1005, 1009      1012

Now, the assembler copies the definition address 1012 into usage addresses 1005 and 1009 (the
operand fields of the jump instructions at 1004 and 1008).
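The end-of-pass copy step can be sketched in a few lines of Python. This is only an illustrative model, not the book's implementation: the `FRTEntry` class, its field names, the `backpatch` function and the use of a dictionary to stand for the operand bytes of the output file are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FRTEntry:
    """One row of the Forward Reference Table (field names are illustrative)."""
    symbol: str
    sym_type: str
    usage_addresses: List[int] = field(default_factory=list)
    definition_address: Optional[int] = None


def backpatch(frt: Dict[str, FRTEntry], output: Dict[int, int]) -> None:
    """Copy each definition address into every usage address recorded for it."""
    for entry in frt.values():
        if entry.definition_address is None:
            raise ValueError(f"Un-defined symbol / label: {entry.symbol}")
        for address in entry.usage_addresses:
            output[address] = entry.definition_address


# FRT row for DOWN1 as it stands at the end of the pass in the example above.
frt = {"DOWN1": FRTEntry("DOWN1", "LABEL", [1005, 1009], 1012)}
output = {}  # operand bytes of the output file, keyed by address (assumed model)
backpatch(frt, output)
print(output)  # {1005: 1012, 1009: 1012}
```

A symbol that is still without a definition address at this point is reported as undefined, which corresponds to the "Un-defined symbol / label" error listed in section 2.12.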

2.9.2 Databases of Single-Pass Assembler

A Single-Pass Assembler uses the following databases:

1. ALP Source Code File (Input)
2. Machine Opcode Table (MOT)
3. Pseudo-Opcode Table (POT)
4. Symbol Table (ST)
5. Literal Table (LT)
6. Forward Reference Table (FRT)

Additionally, the following pointers, counters and flags are also needed:

7. MOT_PTR (to keep track of location being read from MOT)
8. POT_PTR (to keep track of location being read from POT)
9. ST_PTR (to keep track of location being read from/written into ST)
10. FRT_PTR (to keep track of location being read from/written into FRT)
11. LC (Location Counter, to keep track of the memory address assigned to the statement
currently being assembled)
12. CODE_FLAG (to indicate where code segment ends and data segment starts)

2.9.3 Overall One-Pass Assembler Flowchart

START: One-Pass Assembler

1. Perform initializations.
2. (NXT) Read a line from the ALP source file.
3. Is the END pseudo-op reached?
   YES: go to step 6 (NXT1).
   NO:  go to step 4.
4. Analyze the statement and act on its form:
   - Comment: ignore the rest of the line.
   - Only Label: process the label using ST.
   - Label + Mnemonic: process the label using ST; assemble the mnemonic using MOT.
   - Mnemonic + Operand: assemble the mnemonic using MOT and process the operand using ST
     (or FRT, if needed).
   - Label + Mnemonic + Operand: process the label using ST, assemble the mnemonic using MOT
     and process the operand using ST (or FRT, if needed).
   - Pseudo-Opcode: handle each pseudo-op as per its type.
5. Go to step 2 (NXT).
6. (NXT1) Is the error flag OFF?
   YES: display "Assembly successful".
   NO:  display "Error: Assembly unsuccessful".
7. STOP.

End of One-Pass Assembler

Fig. 2.16: Single-Pass Assembler Flowchart

(Note that the design of 1P-Assembler given in this book does not handle procedures)

2.9.4 Initializations in One-Pass Assembler

The following initializations are needed for the 1P Assembler:

1. Source file is opened in Read mode and Output file is opened in Write mode.
2. All tables (MOT, POT, ST, LT, and FRT) get initialized.
3. Respective pointers are initialized to point to their first entries.
4. Location Counter (LC) is initialized to 0.
5. Error flag is set OFF, Code_flag is set to 1.

2.9.5 Handling Labels:

When a label definition is encountered in the program, the assembler searches for the label in ST.
If the label already exists in ST, then it is a case of duplicate label definition. So the error
flag is turned ON and a Duplicate Label error is displayed.

If the label is not found in ST, then it is searched in FRT to see if it has been forward-referenced.
If found in FRT, then it is a forward-referenced LABEL. So, its definition address (the current LC)
is recorded in the corresponding field of FRT.

If not found in FRT, then it is a regular non-forward-referenced LABEL. So, its entry is made into
ST with type as LABEL and definition address as the current LC.

Label Definition

1. Search the label in ST.
2. Label found in ST?
   YES: set the error flag ON and display error "Duplicate Label".
   NO:  go to step 3.
3. Search the label in FRT.
4. Label found in FRT?
   YES: append Definition Address ← LC in FRT.
   NO:  insert the label in ST with Type ← LABEL and Definition Address ← LC.
5. NEXT STEP

Fig. 2.17: Label Handling by a Single-pass Assembler
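
The label-handling decisions described in this section can be sketched as follows. This is a minimal Python sketch under stated assumptions: ST and FRT are modelled as plain dictionaries, and the function name `handle_label`, the dictionary keys and the duplicate definition of UP1 in the usage lines are all made up for illustration.

```python
def handle_label(label, lc, st, frt, errors):
    """Process a label definition at address lc; return False on a duplicate."""
    if label in st:
        errors.append(f"Duplicate Label: {label}")
        return False
    if label in frt:
        # Forward-referenced earlier: record where the label is defined.
        frt[label]["definition"] = lc
    else:
        # Regular label: enter it into ST with the current LC.
        st[label] = {"type": "LABEL", "address": lc}
    return True


st, errors = {}, []
frt = {"DOWN1": {"type": "LABEL", "usage": [1005, 1009], "definition": None}}
handle_label("UP1", 1002, st, frt, errors)    # new label, goes into ST
handle_label("DOWN1", 1012, st, frt, errors)  # already in FRT, gets its address
handle_label("UP1", 1010, st, frt, errors)    # hypothetical second definition
print(st)
print(frt["DOWN1"]["definition"])
print(errors)
```

The first two calls use the labels and addresses of the example program in section 2.9.1; the third call only demonstrates the duplicate-label path.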

2.9.6 Handling Mnemonics:

The mnemonic is searched in MOT; its equivalent binary opcode is read from MOT and written
into output file. LC is incremented by LOI.

Mnemonic

1. Search the mnemonic in MOT.
2. Write its binary opcode in the o/p file.
3. LC ← LC + LOI
4. NEXT STEP

Fig. 2.18: Mnemonic Handling by a Single-Pass Assembler
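
As a rough illustration of the lookup and the LC update, the sketch below stores the MOT as a Python dictionary. The table contents are copied from the first MOT listed in this chapter; the function name `handle_mnemonic`, the tuple layout and the dictionary standing for the output file are assumptions of the example.

```python
# mnemonic -> (binary opcode, number of operands, LOI in bytes)
MOT = {
    "ADD": ("01", 1, 2), "SUB": ("02", 1, 2), "LOAD": ("03", 1, 2),
    "STORE": ("04", 1, 2), "INC": ("05", 1, 2), "DEC": ("06", 1, 2),
    "JMP": ("07", 1, 2), "JNZ": ("08", 1, 2), "STOP": ("09", 0, 1),
}


def handle_mnemonic(mnemonic, lc, output):
    """Write the opcode for mnemonic at address lc and return the advanced LC."""
    if mnemonic not in MOT:
        raise KeyError(f"Non-existent mnemonic: {mnemonic}")
    opcode, _operand_count, loi = MOT[mnemonic]
    output[lc] = opcode
    return lc + loi


output = {}
lc_after_load = handle_mnemonic("LOAD", 1002, output)
lc_after_stop = handle_mnemonic("STOP", 1016, output)
print(lc_after_load, lc_after_stop, output)  # 1004 1017 {1002: '03', 1016: '09'}
```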

2.9.7 Handling Operands:

If the operand is a literal, it is first searched in LT. If found, its usage address is updated.
Otherwise, an entry for it is first made into LT and then its usage address is updated.

If the operand is a symbol, it is searched in ST. If found, its definition address is written into
the output file and LC is incremented by LOI.

If the operand symbol is not found in ST, then it is a case of forward-referenced symbol. So, an
entry for that symbol is made into FRT with type = LABEL and usage address = LC. Now, if this
symbol is referenced again in the program, its usage address gets appended in the corresponding
field. When this symbol finally gets defined, its definition address is updated in FRT.

Operand

1. Operand is a literal?
   YES: go to step 2.
   NO:  go to step 4.
2. Search the literal in LT. If it is not found, insert the literal in LT.
3. Append LC to Usage Address in LT; go to step 7.
4. Search the symbol in ST. If it is found, get its definition address and write it into the
   o/p file; go to step 7.
5. Search the symbol in FRT. If it is not found, make a symbol entry into FRT.
6. Append LC to Usage Address in FRT.
7. LC ← LC + LOI
8. NEXT STEP

Fig. 2.19: Handling Operands in a Single-Pass Assembler
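
The three operand cases (literal, defined symbol, forward-referenced symbol) can be sketched as below. This is an illustrative Python model only: the tables are plain dictionaries, the name `handle_operand` and the dictionary keys are assumptions, and the sketch leaves the LC update to the caller so that it stays focused on the table lookups.

```python
def handle_operand(operand, operand_addr, st, lt, frt, output):
    """Resolve one operand whose field in the output file is at operand_addr."""
    if operand.startswith("="):
        # Literal: make an LT entry on first use, then record the usage address.
        if operand not in lt:
            lt[operand] = {"value": int(operand[1:]), "usage": [], "definition": None}
        lt[operand]["usage"].append(operand_addr)
    elif operand in st:
        # Already defined: its address can be written straight away.
        output[operand_addr] = st[operand]["address"]
    else:
        # Forward reference: make an FRT entry on first use, then record the usage.
        if operand not in frt:
            frt[operand] = {"type": "LABEL", "usage": [], "definition": None}
        frt[operand]["usage"].append(operand_addr)


st = {"UP1": {"type": "LABEL", "address": 1002}}
lt, frt, output = {}, {}, {}
handle_operand("DOWN1", 1005, st, lt, frt, output)  # first forward reference
handle_operand("DOWN1", 1009, st, lt, frt, output)  # second forward reference
handle_operand("UP1", 1011, st, lt, frt, output)    # backward reference
handle_operand("=0", 1007, st, lt, frt, output)     # literal, as in the LT row for =0
print(frt)
print(output)
print(lt)
```

The symbol calls mirror the example program of section 2.9.1: DOWN1 collects the usage addresses 1005 and 1009, while UP1 is resolved immediately because it is already in ST.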

2.9.8 Handling Pseudo-opcodes:

1. START: The pseudo-op is searched in POT; its equivalent binary opcode is written into
output file along with its operand.

2. ENDP: It is used to indicate end of code segment and start of data segment. So, the
assembler simply resets the code_flag to 0.

3. DB / DW / CONST: They are used to define variables / constants. Their labels (i.e.
variable names) are searched in ST. If not found, its entry is first made into ST and then its
definition address is updated. If found, its definition address can be directly updated.

After the definition address is updated, the operand of instruction (i.e. value of symbol) is
written at this address.

To check whether the symbol was also forward-referenced, the label field (i.e. symbol name)
is now searched in FRT. If found, its definition address is updated in FRT. If not found, it
means the symbol was not forward-referenced.

Finally, LC is incremented by 1 for DB or CONST (since each of these two occupies one
byte) or by 2 for DW (since one word occupies two bytes).

Pseudo-Opcode

1. Determine the type of pseudo-opcode and search POT for the pseudo-op.
2. START: write the binary opcode and operand in the o/p file; go to NEXT STEP.
3. ENDP: reset Code Flag = 0; go to NEXT STEP.
4. DB / DW / CONST:
   a. Search the label in ST. If the label is not found in ST, enter the label into ST.
   b. For DB / DW, set Type ← VAR. For CONST, set Type ← CONST.
   c. Set Definition Address ← LC in ST.
   d. Insert the operand into the definition address.
   e. Search the label again, this time in FRT. If found, set Definition Address ← LC in FRT.
   f. For DB / CONST, set LC ← LC + 1. For DW, set LC ← LC + 2.
   g. NEXT STEP

Fig. 2.20: Handling Pseudo-opcodes in a Single-Pass Assembler
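
The DB / DW / CONST case can be sketched in Python as shown below. It is a simplified model, not the book's code: the tables are dictionaries, `handle_definition` and `SIZE_IN_BYTES` are made-up names, a word is stored under its first address only, and DD is left out because its size is not stated in this section. The values in the usage lines are borrowed from the sample program of section 2.10.

```python
SIZE_IN_BYTES = {"DB": 1, "CONST": 1, "DW": 2}


def handle_definition(label, pseudo_op, value, lc, st, frt, memory):
    """Process `label pseudo_op value` at address lc and return the advanced LC."""
    sym_type = "CONST" if pseudo_op == "CONST" else "VAR"
    # Enter the label into ST if absent; either way set its type and address.
    st[label] = {"type": sym_type, "address": lc}
    # Store the operand (the value of the symbol) at the definition address.
    memory[lc] = value
    # If the symbol was forward-referenced, record its definition in FRT too.
    if label in frt:
        frt[label]["definition"] = lc
    return lc + SIZE_IN_BYTES[pseudo_op]


st, memory = {}, {}
frt = {"N1": {"type": "VAR", "usage": [1009], "definition": None}}
lc = handle_definition("N1", "DB", 7, 1022, st, frt, memory)
print(lc, st["N1"], frt["N1"]["definition"])
lc_after_sum = handle_definition("SUM", "DW", None, 1025, st, frt, memory)
print(lc_after_sum)  # 1027
```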

2.10 Sample Program Assembly using One-Pass Assembler

Sample Program: To multiply N1 * N2 by successive addition method

Input ALP Code:


(memory location)
1000 START
1002 LOAD N2 ; Load value of N2 into register A
1004 STORE COUNT ; Store value of N2 into COUNT variable
1006 LOAD =0 ; Load literal value 0 into register A
1008 repeat: ADD N1 ; Add value of N1 to register A
1010 DEC COUNT ; Decrement COUNT variable by 1
1012 JNZ repeat ; Jump if-not-zero to repeat
1014 STORE SUM ; Store sum value into SUM variable
1016 JNC down ; Jump if-not-carry to down
1018 INC SUM ; Increment SUM if there is carry
1020 down: STOP ; program execution ends
1021 ENDP ; code segment ends
1022 N1 DB 07 ; Define a Byte for variable N1 with value 07
1023 N2 DB 05 ; Define a Byte for variable N2 with value 05
1024 COUNT DB ? ; Define a Byte for variable COUNT with null value
1025 SUM DW ?? ; Define a word for variable SUM with null value
1027 END ; End of program
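
The memory locations in the listing above follow from the statement lengths, which can be checked with a short Python sketch. The instruction lengths come from the LOI column of the MOT; the lengths used for START (2) and ENDP (1) are not stated in the tables and are inferred here from the addresses in the listing, so they should be read as assumptions. The names `LENGTH`, `PROGRAM` and `assign_addresses` are made up for the example.

```python
# Bytes occupied by each statement type (START and ENDP inferred from the listing).
LENGTH = {
    "START": 2, "LOAD": 2, "STORE": 2, "ADD": 2, "DEC": 2, "JNZ": 2,
    "JNC": 2, "INC": 2, "STOP": 1, "ENDP": 1, "DB": 1, "DW": 2, "END": 0,
}

# Mnemonics and pseudo-ops of the sample program, in source order.
PROGRAM = [
    "START", "LOAD", "STORE", "LOAD", "ADD", "DEC", "JNZ", "STORE", "JNC",
    "INC", "STOP", "ENDP", "DB", "DB", "DB", "DW", "END",
]


def assign_addresses(ops, start=1000):
    """Return (address, op) pairs, advancing LC by the length of each statement."""
    lc = start
    placed = []
    for op in ops:
        placed.append((lc, op))
        lc += LENGTH[op]
    return placed


for address, op in assign_addresses(PROGRAM):
    print(address, op)
```

The printed addresses run 1000, 1002, ... 1020, 1021, 1022, 1023, 1024, 1025, 1027, matching the (memory location) column of the listing.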

Machine Opcode Table (MOT) (Remains static)

Mnemonic   Binary Opcode   Number of Operands   Length of Instruction (LOI, in bytes)
ADD        01              1                    2
SUB        02              1                    2
LOAD       03              1                    2
STORE      04              1                    2
INC        05              1                    2
DEC        06              1                    2
JMP        07              1                    2
JNZ        08              1                    2
JNC        09              1                    2
STOP       10              0                    1

Pseudo-Opcode Table (POT) (Remains static)

Pseudo-Opcode   Number of Operands
DB 1
DW 1
DD 1
CONST 1
START 1
LTORG 1
ENDP 0
END 0

(NOTE: If an exam question asks you to assemble an entire program, keep MOT / POT limited to
only mnemonics and pseudo-ops used in your program, as it is only for a hypothetical machine)

Symbol Table (ST) (After processing data segment)

Symbol Name Type Address


N2 VAR 1023
COUNT VAR 1024
N1 VAR 1022
repeat LABEL 1008
SUM VAR 1025
down LABEL 1020

Forward Reference Table (FRT)

Symbol Name   Type    Usage Address   Definition Address
N2            VAR     1003            1023
COUNT         VAR     1005, 1011      1024
N1            VAR     1009            1022
repeat        LABEL   1013            1008
SUM           VAR     1015, 1019      1025
down          LABEL   1017            1020

Literal Table (LT)

Literal Notation   Value   Usage Address   Definition Address
=0                 0       1007            1029

Output File: (At the end of the pass)

Address Opcode Operand


1002 03 1023
1004 04 1024
1006 03 1029
1008 01 1022
1010 06 1024
1012 08 1008
1014 04 1025
1016 09 1020
1018 05 1025
1020 10 -

Stored Variables: (In Data Segment)

Address Value
1022 07
1023 05
1024 NULL
1025,1026 NULL

Stored Literals: (Literal Pool)

Address Value
1029 00

2.11 SPARC Assembler

Exam Questions:
Q) Explain SPARC Assembler. (May 04 [Comps] 5M, Dec 04 [Comps]
10M, May 06 [Comps] 6M, May 07 [Comps] 5M, Dec 07 [IT] 10M, June
08 [Comps] 10M, Dec 08 [Comps] 5M).

The SPARC (Scalable Processor Architecture) is based on RISC (Reduced Instruction-Set
Computing) Architecture, which means that the processor provides a small set of simple
instructions. Because the instructions are simple and uniform, they are quick to decode and most
of them are designed to execute in a single clock cycle, which leads to rapid execution.

Some of the key features of SunOS SPARC Assembler are as follows:

1. Segmented Program Memory

The memory given to a program is divided into different segments called sections. Some
examples of such sections are:
.TEXT for executable instructions.
.DATA for data needed by the program; it is a read-write section.
.BSS (Block Started by Symbol) for uninitialized or zero-initialized data. In large and
complex programs, a considerable number of data variables may not be initialized at all
(or may be initialized to zero) at the start of the program and receive their values only
during execution. The .BSS section is used to store such data.

The Assembler maintains a separate location counter (LC) for each named section. When control
is switched from one section to another, the associated LC is also switched.

Thus, sections are like blocks of the same program. But inter-section references (within the same
program) are resolved by the Linker and not the Assembler.
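
The idea of one location counter per section can be sketched in Python as follows. The class `SectionCounters` and its method names are hypothetical, and the byte sizes passed to `allocate` are arbitrary values chosen for the illustration.

```python
class SectionCounters:
    """Keeps one location counter per named section (illustrative names)."""

    def __init__(self):
        self.counters = {}
        self.current = None

    def switch(self, name):
        """Make `name` the current section, creating its counter on first use."""
        self.counters.setdefault(name, 0)
        self.current = name

    def allocate(self, size):
        """Reserve `size` bytes in the current section; return their start offset."""
        offset = self.counters[self.current]
        self.counters[self.current] = offset + size
        return offset


lcs = SectionCounters()
lcs.switch(".TEXT")
a = lcs.allocate(4)
b = lcs.allocate(4)
lcs.switch(".DATA")
c = lcs.allocate(8)
lcs.switch(".TEXT")   # switching back resumes the .TEXT counter where it stopped
d = lcs.allocate(4)
print(a, b, c, d)     # 0 4 0 8
print(lcs.counters)   # {'.TEXT': 12, '.DATA': 8}
```

Each section's offsets start from zero independently, which is why references between sections have to be left for the Linker to resolve.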

2. Object File

The object file produced by the SPARC Assembler includes the following:

Assembled program sections.
A Symbol Table (ST), which includes information regarding various symbols and section names.
A list of relocation and linking operations (for the Linker to resolve inter-section references).

3. Different Types of Symbols

The SPARC Assembler supports the following types of symbols:

Local Symbols: Symbols defined and used in the same program are called Local Symbols.
A section can freely refer to local symbols defined in other sections of the same program.

Global Symbols: Symbols that are used in a program but defined externally, or that are
defined in a program but can be accessed by other programs, are called Global Symbols.

Weak Symbols: A weak symbol is similar to a global symbol, but its definition may be
over-ridden by a global symbol having the same name.

4. Delay Slot

The SPARC Architecture uses delayed branching, i.e. the instruction appearing immediately after a
branch instruction is executed before the branch is actually taken. Such an instruction is said to
be in the delay slot of the branch.

SPARC Assembly programmers often place a useful instruction in the delay slot to optimize
performance; when no suitable instruction is available, a NOP (No Operation) instruction is
placed there instead.
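
The effect of a delay slot on execution order can be illustrated with a toy Python simulation. This is not SPARC code and it models only the ordering: the `run` function, the tuple format of the program and the instruction names are all invented for the sketch, and it assumes unconditional branches whose delay slot holds an ordinary (non-branch) instruction.

```python
def run(program):
    """Return the names of the instructions in the order they execute.

    program is a list of ("op", name) or ("branch", target_index) tuples.
    """
    executed = []
    pc = 0
    while pc < len(program):
        kind, arg = program[pc]
        if kind == "branch":
            executed.append(f"branch->{arg}")
            # Delay slot: the next instruction runs before control transfers.
            if pc + 1 < len(program):
                executed.append(program[pc + 1][1])
            pc = arg
        else:
            executed.append(arg)
            pc += 1
    return executed


program = [
    ("op", "load"),
    ("branch", 4),
    ("op", "add (delay slot)"),
    ("op", "skipped"),
    ("op", "store"),
]
print(run(program))
# ['load', 'branch->4', 'add (delay slot)', 'store']
```

The instruction placed right after the branch still executes, while the one after it is skipped; replacing the delay-slot entry with a NOP would keep the program correct but waste that slot.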

2.12 Possible errors during program assembly

Non-existent (or wrongly used) mnemonic
Reserved mnemonic used as a symbol
Operand mismatch while using a mnemonic (wrong number or type of operands)
Duplicate symbol / label
Undefined symbol / label
Wrong number (or wrong type) of parameters (i.e. parameter mismatch) in a procedure call
Missing End-of-Procedure
Missing End-of-File

2.13 Chapter Summary

Assembler is a translator which converts assembly language of a machine into machine-
language code of that machine. It also includes information required by the loader in the
output.
Literals are plain values, not associated with any symbol name. They are used sparingly in
the program. Symbols are names assigned to values that are used frequently in the program.
Their value usually goes through many manipulations during the lifetime of the program.
Symbol names can also be used to assign a label for a memory location.
Features of assembly language include: use of human-readable mnemonics for binary
opcodes, use of symbols for operands and specification of a separate area to store variables.
An Assembler uses following data structures:
o ALP Source Code
o Machine Opcode Table (MOT, to store corresponding binary opcodes of language-
specific mnemonics)
o Pseudo-Opcode Table (POT, stores basic information about language-specific
assembler-directives and mnemonics used for declarative statement)
o Symbol Table (ST, to store information about symbols used in the program)
o Literal Table (LT, to store information about literals used in the program).
When a symbol is referenced before it is defined in the program, it is called a forward
reference. Due to this, the assembler cannot assemble the entire instruction at once. This is
called the Forward Reference Problem (FRP).
To solve this FRP, two approaches are used: Multipass Approach and Single-pass
Approach.
A Multipass assembler makes two passes over the program (more passes may be needed).
In Pass 1, it only collects all symbols in ST. In Pass 2, all instructions are assembled using
this ST. Since one whole pass is used only to collect symbols, it may be inefficient for
programs having very few or no forward-referenced symbols.
So, we use a Single-pass approach that makes only one pass through the program and uses
an extra data structure, Forward Reference Table (FRT) to resolve forward-reference
symbols.
In a Single-pass assembler, a forward-referenced symbol (along with its usage locations) is
first inserted in FRT. When these symbols get defined in ST, their definition addresses are
updated in FRT. At the end of the pass, FRT is used to insert the definition address value into
the usage locations of all forward-referenced symbols.
Features of SPARC Assembler include:
o Division of program memory into different types of sections and maintaining
different location counters for each different section.
o Use of global, local and weak symbols.
o Object file containing a list of relocation and linking operations for the Linker
o A Delayed Branching facility, i.e. the instruction immediately following a branch
instruction is actually executed before the branch is taken. Programmers can
place useful instructions in this delay slot to optimize the program
performance.

2.14 Expected Viva Questions

Q.1) What is a Delay Slot in SPARC Assembler?

Q.2) What is a weak symbol in a SPARC Assembler Program?

Q.3) Usage Address field in Literal Table (LT) and Forward Reference Table (FRT) can have
multiple values for the same literal or symbol. How can it be practically implemented?
(Ans: A Linked List! The usage address column for an entry will point to the start [i.e. head] of
the linked list. Each usage address node of the linked list will point to the next node.
Whenever a new usage address is to be appended, it will be linked to the last node of the list).

Q.4) What is a Literal Pool? (Note: Sometimes, literals may not be defined in memory
immediately after the program; a separate memory area may be reserved for them. So, literal
pool may not necessarily be immediately after the program).

Q.5) What is forward-reference problem?

Q.6) How are literals different from regular symbols? (Also, mention about difference in structure
of ST and LT).

Q.7) Why is assembly language preferred over machine language? (Refer section 2.3)

Q.8) What are the different types of ALP statements?

Q.9) What could be the possible errors during the assembly of the program?
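
For Q.3, the linked list of usage addresses could look like the Python sketch below. The class names `UsageNode` and `UsageList` are hypothetical, and keeping a separate tail pointer is a design choice of this sketch (it avoids walking the list on every append), not something stated in the answer above.

```python
class UsageNode:
    """One usage address plus a link to the next node."""

    def __init__(self, address):
        self.address = address
        self.next = None


class UsageList:
    """Singly linked list that the Usage Address column of a table entry points to."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, address):
        """Link a new usage address after the last node of the list."""
        node = UsageNode(address)
        if self.head is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.address
            node = node.next


count_usage = UsageList()   # usage addresses of COUNT in the sample program
count_usage.append(1005)
count_usage.append(1011)
print(list(count_usage))    # [1005, 1011]
```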
