Chap 2 - Assemblers
Definition of an Assembler
Basic functions of Assembler
Features and Elements of Assembly Language Programming
Forward Reference Problem and Solutions to it
Multi-pass (2-Pass) Assembler Design
1-Pass Assembler Design
SPARC Assembler
Possible Errors during assembly of a program
An Assembler is a language translator that accepts an assembly language program as input and
produces its machine language equivalent as output, along with information required by the
Loader.
[Figure: Assembly Language Program -> Assembler (with its Databases) -> Machine Language Equivalent, along with information required by Loader]
2.2 Elements of Assembly Language Programming
The primary function of an assembler is to replace each mnemonic with its equivalent binary opcode
and to replace operand symbols and literals with their storage addresses.
Comments help the programmer understand the meaning of an ALP statement in context with
the remaining code. The Assembler must ignore them, as they are not meant for actual processing.
Literals are plain values that are not associated with any variable name. They are often used by
programmers as ad-hoc values that are not needed throughout the program, but only for a few
statements. An Assembler handles these literals by using a separate data structure called Literal
Table (LT).
For example, if an integer abc is to be increased by 4 units, we write the statement as abc=abc+4;
Here, it is assumed that 4 is not needed again in the program. So, the value 4 is used as an integer
literal here. (Similarly, if we consider strName = "John", here "John" is a string literal.)
Symbols are the variable names that hold values needed throughout the program. The Assembler
stores these symbols and their definition addresses in a data structure called as the Symbol Table
(ST).
Symbols are also referred to as symbolic references, because they are practically implemented as
references to memory locations where their values are stored.
Symbols can be used for Labels and Operands. The function of Assembler is to search the ST when
a symbol is found in the program and replace it with its definition address, where the value is
stored.
For example, suppose an integer abc is increased by 4 units once, and this increment is then itself
incremented and added to abc repeatedly in a loop. Here, the value 4 is assigned only once, but it is
referred to from inside the loop again and again. So, we define a variable (i.e. a symbol) incr = 4 and
use it to increase abc inside the loop, updating it on every iteration with incr = incr + 1.
Procedures are nothing but functions in ALP; they are used when a relatively large number of
statements are to be used repeatedly in the program.
When a procedure is called, current execution state of the program is pushed onto stack. When the
procedure returns, the execution state is popped out of the stack and the calling program is
resumed.
1. Assembler Directive Statements
Exam Question:
Q) What are Assembler Directives? Explain with examples (May
06 [Comps] 6M, Dec 08 [Comps] 5M)
These statements direct the assembler to take the action associated with them. They are not
executable instructions, so they are not included in the final object file generated by the assembler
as output.
ii. ORG has a similar purpose as START, but ORG is included in (and is implicitly associated
with) the program, whereas START is used independently of a program. So, START
statement must also include the program name in its label field.
E.g:
PG1 START 2000 ; Start storing the program PG1 from address 2000
ORG 3000 ; Start storing the current program (in which ORG is used) from address 3000
iii. USING is an assembler directive to indicate which register is to be used as base register
and what will be its value for the base.
E.g: USING 1500,15 ; Use register 15 as base register and 1500 as its value for the base
iv. DROP is an assembler directive used to drop a base register that has been allocated by a
USING statement
v. EQU is an assembler directive used to equate a symbolic name to a value and make program
more readable. Whenever a symbol is defined using EQU directive, no memory is allocated
to it; only an entry is made in symbol table (ST).
E.g:
SUNDAY EQU 1
MONDAY EQU SUNDAY + 1
Now, wherever in the program we use the symbols SUNDAY and MONDAY, assembler will
replace them by 1 and 2 i.e. their respective equated values.
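The way EQU fills the symbol table without allocating memory can be sketched in Python. This is only a minimal sketch: the function name process_equ and the restriction to '+'-separated terms are assumptions made for illustration, not part of any real assembler.

```python
def process_equ(lines):
    """Return {symbol: value} for statements of the form 'NAME EQU expr'."""
    symtab = {}
    for line in lines:
        name, directive, expr = line.split(None, 2)
        if directive != "EQU":
            continue
        total = 0
        # Assumption: expr is a '+'-separated list of integers or earlier symbols.
        for term in expr.split("+"):
            term = term.strip()
            total += symtab[term] if term in symtab else int(term)
        symtab[name] = total  # only an ST entry is made; no memory is allocated
    return symtab

equates = process_equ(["SUNDAY EQU 1", "MONDAY EQU SUNDAY + 1"])
print(equates)  # {'SUNDAY': 1, 'MONDAY': 2}
```

Since MONDAY is defined in terms of SUNDAY, the sketch relies on SUNDAY having been equated earlier in the source.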
2. Declarative Statements: These are special cases of assembler directives; they are used to
declare symbols and associate them with values. For example, var1 DW A43B declares a
symbol var1 with hex value A43B; DW tells the assembler that var1 uses a 16-bit word in
memory.
1. Use of Mnemonics to specify Opcodes makes the assembly language program much more
readable and debugging is also easier.
2. Use of Symbols to specify Operands means that program can be modified with no
overhead. That is, if definition address of a symbol changes, the change in ALP is done
only at the place where the symbol is declared; all the places where the symbol has been
used need not be updated.
3. Separation of Code and Data Segments allows the programmer to keep aside some
portion of memory for the data to be used by the program.
Assembler needs following basic data structures (also called as databases) as input:
1. ALP Source Code: ALP statements are contained in the source file. For example,
a sample ALP code for adding 2 numbers can be given as follows:
2. Mnemonic Opcode Table (MOT): Every machine architecture has its own set
of mnemonics for its assembly language (so, an ALP written for one machine
cannot run on another machine).
MOT is a fixed length table defined by programmer as per assembly language of the
underlying machine. During translation, the assembler searches for the mnemonic in MOT
and replaces it with the associated binary opcode given in MOT.
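A MOT lookup can be sketched in Python as a dictionary search. The mnemonics, opcode values and instruction lengths below are assumptions for a hypothetical machine, and lookup_mnemonic is an illustrative name.

```python
# Hypothetical MOT: mnemonic -> (binary opcode, length of instruction in bytes).
MOT = {
    "LOAD": (0b0001, 2),
    "ADD": (0b0010, 2),
    "STORE": (0b0011, 2),
}

def lookup_mnemonic(mnemonic):
    """Return (opcode, length) for a mnemonic, or report an invalid mnemonic."""
    try:
        return MOT[mnemonic]
    except KeyError:
        raise ValueError(f"Invalid mnemonic: {mnemonic}") from None

opcode, length = lookup_mnemonic("ADD")
print(opcode, length)  # 2 2
```

A dictionary is used here only for brevity; a fixed length table searched linearly or by hashing would serve the same purpose.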
The structure of MOT is as follows:
. . . .
. . . .
. . . .
3. Pseudo-Opcode Table (POT): POT is also a fixed length table defined by programmer as
per assembly language of the underlying machine. During translation, the assembler
searches for the pseudo-opcode in POT and takes the necessary action associated with it.
Structure of POT is as follows:
Pseudo-Opcode    Number of Operands
.                .
.                .
.                .

Pseudo-Opcode    Number of Operands
DB               1
DW               1
DD               1
CONST            1
START            1
LTORG            1
ENDP             0
END              0
[Note that these MOT and POT are only for a Hypothetical machine (and not for any specific
machine like 8085/8086); the ALP given in above example uses only a few of these mnemonics
and pseudo-ops]
4. Symbol Table (ST): ST is used to keep track of the symbols assigned to variables being
used in the program.
When a symbol gets defined, Assembler makes its entry into the Symbol Table (ST) along
with its definition address. When a symbol is used in an instruction, the Assembler first
verifies validity of the symbol using ST and if validation is successful, definition address of
symbol is written into output file.
(Note that, ST is not language-specific like MOT / POT; ST is program specific i.e. every
program has its own ST).
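The define-then-validate behaviour of ST can be sketched in Python as follows. The class and method names are assumptions for illustration; the sketch stores only the definition address of each symbol.

```python
class SymbolTable:
    """Per-program table mapping each symbol to its definition address."""

    def __init__(self):
        self.entries = {}

    def define(self, name, address):
        # A symbol may be defined only once in a program.
        if name in self.entries:
            raise ValueError(f"Duplicate symbol: {name}")
        self.entries[name] = address

    def resolve(self, name):
        # Validation step: the symbol must exist before its address is emitted.
        if name not in self.entries:
            raise KeyError(f"Undefined symbol: {name}")
        return self.entries[name]

st = SymbolTable()
st.define("COUNT", 1020)
print(st.resolve("COUNT"))  # 1020
```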
Structure of ST is as follows:
. . .
. . .
. . .
5. Literal Table (LT): The assembler tracks the usage of literals in a program through a Literal
Table (LT).
When a literal is first encountered in a program, its entry is made into LT along with its
usage address, but the definition address is not updated. Now, with each different usage of
same literal, assembler inserts the usage address into the field accordingly. But these literals
do not get defined until EOF is reached.
Once EOF is reached, all the literals in LT get defined in the area after the end of program
and their definition address is updated in LT. This area at the end of program where literals
get defined is often called a Literal Pool.
. . . .
. . . .
. . . .
Literal Notation   Value   Usage Address   Definition Address
=4                 4       1003, 1009      1020
=5                 5       1011            1021
(Note: the definition addresses are literal pool addresses)
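The LT entries shown above can be reproduced with a short Python sketch. The function names, the dictionary layout and the one-byte size per literal are assumptions made for illustration; the pool start address 1020 is taken from the table.

```python
def record_literal(lit_table, literal, usage_address):
    """Enter the literal on first use and append every usage address."""
    entry = lit_table.setdefault(literal, {"usage": [], "definition": None})
    entry["usage"].append(usage_address)

def define_literal_pool(lit_table, pool_start):
    """At EOF, give each literal a definition address inside the literal pool."""
    address = pool_start
    for entry in lit_table.values():
        entry["definition"] = address
        address += 1  # assumption: each literal occupies one byte
    return address

lt = {}
record_literal(lt, "=4", 1003)
record_literal(lt, "=4", 1009)
record_literal(lt, "=5", 1011)
pool_end = define_literal_pool(lt, 1020)
print(lt["=4"])  # {'usage': [1003, 1009], 'definition': 1020}
print(lt["=5"])  # {'usage': [1011], 'definition': 1021}
```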
Exam Questions:
Q) State the reason for assembler to be a multi-pass program. (May
05 [Comps] 4M, May 04 [IT] 4M, Dec 06 [Comps] 4M)
[NOTE: These tables and algorithms in this book are for a hypothetical machine. So some major
changes might be required, for actual implementation of Assembler (or even simulation of these
algorithms)]
In a Multi-pass approach, assembler assembles the mnemonics and constructs ST in pass 1. In the
Pass 2, Input file is read again along with ST and an assembled object code file is generated in the
output. This is a 2-Pass Approach, but some complex programs may need more than just 2 passes
for completing the assembly of the program. Such an approach is called Multi-pass Approach.
In a Single-pass approach, the Assembler makes use of a special data structure called Forward-
Reference Table (FRT) to keep track of only the forward-referenced symbols. At the end of pass 1,
this new data structure is used to update definition addresses of forward-referenced symbols in ST
and in the program.
Exam Questions:
Q) Explain with neat flowchart and database working of two-pass
assembler. (Dec 04 [IT] 12M, Dec 07 [IT] 10M, June 08 [Comps]
10M, Dec 08 [Comps] 10M, June 08 [IT] 10M) [Often asked as a
compulsory question]
When an Assembler needs more than one pass through the input program to complete the
assembly of an ALP, it is called a Multi-pass Assembler.
In general, not more than 2 passes are actually required for complete assembly of an ALP.
[Complex programs requiring more than 2 passes for assembly are beyond the scope of our
syllabus (and this book). So we are essentially looking at a Two-Pass Assembler Design]
[Figure: Pass 1 -> Symbol Table (ST) -> Pass 2]
Pass 1 is used to collect all symbol definitions (for variables and labels) in ST. In Pass 2, the source
file is read again and since addresses of all symbols are now known through ST, all instructions
get fully assembled and a final object file is generated as output.
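The division of work between the two passes can be sketched in Python. This is a minimal sketch for a hypothetical machine: the MOT contents, the (label, mnemonic, operand) tuple format and the names pass1 and pass2 are assumptions, and literals, pseudo-ops and error recovery are left out.

```python
# Hypothetical MOT: mnemonic -> (opcode, length of instruction).
MOT = {"LOAD": (1, 2), "ADD": (2, 2), "JMP": (3, 2)}

def pass1(program, start):
    """Collect every label definition and its address in the symbol table."""
    symtab, lc = {}, start
    for label, mnemonic, _operand in program:
        if label:
            if label in symtab:
                raise ValueError(f"Duplicate label: {label}")
            symtab[label] = lc
        lc += MOT[mnemonic][1]
    return symtab

def pass2(program, symtab, start):
    """Re-read the source and emit (address, opcode, operand address)."""
    output, lc = [], start
    for _label, mnemonic, operand in program:
        opcode, length = MOT[mnemonic]
        output.append((lc, opcode, symtab[operand]))
        lc += length
    return output

program = [
    (None, "JMP", "SKIP"),    # forward reference to SKIP
    ("BACK", "ADD", "BACK"),
    ("SKIP", "LOAD", "BACK"),
]
symtab = pass1(program, 1000)
code = pass2(program, symtab, 1000)
print(symtab)  # {'BACK': 1002, 'SKIP': 1004}
print(code)    # [(1000, 3, 1004), (1002, 2, 1002), (1004, 1, 1002)]
```

The forward reference to SKIP in the first statement causes no difficulty, because its address is already in ST by the time Pass 2 begins.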
A Symbol Table (ST) containing all symbol definitions used in programs, is generated at the end of
Pass 1.
[For the purpose of simplicity, we break down algorithm based on how each different element of
an ALP statement is processed by the assembler, in each pass]
(PLEASE NOTE:
1. From here onwards in this book, NEXT STEP refers to the next LOGICAL step. So,
after processing Label, if there is a mnemonic, next step will be processing that
mnemonic. If there is no mnemonic, next line will be read. Similarly after mnemonic,
if the instruction has an operand, then next step is processing those operands. And
obviously, after operands if there are any comments they are ignored and next step is
reading the next line.
2. In exam, you need not draw these smaller flowcharts; they are meant only for
understanding. The main overall flowcharts of each pass are enough. But make
sure you explain clearly how each element is processed).
[Flowchart: Handling a Label Definition in Pass 1. Search Label in ST; if the label is already present, display error "Duplicate Label"; NEXT STEP.]
2.7.1.5 Handling Mnemonics in Pass 1 :
[Flowchart: Handling a Mnemonic in Pass 1. Search Mnemonic in MOT; LC <- LC + LOI; NEXT STEP.]
If an instruction has operands, the assembler first checks if the operand is a Literal or not.
If operand is a Literal, it is searched in LT. If found, its usage address is updated to current LC. If
not found, first its entry is made in LT and then the usage address is updated.
If operand is a symbol, it is searched in ST. If found, it indicates that it has already been defined.
If not found, its entry is made in ST, but neither type nor definition address are updated.
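The operand handling of Pass 1 described above can be sketched in Python. The function name, the table layouts and the convention that a literal starts with '=' are assumptions made for illustration.

```python
def handle_operand_pass1(operand, lc, lit_table, sym_table):
    """Record a literal's usage address, or reserve an ST entry for a symbol."""
    if operand.startswith("="):  # assumption: literals are written as =value
        # Enter the literal on first use, then record the current LC as a usage.
        lit_table.setdefault(operand, []).append(lc)
    elif operand not in sym_table:
        # Symbol not yet defined: make the entry, leave type and address blank.
        sym_table[operand] = {"type": None, "address": None}
    # If the symbol is already in ST, nothing needs to be done in Pass 1.

lt = {}
st = {"X": {"type": "VAR", "address": 1015}}
handle_operand_pass1("=4", 1003, lt, st)
handle_operand_pass1("=4", 1009, lt, st)
handle_operand_pass1("Y", 1005, lt, st)
handle_operand_pass1("X", 1007, lt, st)
print(lt)       # {'=4': [1003, 1009]}
print(st["Y"])  # {'type': None, 'address': None}
```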
[Flowchart: Handling Operands in Pass 1. Is the operand a Literal? YES: Search Literal in LT; if not found, Insert Literal in LT; then Usage Address <- LC. NO: Search Symbol in ST; if found, do nothing; if not found, Insert Symbol in ST. Then LC <- LC + LOI; NEXT STEP.]
1. START: The START pseudo-op is used to indicate the starting location of program in the
memory. If its operand field is 0 or blank, it means program can be moved anywhere in the
memory (i.e. it is re-locatable). Otherwise, the operand gives a fixed starting address in
memory (i.e. program is static).
2. ENDP: The ENDP pseudo-op (without any operand) is used to indicate end of code segment
and start of data segment. So, to process this, the assembler simply resets the code_flag to 0.
3. DB / DW: These two pseudo-ops are used to define variables. The label field is the variable
name and the operand field is the variable value.
So, when DB or DW is encountered, its label is searched in ST. If found, its type is updated
to VAR and definition address is updated to current LC. If not found, first its entry is made in
ST and then its type and definition address are updated.
4. CONST: The CONST pseudo-op is used to define a constant variable whose value is fixed at
the time of definition and cannot be changed during program execution. Its label field is the
variable name and its operand field is the variable's fixed value.
When CONST is encountered, its label is searched in ST. If found, its type is updated to
CONST and definition address is updated to current LC. If not found, first its entry is made in
ST and then its type and definition address are updated.
[Flowchart: Handling Pseudo-Opcodes in Pass 1. Search POT for the Pseudo-op and branch on its type. START: {Ignore}. ENDP: Reset Code Flag = 0. DB / DW / CONST: Search Label in ST; if not found, make entry into ST. Then Read Next Line.]
2.7.1.8 Handling Procedure Definitions and Calls:
1. PROC Mnemonic: The PROC mnemonic indicates start of a procedure definition. Its
operand field gives the procedure name. This procedure name is searched in ST. If it
already exists in ST, then a Duplicate Procedure error is displayed. If not found, its entry
is made in ST and type is set to PROC and definition address is set to current LC.
Also, the assembler creates a flag with name procname_flag and sets it ON, indicating the
procedure definition is on. (When ENDP with same procname as its operand is found,
procname_flag is turned off).
[Flowchart: Handling PROC in Pass 1. Search Operand in ST. If found: set error flag = ON and display error "Duplicate Procedure". If not found: make entry into ST with Type PROC and Definition Address <- LC. NEXT STEP.]
2. ENDP procname: If ENDP is followed by an operand, it indicates end of a procedure with
name procname. So, the assembler sets procname_flag to OFF.
3. CALL Mnemonic: The CALL mnemonic indicates a procedure call; its operand is the
name of the procedure to be called. When CALL is encountered, it is first searched in MOT
and its operand is searched in ST for type=PROC. If found, it means the procedure is
well-defined. So, only LC is incremented by LOI. If not found, its entry is made into ST,
but neither the type nor definition address are updated.
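The three cases above can be sketched in Python. The function names and the table layout are assumptions made for illustration. The sketch also assumes that an ST entry with no type is a placeholder left by an earlier CALL, so that a later PROC for the same name is not reported as a duplicate; the text does not spell this case out.

```python
def handle_proc(name, lc, sym_table, proc_flags):
    """PROC name: record the procedure's definition address in ST."""
    entry = sym_table.get(name)
    # Assumption: an entry without a type was created by an earlier CALL.
    if entry is not None and entry["type"] is not None:
        raise ValueError(f"Duplicate Procedure: {name}")
    sym_table[name] = {"type": "PROC", "address": lc}
    proc_flags[name] = True   # procname_flag = ON

def handle_endp(name, proc_flags):
    """ENDP name: the procedure definition is over."""
    proc_flags[name] = False  # procname_flag = OFF

def handle_call(name, sym_table):
    """CALL name: make a blank ST entry if the procedure is not yet known."""
    sym_table.setdefault(name, {"type": None, "address": None})

st, flags = {}, {}
handle_call("SUM", st)              # call appears before the definition
handle_proc("SUM", 1010, st, flags)
handle_endp("SUM", flags)
print(st["SUM"], flags["SUM"])  # {'type': 'PROC', 'address': 1010} False
```

Incrementing LC by LOI after PROC and CALL is left to the caller in this sketch.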
[Flowchart: Handling CALL in Pass 1. Boxes shown: Search Mnemonic in MOT; Search Operand in ST for type=PROC; Operand found in ST? (YES / NO); Make entry into ST; Set error flag = ON; Display error "Call to Undefined Procedure"; LC <- LC + LOI; NEXT STEP.]
2.7.1.9 Defining Literals:
At the end of Pass 1, all the literals in LT get defined in the area after the end of program and
their definition address is updated in LT. This area at the end of program where literals get
defined is often called a Literal Pool.
After Pass 1 is completed successfully, assembler re-scans the source file, assembles the
mnemonics in it and writes their equivalent binary code into output file. Symbols and Literals are
resolved using ST and LT respectively.
2.7.2.2 Overall Pass 2 Flowchart:
[Flowchart: Handling a Mnemonic in Pass 2. Search Mnemonic in MOT; write binary opcode in o/p file; LC <- LC + LOI; NEXT STEP.]
[Flowchart: Handling Operands in Pass 2. Is the operand a Literal? YES: Search Literal in LT and get definition address. NO: Search Symbol in ST and get definition address. Write definition address in o/p file; LC <- LC + LOI; NEXT STEP.]
1. START: The pseudo-op is searched in POT; its equivalent binary opcode is written
into output file along with its operand.
2. ENDP: It is used to indicate end of code segment and start of data segment. So, the
assembler simply resets the code_flag to 0.
3. DB / DW / CONST: They are used to define variables. In Pass 2, their labels (i.e.
variable name) are searched in ST and their definition address is retrieved. Their
operands (i.e. value of variable) are now written into this definition address. For
CONST, the processing is same, except that values once written cannot be modified
by the program.
[Flowchart: Handling Pseudo-Opcodes in Pass 2. Branch on the type of Pseudo-opcode (START, ENDP, DB / DW, CONST); in each case, Search POT for the Pseudo-op; NEXT STEP.]
2.7.2.8 Handling Procedure Definitions and Calls:
1. PROC Mnemonic: The PROC mnemonic is searched in MOT; its binary opcode is
written into output file along with its operand. Operand is read into procname and
procname_flag is set ON. LC is then incremented by LOI.
[Flowchart: Handling PROC in Pass 2. Search Mnemonic in MOT; get procname from Operand; set procname_flag = ON; LC <- LC + LOI; NEXT STEP.]
2. ENDP procname: It indicates end of a procedure with name procname. So, the
assembler sets procname_flag to OFF.
3. CALL Mnemonic: The CALL mnemonic is searched in MOT; its operand symbol
is searched in ST. Binary opcode of mnemonic and definition address of operand symbol
are written into output file.
[Flowchart: Handling CALL in Pass 2. Search Mnemonic in MOT; write binary opcode into o/p file; Search Operand in ST; LC <- LC + LOI; NEXT STEP.]
Exam Questions:
Q) Using any assembler language, write a sample assembly
program and w.r.t that program, describe how a two-pass
assembler will translate it. (May 04 [IT] 10M).
(Hint: Give a brief explanation of the steps as well)
Sample Program: To multiply N1 * N2 by successive addition method
Pseudo-Opcode    Number of Operands
DB               1
DW               1
DD               1
CONST            1
START            1
LTORG            1
ENDP             0
END              0
(NOTE: If an exam question asks you to assemble an entire program, keep MOT / POT limited to
only mnemonics and pseudo-ops used in your program, as it is only for a hypothetical machine)
Symbol Table (ST) (After Pass 2)
Address Value
1024 00
Exam Questions:
Q) Explain with the help of flowchart and data structures, the
working of a single-pass assembler. (May 04 [Comps] 10M, May 05
[Comps] 10M, May 05 [IT] 10M, May 06 [IT] 10M, Dec 07 [IT] 10M,
May 07 [Comps] 10M).
Forward Reference Problem (FRP) occurs when a symbol is referenced before it gets defined. Due
to this, the assembler cannot assemble the instruction right at the time when it gets encountered.
To resolve FRP, we used Multi-pass approach which collects all symbol definitions in Pass 1 and
then assembles all instructions in Pass 2. But it requires an extra pass to handle forward-referenced
symbols, which is inefficient.
A Single-pass Assembler solves FRP efficiently in one pass. It uses a special data structure called
the Forward Reference Table (FRT) and assembles all the instructions completely, right at the time
when they are encountered.
If the instruction involves a forward-referenced symbol (that is not yet defined), such symbols are
entered into FRT. At the end of pass, when all symbol definitions have been collected in ST, the
assembler uses ST to update definition addresses of forward-referenced symbols in FRT.
It then uses FRT to update the usage locations of forward-referenced symbols in output file. Thus,
a single-pass assembler handles forward reference symbols efficiently.
2.9.1 Forward Reference Table (FRT)
When One-Pass Assembler encounters a forward-referenced symbol, it enters the symbol in FRT
and updates its usage address. For multiple references to a symbol in FRT, the assembler will
append the usage address in the corresponding field.
When the symbol gets defined, its definition address is updated in FRT. At the end of pass, the
assembler copies the definition address from FRT into usage addresses, for every forward-
referenced symbol.
. . . .
. . . .
. . . .
1000 LOAD X
1002 UP1: SUB Y
1004 JZ DOWN1 ; Forward-referenced Symbol encountered
1006 ADD Y
1008 JNZ DOWN1 ; Forward-referenced Symbol encountered
1010 JMP UP1
1012 DOWN1: STORE Y
1014 ENDP
1015 X DB 02
1016 Y DB 04
1017 STOP
1018 END
At the end of assembly of this program using One-Pass Assembler, FRT will be as follows:
Now, the assembler copies definition address 1012 at usage addresses 1005 and 1009.
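This final copying step can be sketched in Python using the addresses of the example above. The function name backpatch and the modelling of the output file as a dictionary from address to value are assumptions made for illustration.

```python
def backpatch(memory, frt):
    """Copy each definition address into all usage addresses of that symbol."""
    for symbol, entry in frt.items():
        if entry["definition"] is None:
            raise ValueError(f"Undefined symbol: {symbol}")
        for usage in entry["usage"]:
            memory[usage] = entry["definition"]

# FRT entry for DOWN1 at the end of the pass (values from the example program).
frt = {"DOWN1": {"usage": [1005, 1009], "definition": 1012}}
memory = {1005: None, 1009: None}  # operand fields left blank during the pass
backpatch(memory, frt)
print(memory)  # {1005: 1012, 1009: 1012}
```

A symbol that is still undefined at the end of the pass is reported as an error instead of being patched.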
[Flowchart: Overall One-Pass Assembler. Boxes shown: START: One-Pass Assembler; Initializations; Handle each Pseudo-op as per its type; End of One-Pass Assembler.]
(Note that the design of 1P-Assembler given in this book does not handle procedures)
2.9.4 Initializations in One-Pass Assembler
When a label definition is encountered in the program, the assembler searches for the label in ST.
If the label already exists in ST, then it is a case of duplicate label definition. So the error flag is
turned on and a Duplicate Label error is displayed.
If the label is not found in ST, then it is searched in FRT to see if it has been forward-referenced.
If found in FRT, then it is surely a case of a forward-referenced LABEL. So, its definition address is
updated in the corresponding field of FRT.
If not found in FRT, then it is a regular non-forward-referenced LABEL. So, its entry is made into
ST with type as LABEL and definition address as current LC.
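The three cases above can be sketched in Python. The function name and the table layouts are assumptions made for illustration; the sketch follows the steps exactly as stated, so a forward-referenced label is recorded only in FRT.

```python
def define_label(label, lc, sym_table, frt):
    """Process a label definition in a one-pass assembler."""
    if label in sym_table:
        # Duplicate definition: the error flag would be set here.
        raise ValueError(f"Duplicate Label: {label}")
    if label in frt:
        # Forward-referenced label: record its definition address in FRT.
        frt[label]["definition"] = lc
    else:
        # Regular label: enter it into ST.
        sym_table[label] = {"type": "LABEL", "address": lc}

st = {}
frt = {"DOWN1": {"type": "LABEL", "usage": [1005, 1009], "definition": None}}
define_label("UP1", 1002, st, frt)
define_label("DOWN1", 1012, st, frt)
print(st)                          # {'UP1': {'type': 'LABEL', 'address': 1002}}
print(frt["DOWN1"]["definition"])  # 1012
```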
[Flowchart: Handling a Label Definition in One-Pass Assembler. Search Label in ST. Label found in ST? YES: duplicate label error. NO: Search Label in FRT. NEXT STEP.]
The mnemonic is searched in MOT; its equivalent binary opcode is read from MOT and written
into output file. LC is incremented by LOI.
[Fig. 2.18: Handling a Mnemonic in One-Pass Assembler. Search Mnemonic in MOT; write binary opcode in o/p file; LC <- LC + LOI; NEXT STEP.]
If the operand is a literal, it is first searched in LT. If found, its usage address is updated. Otherwise,
first its entry is made into LT and then its usage address is updated.
If the operand is a symbol, it is searched in ST. If found, its definition address is written into output
file and LC is incremented by LOI.
If the operand symbol is not found in ST, then it is a case of a forward-referenced symbol. So, the
entry for that symbol is made into FRT with type=LABEL and usage address=LC. Now, if this
symbol is referenced again in the program, its usage address gets appended in the corresponding
field. When this symbol finally gets defined, its definition address is updated in FRT.
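The operand handling of the one-pass assembler can be sketched in Python. The function name, the table layouts and the '=' prefix for literals are assumptions made for illustration; lc is taken to be the address of the operand field, and ST is simplified to map a symbol directly to its definition address.

```python
def handle_operand(operand, lc, lit_table, sym_table, frt, output):
    """Resolve an operand at once, or remember it as a forward reference."""
    if operand.startswith("="):      # assumption: literals are written as =value
        lit_table.setdefault(operand, []).append(lc)
    elif operand in sym_table:       # already defined: emit its address
        output[lc] = sym_table[operand]
    else:                            # forward reference: record the usage in FRT
        entry = frt.setdefault(
            operand, {"type": "LABEL", "usage": [], "definition": None}
        )
        entry["usage"].append(lc)

lt, frt, out = {}, {}, {}
st = {"UP1": 1002}
handle_operand("DOWN1", 1005, lt, st, frt, out)
handle_operand("DOWN1", 1009, lt, st, frt, out)
handle_operand("UP1", 1011, lt, st, frt, out)
print(frt["DOWN1"]["usage"])  # [1005, 1009]
print(out)                    # {1011: 1002}
```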
[Flowchart: Handling Operands in One-Pass Assembler. Is the operand a Literal? YES: Search Literal in LT; if not found, Insert Literal in LT; append LC to Usage Address in LT. NO: Search Symbol in ST; if found, get definition address and write it into o/p file; if not found, Search Symbol in FRT; if not found there, make Symbol entry into FRT; append LC to Usage Address in FRT. Then LC <- LC + LOI; NEXT STEP.]
2.9.8 Handling Pseudo-opcodes:
1. START: The pseudo-op is searched in POT; its equivalent binary opcode is written into
output file along with its operand.
2. ENDP: It is used to indicate end of code segment and start of data segment. So, the
assembler simply resets the code_flag to 0.
3. DB / DW / CONST: They are used to define variables / constants. The label (i.e.
variable name) is searched in ST. If not found, its entry is first made into ST and then its
definition address is updated. If found, its definition address can be directly updated.
After the definition address is updated, the operand of the instruction (i.e. value of the symbol) is
written at this address.
Now, to check whether the symbol was also forward-referenced, the label field (i.e. symbol name)
is searched in FRT. If found, its definition address is updated there. If not found, it means
the symbol was not forward-referenced.
Finally, LC is incremented by 1 for DB or CONST (since each of these two occupies one
byte) or by 2 for DW (since one word occupies two bytes).
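The processing of DB / DW / CONST in the one-pass assembler can be sketched in Python. The function name and the table layouts are assumptions made for illustration, and memory is modelled as a dictionary from address to value.

```python
SIZES = {"DB": 1, "CONST": 1, "DW": 2}  # bytes occupied by each definition

def define_data(label, pseudo_op, value, lc, sym_table, frt, memory):
    """Define a variable or constant at the current LC and return the new LC."""
    entry = sym_table.setdefault(label, {})
    entry["type"] = "CONST" if pseudo_op == "CONST" else "VAR"
    entry["address"] = lc
    memory[lc] = value                  # write the operand at the definition address
    if label in frt:                    # the symbol was forward-referenced
        frt[label]["definition"] = lc
    return lc + SIZES[pseudo_op]

st, memory = {}, {}
frt = {"X": {"usage": [1001], "definition": None}}
lc = define_data("X", "DB", 2, 1015, st, frt, memory)
lc = define_data("W", "DW", 300, lc, st, frt, memory)
print(lc)                      # 1018
print(frt["X"]["definition"])  # 1015
```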
[Flowchart: Handling Pseudo-Opcodes in One-Pass Assembler. Branch on the type of Pseudo-opcode (START, ENDP, DB / DW, CONST) and Search POT for the Pseudo-op. For DB / DW, set Type <- VAR; for CONST, set Type <- CONST; set Definition Address <- LC in ST. For DB / CONST, set LC <- LC + 1; for DW, set LC <- LC + 2. NEXT STEP.]
2.10 Sample Program Assembly using One-Pass Assembler
Pseudo-Opcode    Number of Operands
DB               1
DW               1
DD               1
CONST            1
START            1
LTORG            1
ENDP             0
END              0
(NOTE: If an exam question asks you to assemble an entire program, keep MOT / POT limited to
only mnemonics and pseudo-ops used in your program, as it is only for a hypothetical machine)
Literal Table (LT)
Stored Variables: (In Data Segment)
Address Value
1022 07
1023 08
1024 NULL
1025,1026 NULL
Address Value
1028 00
Exam Questions:
Q) Explain SPARC Assembler. (May 04 [Comps] 5M, Dec 04 [Comps]
10M, May 06 [Comps] 6M, May 07 [Comps] 5M, Dec 07 [IT] 10M, June
08 [Comps] 10M, Dec 08 [Comps] 5M).
The memory given to a program is divided into different segments called sections. Some
examples of such sections are
.TEXT for executable instructions.
.DATA for data needed by program, it is a read-write section.
.BSS (Block Starting Symbol) for uninitialized or zero-initialized data sections (when
programs become large and complex, often considerable number of data variables may
not be initialized at all (or initialized to zero) at the start of program; they may get
values during execution. .BSS section is needed to store such data).
The Assembler maintains a different location counter (LC) for each named section. When control
is switched from one section to another, the associated LC is also switched.
Thus, sections are like blocks of same program. But inter-section references (within the same
program) are to be resolved by the Linker and not the Assembler.
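The idea of one location counter per section can be sketched in Python. The class and method names are assumptions made for illustration, and each counter is assumed to start at offset 0 within its section.

```python
class SectionCounters:
    """Keeps a separate location counter (LC) for every named section."""

    def __init__(self):
        self.counters = {}
        self.current = None

    def switch(self, section):
        # A section seen for the first time starts at 0; otherwise its LC resumes.
        self.counters.setdefault(section, 0)
        self.current = section

    def advance(self, size):
        """Return the current LC of the active section, then move it by size."""
        address = self.counters[self.current]
        self.counters[self.current] += size
        return address

sc = SectionCounters()
sc.switch(".TEXT")
sc.advance(4)
sc.advance(4)
sc.switch(".DATA")
sc.advance(8)
sc.switch(".TEXT")
print(sc.advance(4))  # 8
print(sc.counters)    # {'.TEXT': 12, '.DATA': 8}
```

Switching back to .TEXT resumes at offset 8, which is where that section's LC stood before the switch to .DATA.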
2. Object File
Local Symbols: Symbols defined and used in same program are called Local Symbols.
A section can freely refer to local symbols defined in other sections of same program.
Global Symbols: Symbols that are used in a program, but defined externally or the
ones that are defined locally in a program but can be accessed by other programs are
called Global Symbols.
Weak Symbols: If in a section, there are two symbols of same name, one is a local
symbol and the other is a global symbol, then the local symbol may be over-ridden by
the global symbol having the same name.
Such local symbols that may be over-ridden by a global symbol are called weak
symbols
4. Delay Slot
The SPARC Architecture uses delayed branching logic i.e. instruction appearing after a branch
instruction is executed before the branch is actually taken. Such an instruction is said to be in a
delay slot of the branch.
SPARC Assembly Programmers often fill the delay slot either with a NOP (No Operation) instruction
or with a useful instruction, the latter to optimize the performance.
Assembler is a translator which converts assembly language of a machine into machine-
language code of that machine. It also includes information required by the loader in the
output.
Literals are plain values, not associated with any symbol name. They are used sparingly in
the program. Symbols are names assigned to values that are used frequently in the program.
Their value usually goes through many manipulations during the lifetime of the program.
Symbol names can also be used to assign a label for a memory location.
Features of assembly language include: use of human-readable mnemonics for binary
opcodes, use of symbols for operands and specification of a separate area to store variables.
An Assembler uses following data structures:
o ALP Source Code
o Mnemonic Opcode Table (MOT, to store corresponding binary opcodes of language-specific
mnemonics)
o Pseudo-Opcode Table (POT, stores basic information about language-specific
assembler-directives and mnemonics used for declarative statement)
o Symbol Table (ST, to store information about symbols used in the program)
o Literal Table (LT, to store information about literals used in the program).
When a symbol is referenced before it is defined in the program, it is called forward
reference. Due to this, the assembler cannot assemble the entire instruction at once. This is
called as Forward Reference Problem (FRP).
To solve this FRP, two approaches are used: Multipass Approach and Single-pass
Approach.
A Multipass assembler makes two passes over the program (more passes may be needed).
In Pass 1, it only collects all symbols in ST. In Pass 2, all instructions are assembled using
this ST. Since one pass is used only to collect symbols, it may be inefficient in case
of programs having very few or no forward-referenced symbols at all.
So, we use a Single-pass approach that makes only one pass through the program and uses
an extra data structure, Forward Reference Table (FRT) to resolve forward-reference
symbols.
In a Single-pass assembler, a forward-referenced symbol (along with its usage locations) is
first inserted in FRT. When these symbols get defined in ST, their definition addresses are
updated in FRT. At the end of the pass, FRT is used to insert definition address value into
usage locations for all forward-referenced symbols.
Features of SPARC Assembler include:
o Division of program memory into different types of sections and maintaining
different location counters for each different section.
o Use of global, local and weak symbols.
o Object file containing a list of relocation and linking operations for the Linker
o A Delayed Branching facility i.e. that instruction immediately following a branch
instruction will be actually executed before the branch is taken. Programmers can
place important instructions in this delay slot to optimize the program
performance.
Q.3) Usage Address field in Literal Table (LT) and Forward Reference Table (FRT) can have
multiple values for same literal or symbol. How can this be practically implemented?
(Ans: A Linked List! The usage address column for an entry, will point to start [i.e. head] of
the linked list. Each usage address node of the linked list will point to the next node.
Whenever a new usage address is to be appended, it will be linked to the last node of the list).
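The linked list of usage addresses can be sketched in Python. The class names are assumptions made for illustration; the tail pointer is an addition that lets a new node be linked to the last node without walking the list.

```python
class UsageNode:
    """One usage address, pointing to the next usage of the same entry."""

    def __init__(self, address):
        self.address = address
        self.next = None

class UsageList:
    """Linked list held in the usage address column of an LT or FRT entry."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, address):
        node = UsageNode(address)
        if self.head is None:
            self.head = node
        else:
            self.tail.next = node  # link the new node to the last node
        self.tail = node

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.address
            node = node.next

usages = UsageList()
usages.append(1003)
usages.append(1009)
print(list(usages))  # [1003, 1009]
```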
Q.4) What is a Literal Pool? (Note: Sometimes, literals may not be defined in memory
immediately after the program; a separate memory area may be reserved for them. So, literal
pool may not necessarily be immediately after the program).
Q.6) How are literals different from regular symbols? (Also, mention about difference in structure
of ST and LT).
Q.7) Why is assembly language preferred over machine language? (Refer section 2.3)
Q.9) What could be the possible errors during the assembly of the program?