The document discusses features of machine-independent assemblers, including literals, symbol-defining statements like EQU and ORG, expressions, and program blocks. Literals allow constant values to be written directly in instructions, avoiding separate definitions. EQU and ORG define symbols and reset the location counter. Expressions must meet criteria to be absolute or relative. The assembler handles symbols and literals in tables and generates object code in blocks.

Uploaded by

gjaj

Module 3

3.1 MACHINE INDEPENDENT ASSEMBLER FEATURES
⚫ Some common assembler features that are not closely related to the
machine architecture are as follows.
3.1.1 Literals
⚫ The programmer writes the value of a constant operand as a part of
the instruction that uses it. This avoids having to define the constant
elsewhere in the program and make a label for it.
⚫ Such an operand is called a Literal because the value is literally in
the instruction.
⚫ It is convenient to write the value of a constant operand as a part of
instruction.
⚫ A literal is identified with the prefix =, followed by a specification
of the literal value.
Example:
45 001A ENDFIL LDA =C'EOF' 032010
Here a 3-byte literal is used as the operand; its value is the character
string EOF.
215 1062 WLOOP TD =X'05' E32011
Here a 1-byte literal is used as the operand, with the hexadecimal value 05.
Literals vs. Immediate Operands
Literals
⚫ The assembler generates the specified value as a constant at some other
memory location. The address of this generated constant is used as the target
address for the machine instruction.

Immediate Operands

⚫ The operand value is assembled as part of the machine instruction


⚫ Literals can be used in SIC, but immediate operands are valid only in
SIC/XE.
Literal Pools
⚫ Normally literals are placed into a pool at the end of the program
⚫ The literal pool listing shows the assigned addresses and generated data
values corresponding to each literal.
⚫ In some cases, it is desirable to place literals into a pool at some
other location in the object program.
⚫ The assembler directive LTORG is used for this purpose.
⚫ When the assembler encounters a LTORG statement, it generates a
literal pool (containing all literal operands used since previous
LTORG)
⚫ Reason: keep the literal operand close to the instruction
⚫ If LTORG is not used, the literal pool will be at the end of the
program and hence the literal operand would be placed too far away
from the instruction referring to it. Thus the use of LTORG avoids
the use of extended format instructions when referring to the
literals.
Duplicate literals
⚫ A duplicate literal is the same literal used more than once in the
program. Only one copy of the specified value needs to be stored.
⚫ There are two ways to recognize duplicate literals:
1. Compare the character strings defining them
◦ Easier to implement, but has a potential problem
◦ e.g. =X'05' (lines 215 and 230 of figure 2)
2. Compare the generated data values
◦ Better, but increases the complexity of the assembler
◦ e.g. =C'EOF' and =X'454F46'
◦ The assembler might avoid storing both literals if it recognized this
equivalence
Problem of duplicate-literal recognition
⚫ If we use the character string defining a literal to recognize duplicates,
we must be careful of literals whose value depends upon their location
in the program.
⚫ Suppose we allow literals that refer to the current value of the location
counter.
⚫ The literal =* used repeatedly in the program has the same name but a
different value in each instruction that uses it as an operand, so every
occurrence must appear in the literal pool.
⚫ The literal =* represents an address in the program, so the
assembler must generate the appropriate Modification records.
Literal table – LITTAB
⚫ Assembler handles literal operands by using the data structure
LITTAB.
⚫ The contents of LITTAB includes
o Literal name
o Operand value and length
o Address assigned to the operand when it is placed in a literal pool.

⚫ LITTAB is often organized as a hash table, using the literal name or
value as the key.
Implementation of Literals
Pass 1
⚫ Build LITTAB with literal name, operand value and length, leaving
the address unassigned
⚫ When LTORG or END statement is encountered, assign an address
to each literal not yet assigned an address.
⚫ Location counter is updated to reflect the number of bytes occupied
by each literal
Pass 2
◦ Search LITTAB for each literal operand encountered
◦ Generate data values as if using BYTE or WORD statements
◦ Generate Modification record for literals that represent an
address in the program
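The Pass 1 handling of LITTAB described above can be sketched in Python. This is only an illustration under stated assumptions: the names LitTab, Literal, literal_bytes and place_pool are hypothetical, duplicates are recognized by comparing generated values (method 2 above), only =C'...' and =X'...' literals are handled, and the location-dependent literal =* is not supported. A literal that was already placed in an earlier pool is simply reused here, which a real assembler might not want.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


def literal_bytes(name: str) -> bytes:
    """Return the data generated by a literal such as =C'EOF' or =X'05'."""
    kind, body = name[1], name[3:-1]
    if kind == "C":
        return body.encode("ascii")
    if kind == "X":
        return bytes.fromhex(body)
    raise ValueError(f"unsupported literal: {name}")


@dataclass
class Literal:
    name: str
    value: bytes
    address: Optional[int] = None  # unassigned until LTORG or END


class LitTab:
    def __init__(self) -> None:
        # Keyed by generated value, so =C'EOF' and =X'454F46' share one entry.
        self.entries: Dict[bytes, Literal] = {}
        self.pending: List[Literal] = []

    def add(self, name: str) -> Literal:
        """Pass 1: record a literal operand, leaving its address unassigned."""
        value = literal_bytes(name)
        if value not in self.entries:
            lit = Literal(name, value)
            self.entries[value] = lit
            self.pending.append(lit)
        return self.entries[value]

    def place_pool(self, locctr: int) -> int:
        """On LTORG or END: assign addresses and return the updated LOCCTR."""
        for lit in self.pending:
            lit.address = locctr
            locctr += len(lit.value)
        self.pending.clear()
        return locctr


tab = LitTab()
eof = tab.add("=C'EOF'")
dev = tab.add("=X'05'")
tab.add("=X'05'")               # duplicate, no new entry
end = tab.place_pool(0x0007)    # pool placed at a hypothetical LOCCTR of 0007
print(f"{eof.address:04X} {dev.address:04X} {end:04X}")
```

In Pass 2 the assembler would look up each literal operand in this table and emit lit.value at lit.address, as if a BYTE statement had been written there.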
SYMTAB & LITTAB
3.1.2 Symbol-Defining Statements
Most assemblers provide an assembler directive that allows the programmer
to define symbols and specify their values.
Assembler directive used is EQU
⚫ Syntax: symbol EQU value
The above statement enters the symbol in SYMTAB. The value may be
a constant or any expression involving constants and previously defined
symbols.
Uses of EQU
⚫ EQU improves program readability, avoids magic numbers, and
makes it easier to find and change constant values.
Replace +LDT #4096 with
MAXLEN EQU 4096
+LDT #MAXLEN
• Another use of EQU is to define mnemonic names for registers.
• The assembler recognizes standard mnemonics for registers, e.g. A,
X, L, etc.
• Suppose that the assembler expected register numbers instead of
names in the instruction RMO (RMO 0,1 instead of RMO A,X). In
such a case, the programmer could include a sequence of statements like
● A EQU 0
● X EQU 1
◦ Then the statement RMO A,X is allowed. The assembler would search
SYMTAB, find the values 0 and 1 for the symbols A and X, and
assemble the instruction.
⚫ Consider a machine that has general purpose registers typically
designated by 0, 1, 2, … (or R0, R1, R2, …).
⚫ In a particular program, some of these may be used as base registers,
index registers, accumulators, etc.
⚫ The usage of registers changes from one program to the next. By writing
statements like
BASE EQU R1
INDEX EQU R2
the programmer can give the registers names that reflect their function
in the program.
Assembler directive ORG
⚫ Used to indirectly assign values to symbols.
Syntax: ORG value
⚫ The value can be constant or an expression involving
constants and previously defined symbols
⚫ When ORG is encountered, the assembler resets its LOCCTR
to the specified value.
⚫ Since the values of symbols used as labels are taken from
LOCCTR,ORG will affect the values of all labels defined
until the next ORG.
Example: using ORG
⚫ If ORG statements are used
⚫ Consider the symbol table with the following structure
Forward-Reference Problem
Forward reference is not allowed for either EQU or ORG.
All terms in the value field must have been defined previously in the
program.
The reason is that all symbols must have been defined during Pass 1 in
a two-pass assembler.

Allowed:
ALPHA RESW 1
BETA EQU ALPHA

Not Allowed:
BETA EQU ALPHA
ALPHA RESW 1

Not Allowed (ORG):
ORG ALPHA
BYTE1 RESB 1
BYTE2 RESB 1
BYTE3 RESB 1
ORG
ALPHA RESB 1

⚫ In the ORG example, the assembler would not know what value to assign
to the location counter in response to the first ORG statement. As a
result, the symbols BYTE1, BYTE2, BYTE3 could not be assigned addresses
during Pass 1.
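The forward-reference rule for EQU and ORG can be illustrated with a small Pass 1 sketch in Python. It is a simplification under stated assumptions: statements are given as hypothetical (label, opcode, operand) tuples, an operand is either a decimal constant or a single symbol, only RESW and RESB advance the location counter, and ORG always takes an operand.

```python
from typing import Dict, List, Tuple

Statement = Tuple[str, str, str]  # (label, opcode, operand)


def operand_value(operand: str, symtab: Dict[str, int]) -> int:
    """Evaluate a decimal constant or a symbol that is already defined."""
    if operand.isdigit():
        return int(operand)
    if operand not in symtab:
        raise ValueError(f"forward reference to undefined symbol {operand}")
    return symtab[operand]


def pass1(statements: List[Statement], start: int = 0) -> Dict[str, int]:
    """Build SYMTAB, rejecting forward references in EQU and ORG."""
    symtab: Dict[str, int] = {}
    locctr = start
    for label, opcode, operand in statements:
        if opcode == "EQU":
            symtab[label] = operand_value(operand, symtab)
            continue
        if opcode == "ORG":
            locctr = operand_value(operand, symtab)  # reset LOCCTR
            continue
        if label:
            symtab[label] = locctr  # label value is taken from LOCCTR
        if opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
    return symtab


print(pass1([("ALPHA", "RESW", "1"), ("BETA", "EQU", "ALPHA")]))
try:
    pass1([("BETA", "EQU", "ALPHA"), ("ALPHA", "RESW", "1")])
except ValueError as err:
    print("error:", err)
```

The first call corresponds to the allowed ordering; the second raises an error because ALPHA has no SYMTAB entry yet when BETA is defined.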
3.1.3 Expressions
⚫ Assemblers allow the use of expressions as operands.
⚫ The assembler evaluates the expression and produces a single
operand address or value.
⚫ Expressions consist of terms combined with the operators +, -, *, /
(division is usually defined to produce an integer result).
⚫ Individual terms in the expressions may be
o Constants
o User-defined symbols
o Special terms, e.g., *, the current value of LOCCTR
Example:
MAXLEN EQU BUFEND-BUFFER
Relocation Problem in Expressions
⚫ Values of terms can be
o Absolute (independent of program location)
▪ Constants
o Relative (to the beginning of the program)
▪ Labels on instructions and data areas
▪ References to the location counter value
⚫ Expressions can be
o Absolute
▪ Contains only absolute terms.
MAXLEN EQU 1000
▪ May also contain relative terms, provided they occur in pairs and the
terms in each pair have opposite signs.
MAXLEN EQU BUFEND-BUFFER
▪ None of the relative terms may enter into a multiplication or division
operation.
o Relative
▪ All the relative terms except one can be paired as described in
“absolute”.
▪ The remaining unpaired relative term must have a positive sign.
STAB EQU OPTAB + (BUFEND - BUFFER)
▪ None of the relative terms may enter into a multiplication or division
operation.
⚫ Expressions that do not meet the conditions of either “absolute” or
“relative” expressions should be flagged as errors.
o BUFEND + BUFFER
o 100 - BUFFER
⚫ A relative term or expression represents some value that may be written
as s+r where s is the starting address of the program and r is the value
of the term or expression relative to the starting address
⚫ When relative terms are paired with opposite signs, the dependency on
the program starting address is canceled out. The result will be an
absolute value
MAXLEN EQU BUFEND-BUFFER
⚫ BUFEND and BUFFER are relative terms representing an address
within the program. But the expression represents an absolute value that
is the difference between the two addresses which is the length of the
buffer area in bytes.
Handling Relative Symbols in SYMTAB
⚫ To determine the type of an expression, we must keep track of the
types of all symbols defined in the program.
⚫ We need a “flag” in the SYMTAB for indication.
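The pairing rule can be checked by counting signed relative terms, as in the Python sketch below. It assumes a hypothetical table mapping each symbol to its SYMTAB flag (True for relative, False for absolute) and, to stay short, handles only flat expressions built from + and -. Each relative term contributes s+r, so a net count of 0 cancels the starting address s (absolute), a net count of +1 leaves exactly one s (relative), and anything else is an error.

```python
import re
from typing import Dict


def classify(expr: str, relative: Dict[str, bool]) -> str:
    """Return 'absolute' or 'relative'; raise ValueError for illegal expressions."""
    if re.search(r"[*/()]", expr):
        raise ValueError("this sketch only supports flat + and - expressions")
    net = 0  # number of unpaired relative terms, with sign
    for sign, term in re.findall(r"([+-]?)(\w+)", expr.replace(" ", "")):
        if term.isdigit() or not relative[term]:
            continue  # absolute terms do not depend on the starting address
        net += -1 if sign == "-" else 1
    if net == 0:
        return "absolute"
    if net == 1:
        return "relative"
    raise ValueError(f"neither absolute nor relative: {expr}")


flags = {"BUFEND": True, "BUFFER": True, "OPTAB": True}
print(classify("BUFEND-BUFFER", flags))
print(classify("OPTAB+BUFEND-BUFFER", flags))
for bad in ("BUFEND+BUFFER", "100-BUFFER"):
    try:
        classify(bad, flags)
    except ValueError as err:
        print("error:", err)
```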
3.1.4 Program blocks
❑ In the previous examples the source program as a whole was
handled by the assembler as a single entity, resulting in a single
block of object program.
❑ Within this object program the generated machine instructions and
data appeared in the same order as they were written in the source
program.
❑ Some assemblers provide features that allow more flexible handling
of the source and object programs.
❑ Some features allow the generated machine instructions and
data to appear in the object program in a different order from
the corresponding source statements.
❑ Other features result in the creation of several independent
parts of the object program.
Program blocks vs. Control sections
■ Program blocks
▪ Segments of code that are rearranged within a single object program
unit.
■ Control sections
▪ Segments of code that are translated into independent
object program units.
⚫ The figure shows the example program (COPY) written using program
blocks.
⚫ Three blocks are used here.
⚫ The first(unnamed) block contains the executable instructions of the
program
⚫ The second (CDATA) contains all data that are a few words or less
in length.
⚫ The third (CBLKS) contains all data areas that consist of larger
blocks of memory.
⚫ The assembler directive USE indicates which portions of the source
program belong to various blocks.
Assembler directive: USE
■ USE [block name]
■ At the beginning, statements are assumed to be part of the
unnamed (default) block
■ If no USE statements are included, the entire program belongs
to this single block
■ The USE statement may also indicate a continuation of a
previously begun block.
❑ Each program block may actually contain several separate
segments of the source program
❑ The assembler will (logically) rearrange these statements to
gather together the pieces of each block.
❑ These blocks will then be assigned addresses in the object
program, with the blocks appearing in the same order in
which they were first begun in the source program.
Example: pp. 81, Figure 2.12
Loc   Block  Source statement                Object code
0000  0      COPY    START   0               (default block)
0000  0      FIRST   STL     RETADR          172063
0003  0      CLOOP   JSUB    RDREC           4B2021
0006  0              LDA     LENGTH          032060
0009  0              COMP    #0              290000
000C  0              JEQ     ENDFIL          332006
000F  0              JSUB    WRREC           4B203B
0012  0              J       CLOOP           3F2FEE
0015  0      ENDFIL  LDA     =C'EOF'         032055
0018  0              STA     BUFFER          0F2056
001B  0              LDA     #3              010003
001E  0              STA     LENGTH          0F2048
0021  0              JSUB    WRREC           4B2029
0024  0              J       @RETADR         3E203F
0000  1              USE     CDATA           (CDATA block)
0000  1      RETADR  RESW    1
0003  1      LENGTH  RESW    1
0000  2              USE     CBLKS           (CBLKS block)
0000  2      BUFFER  RESB    4096
1000  2      BUFEND  EQU     *
1000         MAXLEN  EQU     BUFEND-BUFFER
0027  0              USE                     (default block)
0027  0      RDREC   CLEAR   X               B410
0029  0              CLEAR   A               B400
002B  0              CLEAR   S               B440
002D  0              +LDT    #MAXLEN         75101000
0031  0      RLOOP   TD      INPUT           E32038
0034  0              JEQ     RLOOP           332FFA
0037  0              RD      INPUT           DB2032
003A  0              COMPR   A,S             A004
003C  0              JEQ     EXIT            332008
003F  0              STCH    BUFFER,X        57A02F
0042  0              TIXR    T               B850
0044  0              JLT     RLOOP           3B2FEA
0047  0      EXIT    STX     LENGTH          13201F
004A  0              RSUB                    4F0000
0006  1              USE     CDATA           (CDATA block)
0006  1      INPUT   BYTE    X'F1'           F1
004D  0              USE                     (default block)
004D  0      WRREC   CLEAR   X               B410
004F  0              LDT     LENGTH          772017
0052  0      WLOOP   TD      =X'05'          E3201B
0055  0              JEQ     WLOOP           332FFA
0058  0              LDCH    BUFFER,X        53A016
005B  0              WD      =X'05'          DF2012
005E  0              TIXR    T               B850
0060  0              JLT     WLOOP           3B2FEF
0063  0              RSUB                    4F0000
0007  1              USE     CDATA           (CDATA block)
                     LTORG
0007  1      *       =C'EOF'                 454F46
000A  1      *       =X'05'                  05
                     END     FIRST
■ Pass 1
▪ Maintain a separate location counter for each program block.
▪ Save and restore LOCCTR when switching between blocks.
▪ At the beginning of a block, LOCCTR is set to 0.
▪ Assign each label an address relative to the start of the block that
contains it.
▪ Store the block name or number in SYMTAB along with the assigned
relative address of the label.
▪ At the end of Pass 1, the latest value of LOCCTR for each block
indicates the length of that block.
▪ Assign to each block a starting address in the object program by
concatenating the program blocks in a particular order.
■ Pass 2
▪ Calculate the address of each symbol relative to the start of the
object program by adding
◦ the location of the symbol relative to the start of its block, and
◦ the starting address of this block.
Example of Address Calculation (P.81)
20 0006 0 LDA LENGTH 032060
■ The value of the operand (LENGTH)
■ Address 0003 relative to Block 1 (CDATA)
■ Address 0003+0066=0069 relative to program
■ When this instruction is executed
■ PC = 0009
■ disp = 0069 – 0009 = 0060
■ op nixbpe disp
000000 110010 060 => 032060

SYMTAB
label name block num addr. Flag
LENGTH 1 0003
…. …. …. ….
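The address calculation above can be reproduced with a short Python sketch. The helper names block_starts and format3_pc_relative are hypothetical, and the block lengths (0066, 000B and 1000 hex) are read off the Figure 2.12 listing as an assumption. Only PC-relative format 3 instructions with simple addressing (n=1, i=1, p=1) are modelled.

```python
from typing import Dict, List, Tuple


def block_starts(lengths: List[Tuple[str, int]]) -> Dict[str, int]:
    """Concatenate blocks in order and return each block's starting address."""
    starts: Dict[str, int] = {}
    address = 0
    for name, length in lengths:
        starts[name] = address
        address += length
    return starts


def format3_pc_relative(opcode: int, target: int, next_pc: int) -> str:
    """Assemble a format 3 instruction with n=1, i=1, x=0, b=0, p=1, e=0."""
    disp = target - next_pc
    if not -2048 <= disp <= 2047:
        raise ValueError("displacement does not fit in 12 bits")
    # The low two bits of the opcode byte hold n and i; the next nibble is xbpe.
    word = ((opcode | 0b11) << 16) | (0b0010 << 12) | (disp & 0xFFF)
    return f"{word:06X}"


# Block lengths assumed from the listing: default, CDATA, CBLKS.
starts = block_starts([("", 0x66), ("CDATA", 0x0B), ("CBLKS", 0x1000)])
length_addr = starts["CDATA"] + 0x0003      # SYMTAB: LENGTH is 0003 in block 1
print(f"{length_addr:04X}")
print(format3_pc_relative(0x00, length_addr, 0x0009))  # LDA LENGTH at 0006
```

With these assumptions the sketch yields the address 0069 and the object code 032060 worked out in the example.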

■ It is not necessary to physically rearrange the generated code in
the object program.
■ The assembler simply inserts the proper load address in each
Text record.
■ The loader will load this code into the correct place.
Advantages
⚫ The separation of the program into blocks has considerably reduced the
addressing problems.
⚫ Large buffer area is moved to the end of the object program hence no
need to use extended format instructions.
⚫ Base register is no longer needed.
⚫ Problem of placement of literals solved. An LTORG statement is
included in the CDATA block to be sure that the literals are placed
ahead of any large data areas.

⚫ Program readability is better if data areas are placed in the source
program close to the statements that reference them.
3.1.5 Control Sections and Program Linking
Control sections
❑ Part of the program that maintains its identity after assembly
❑ can be loaded and relocated independently of the other control
sections
❑ Different control sections are used for subroutines or other logical
subdivisions of a program
❑ the programmer can assemble, load, and manipulate each of these
control sections separately resulting in flexibility which is a major
benefit of using control sections
❑ Control sections form logically related parts of a program hence
there should be some means for linking control sections together
■ assembler directive: CSECT
⚫ secname CSECT
■ separate location counter for each control section
⚫ Instructions in one control section may need to refer to instructions
or data located in another control section.
⚫ Since control sections are independently loaded and relocated the
assembler is unable to process these references in the usual way.
⚫ The assembler has no idea where any other control section will be
located at the execution time. Such references between control
sections are called external references.
⚫ The assembler generates information for each external reference
that will allow the loader to perform the required linking.
■ Control sections differ from program blocks in that they are handled
separately by the assembler.
■ Symbols that are defined in one control section may not be used
directly by another control section; they must be identified as external
references.
■ External definition
⚫ EXTDEF name [, name]
■ EXTDEF names symbols that are defined in this control section
and may be used by other sections
■ Ex: EXTDEF BUFFER, BUFEND, LENGTH
■ External reference
⚫ EXTREF name [,name]
■ EXTREF names symbols that are used in this control section and
are defined elsewhere
■ Ex: EXTREF RDREC, WRREC
■ To reference an external symbol, extended format instruction is
needed
External Reference Handling
■ Case 1 (P.87)
⚫ 15 0003 CLOOP +JSUB RDREC 4B100000
⚫ The operand RDREC is named in the EXTREF statement for the
control section.
⚫ The assembler has no idea where the control section containing
RDREC will be loaded, so it cannot assemble the address for this
instruction.
⚫ The assembler inserts an address of zero and passes information to the
loader which will cause the proper address to be inserted at load time.
⚫ The address of RDREC has no relationship with anything in this
control section; therefore relative addressing is not possible.
⚫ Thus an extended format is used to provide room for the actual address
to be inserted.
■ Case 2
190 0028 MAXLEN WORD BUFEND-BUFFER 000000
■ There are two external references in the expression, BUFEND and
BUFFER.
■ The assembler
■ inserts a value of zero
■ passes information to the loader
■ Add to this data area the address of BUFEND
■ Subtract from this data area the address of BUFFER

■ Case 3
■ On line 107, BUFEND and BUFFER are defined in the same control
section and the expression can be calculated immediately.
107 1000 MAXLEN EQU BUFEND-BUFFER

⚫ The assembler must remember(via entries in SYMTAB) in which
control section a symbol is defined.
⚫ Any attempt to refer to a symbol in another control section must be
flagged as an error unless the symbol is identified(using
EXTREF)as an external reference.
⚫ The assembler must allow the same symbol to be used in different
control sections.
Eg: MAXLEN
⚫ The assembler must include information in the object program that
will cause the loader to insert the proper values where they are required.
⚫ Two new record types, Define and Refer, are used for this in the object
program.
■ Extended restriction
■ If relative terms are used, both the terms in each pair of an
expression must be within the same control section
■ Legal: BUFEND-BUFFER
■ Illegal: RDREC-COPY
■ How to enforce this restriction
■ When an expression involves external references, the assembler
cannot determine whether or not the expression is legal.
■ The assembler evaluates all of the terms it can, combines these to
form an initial expression value, and generates Modification
records.
■ The loader checks the expression for errors and finishes the
evaluation.
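How the assembler might combine the terms it can evaluate and leave the rest to the loader is sketched below in Python. The function name, the term representation and the symbol addresses are hypothetical. The Modification record layout used here (M, 6-digit field address, 2-digit length in half-bytes, sign, symbol name) is an assumption about the SIC/XE object program format, which this section does not spell out.

```python
from typing import Dict, List, Set, Tuple

Term = Tuple[str, str]  # (sign, symbol or decimal constant)


def assemble_expression(
    terms: List[Term],
    symtab: Dict[str, int],
    extref: Set[str],
    address: int,
    half_bytes: int,
) -> Tuple[int, List[str]]:
    """Return the initial value and the Modification records for the loader."""
    value = 0
    records: List[str] = []
    for sign, term in terms:
        if term in extref:
            # Unknown at assembly time: ask the loader to add or subtract it.
            records.append(f"M{address:06X}{half_bytes:02X}{sign}{term}")
            continue
        amount = int(term) if term.isdigit() else symtab[term]
        value += amount if sign == "+" else -amount
    return value, records


# Case 2: both symbols are external, so the assembled word is 000000.
print(assemble_expression([("+", "BUFEND"), ("-", "BUFFER")], {},
                          {"BUFEND", "BUFFER"}, 0x28, 6))
# Case 3: both symbols are local (illustrative addresses), no records needed.
print(assemble_expression([("+", "BUFEND"), ("-", "BUFFER")],
                          {"BUFFER": 0x33, "BUFEND": 0x1033}, set(), 0, 6))
```

The assembler cannot tell from these records whether the pairing restriction is met; as noted above, that check is left to the loader.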

Assembler design options
■ One-pass assemblers
■ Multi-pass assemblers
One-Pass Assemblers
⚫ They are used when it is necessary or desirable to avoid a second pass
over the source program.
Problem
⚫ Trying to assemble a program in one pass involves forward references.
Remedy
⚫ Define all the data areas before they are referenced, i.e. place all
such data areas at the start of the source program.
⚫ Forward references to labels on instructions cannot be eliminated
as easily. Hence the assembler must make some provision for handling
forward references.
2 main types of one pass assembler

1. Produces object code directly in memory for immediate execution


2. Produces object program for later execution.
First case
■ Called a load-and-go assembler: it generates its object code
in memory for immediate execution.
■ No object program is written out, no loader is needed.
■ It is useful in a system oriented toward program development and
testing. Programs there are re-assembled nearly every time they are
run, so the efficiency of the assembly process is an important
consideration.
■ Since the object program is produced in memory rather than
being written out on secondary storage, the handling of forward
references becomes less difficult.
■ The assembler simply generates the object code instructions as
it scans the source program.
■ If the instruction operand is a symbol that has not yet been
defined, omit the operand address when the instruction is
assembled.
■ Enters this undefined symbol into SYMTAB and indicates that it is
undefined using flags.
■ The address of the operand field of the instruction that refers to the
undefined symbol is added to a list of forward references associated
with the symbol table entry.
■ When the definition for the symbol is encountered, the assembler scans
the reference list and inserts the address.
■ At the end of the program, the assembler reports an error if any
SYMTAB entries are still marked as undefined.
■ An absolute program is used as an example.
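The forward-reference list kept by a load-and-go assembler can be sketched in Python as follows. The class and method names are hypothetical, memory is modelled as a bytearray, and the operand field is assumed to be the last two bytes of a 3-byte SIC instruction with no indexed addressing.

```python
from typing import Dict, List, Optional


class OnePassSymtab:
    def __init__(self, memory: bytearray) -> None:
        self.memory = memory
        self.value: Dict[str, Optional[int]] = {}  # None marks "undefined"
        self.fixups: Dict[str, List[int]] = {}     # operand-field addresses

    def reference(self, symbol: str, operand_addr: int) -> None:
        """Fill in an operand field, or queue it as a forward reference."""
        address = self.value.get(symbol)
        if address is None:
            self.value[symbol] = None
            self.fixups.setdefault(symbol, []).append(operand_addr)
            address = 0  # operand address omitted for now
        self.memory[operand_addr:operand_addr + 2] = address.to_bytes(2, "big")

    def define(self, symbol: str, address: int) -> None:
        """Record a definition and patch every queued reference to it."""
        self.value[symbol] = address
        for operand_addr in self.fixups.pop(symbol, []):
            self.memory[operand_addr:operand_addr + 2] = address.to_bytes(2, "big")

    def undefined(self) -> List[str]:
        """Symbols still undefined at the end of the program (an error)."""
        return sorted(s for s, v in self.value.items() if v is None)


memory = bytearray(8)
table = OnePassSymtab(memory)
memory[0] = 0x48                 # JSUB opcode at a hypothetical address 0
table.reference("RDREC", 1)      # RDREC not yet defined
print(memory[:3].hex(), table.undefined())
table.define("RDREC", 0x203D)    # hypothetical address of RDREC
print(memory[:3].hex(), table.undefined())
```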
Figure 2.18, pp. 94: Sample program for a one-pass assembler
Figure 2.19(a), pp. 95: Object code in memory and SYMTAB after scanning
line 40 of the program in Fig. 2.18
Figure 2.19(b), pp. 96: Object code in memory and SYMTAB after scanning
line 160 of the program in Fig. 2.18
Second case:
⚫ One pass assemblers that produce object programs follow a slightly
different procedure from the previous procedure.
⚫ If the operand contains an undefined symbol, use 0 as the address and
write the Text record to the object program.
⚫ Forward references are entered into lists as in the load-and-go
assembler.
⚫ When the definition of a symbol is encountered, the assembler
generates another Text record with the correct operand address of each
entry in the reference list.
⚫ When the program is loaded, the incorrect address 0 will be overwritten
by the later Text record generated at the symbol definition.
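The extra Text records can be sketched in Python as below. The record layout (T, 6-digit starting address, 2-digit length in bytes, object code) is an assumption about the SIC object program format, and the function names and addresses are hypothetical.

```python
from typing import List


def text_record(address: int, data: bytes) -> str:
    """Format one Text record: T, start address, length in bytes, object code."""
    return f"T{address:06X}{len(data):02X}{data.hex().upper()}"


def fixup_records(references: List[int], symbol_address: int) -> List[str]:
    """One small Text record per forward reference, carrying the real address."""
    patch = symbol_address.to_bytes(2, "big")
    return [text_record(addr, patch) for addr in references]


# Instruction written out with operand address 0 while the symbol is undefined.
print(text_record(0x2012, bytes.fromhex("480000")))
# Later, when the symbol is defined at 203D, each reference gets a patch record.
print(fixup_records([0x2013], 0x203D))
```

Because the loader processes Text records in order, the patch record overwrites the two zero bytes written earlier.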
Figure 2.18, pp. 97: Object code generated by one-pass assembler
Multi-Pass Assemblers
⚫ For a two pass assembler, forward references in symbol definition
are not allowed:
ALPHA EQU BETA
BETA EQU DELTA
DELTA RESW 1
⚫ The symbol BETA cannot be assigned a value when it is
encountered during Pass1 because DELTA has not yet been
defined.
⚫ Hence ALPHA cannot be evaluated during Pass 2.
⚫ Any assembler that makes only two sequential passes over the
source program cannot resolve such a sequence of definitions.
⚫ Prohibiting forward references in symbol definition is not a serious
inconvenience.
⚫ Forward references tend to create difficulty for a person reading the
program.
⚫ The general solution for forward references is a multi-pass
assembler that can make as many passes as are needed to process
the definitions of symbols.
⚫ It is not necessary for such an assembler to make more than 2 passes
over the entire program.
⚫ The portions of the program that involve forward references in
symbol definition are saved during Pass 1.
⚫ Additional passes through these stored definitions are made as the
assembly progresses.
⚫ This process is followed by a normal Pass 2.
Implementation
⚫ For a forward reference in symbol definition, we store in the
SYMTAB:
o The symbol name
o The defining expression
o The number of undefined symbols in the defining expression
⚫ Each undefined symbol (marked with a flag *) has an associated list
of the symbols whose values depend on it.
⚫ When a symbol is defined, we can recursively evaluate the symbol
expressions depending on the newly defined symbol.
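The bookkeeping above can be sketched in Python. The class name and its fields are hypothetical, defining expressions are limited to flat + and - combinations of symbols and decimal constants, and the addresses in the usage lines are illustrative only.

```python
import re
from typing import Dict, List


class MultiPassSymtab:
    def __init__(self) -> None:
        self.value: Dict[str, int] = {}             # fully defined symbols
        self.expr: Dict[str, str] = {}              # pending defining expressions
        self.missing: Dict[str, int] = {}           # undefined symbols per expression
        self.dependents: Dict[str, List[str]] = {}  # undefined symbol -> waiting symbols

    def define_expr(self, symbol: str, expr: str) -> None:
        """Handle 'symbol EQU expr', deferring it if it has forward references."""
        names = re.findall(r"[A-Za-z]\w*", expr)
        unknown = {n for n in names if n not in self.value}
        if not unknown:
            self.define_value(symbol, self._evaluate(expr))
            return
        self.expr[symbol] = expr
        self.missing[symbol] = len(unknown)
        for name in unknown:
            self.dependents.setdefault(name, []).append(symbol)

    def define_value(self, symbol: str, value: int) -> None:
        """Define a symbol, then recursively resolve everything waiting on it."""
        self.value[symbol] = value
        for dep in self.dependents.pop(symbol, []):
            self.missing[dep] -= 1
            if self.missing[dep] == 0:
                del self.missing[dep]
                self.define_value(dep, self._evaluate(self.expr.pop(dep)))

    def _evaluate(self, expr: str) -> int:
        total = 0
        for sign, term in re.findall(r"([+-]?)(\w+)", expr.replace(" ", "")):
            amount = int(term) if term.isdigit() else self.value[term]
            total += -amount if sign == "-" else amount
        return total


tab = MultiPassSymtab()
tab.define_expr("ALPHA", "BETA")
tab.define_expr("BETA", "DELTA")
print(tab.missing)               # both definitions are still pending
tab.define_value("DELTA", 0x30)  # DELTA RESW 1 at an illustrative address
print(tab.value)
```

Defining DELTA resolves BETA, which in turn resolves ALPHA, without another pass over the whole source program.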
Multi-pass assembler example
Figure 2.21, pp. 99-101
Each SYMTAB entry in the figure shows the undefined-symbol flag, the
number of undefined symbols in the defining expression, the defining
expression, and the depending list.
Source statements recoverable from the figure:
2 MAXLEN EQU BUFEND-BUFFER
3 PREVBT EQU BUFFER-1
4 BUFFER RESB 4096
5 BUFEND EQU *
