IBM Personal Computer Assembly Language Tutorial
Joshua Auerbach
Yale University
Yale Computer Center
175 Whitney Avenue
P. O. Box 2112
New Haven, Connecticut 06520
Installation Code YU
Integrated Personal Computers Project
Communications Group
Communications and Data Base Division
Session C316
This talk is for people who are just getting started with the PC MACRO
Assembler. Maybe you are just contemplating doing some coding in assembler,
maybe you have tried it with mixed success. If you are here to get aimed in
the right direction, to get off to a good start with the assembler, then you
have come for the right reason. I can't promise you'll get what you want, but
I'll do my best.
On the other hand, if you have already turned out some working assembler
code, then this talk is likely to be on the elementary side for you. If you
want to review a few basics and have nowhere else pressing to go, then by all
means stay.
The reasons for LEARNING assembler are not the same as the reasons for USING
it in a particular application. But, we have to start with some of the reasons
for using it and then I think the reasons for learning it will become clear.
First, let's dispose of a bad reason for using it. Don't use it just because
you think it is going to execute faster. A particular sequence of ordinary
bread-and-butter computations written in PASCAL, C, FORTRAN, or compiled BASIC
can do the job just about as fast as the same algorithm coded in assembler. Of
course, interpretive BASIC is slower, but if you have a BASIC application which
runs too slow you probably want to try compiling it before you think too much
about translating parts of it to another language.
On the other hand, high level languages do tend to isolate you from the
machine. That is both their strength and their weakness. Usually, when
implemented on a micro, a high level language provides an escape mechanism to
the underlying operating system or to the bare machine. So, for example, BASIC
has its PEEK and POKE. But, the route to the bare machine is often a
circuitous one, leading to tricky programming which is hard to follow.
Sometimes, the system or the language does too little for you. For example,
with the asynch adapter, the system provides no interrupt handler, no buffer,
and no flow control. The application is stuck with the responsibility for
monitoring that port and not missing any characters, then deciding what to do
with all errors. BASIC does a reasonable job on some of this, but that is only
BASIC. Most other languages do less. Sometimes, the system may do too much
for you. System support for the keyboard is an example. At the hardware
level, all 83 keys on the keyboard send unique codes when they are pressed,
held down, and released. But, someone has decided that certain keys, like Num
Lock and Scroll Lock, are going to do certain things before the application even
sees them and therefore can't be used as ordinary keys.
Sometimes, the system does about the right amount of stuff but does it less
efficiently than it should. System support for the screen is in this class.
If you use only the official interface to the screen you sometimes slow your
application down unacceptably. I said before, don't use assembler just to
speed things up, but there I was talking about mainline code, which generally
can't be sped up much by assembler coding. A critical system interface is a
different matter: sometimes we may have to use assembler to bypass a
hopelessly inefficient implementation. We don't want to do this if we can
avoid it, but sometimes we can't.
Assembly language code can overcome these deficiencies. In some cases, you
can also overcome these deficiencies by judicious use of the escape valves
which your high level language provides. In BASIC, you can PEEK and POKE and
INP and OUT your way around a great many issues. In many other languages you
can issue system calls and interrupts and usually manage, one way or another, to
modify system memory. Writing handlers to take real-time hardware interrupts
from the keyboard or asynch port, though, is still going to be a problem in
most languages. Some languages claim to let you do it but I have yet to see an
acceptably clean implementation done that way.
The real reason why assembler is better than "tricky POKEs" for writing
machine-dependent code, though, is the same reason why PASCAL is better than
assembler for writing a payroll package: it is easier to maintain. Let the
high level language do what it does best, but recognize that there are some
things which are best done in assembler code. The assembler, unlike the tricky
POKE, can make judicious use of equates, macros, labels, and appropriately
placed comments to show what is really going on in this machine-dependent realm
where it thrives.
So, there are times when it becomes appropriate to write in assembler; given
that, if you are a responsible programmer or manager, you will want to be
"assembler-literate" so you can decide when assembler code should be written.
1. Learn the 8086 architecture and most of the instruction set. Learn what
you need to know and ignore what you don't. Reading: The 8086 Primer by
Stephen Morse, published by Hayden. You need to read only two chapters, the
one on machine organization and the one on the instruction set.
2. Learn about a few simple DOS function calls. Know what services the
operating system provides. If appropriate, learn a little about other systems
too. It will aid portability later on. Reading: appendices D and E of the PC
DOS manual.
3. Learn enough about the MACRO assembler and the LINKer to write some
simple things that really work. Here, too, the main thing is figuring out what
you don't need to know. Whatever you do, don't study the sample programs
distributed with the assembler unless you have nothing better!
4. At the same time as you are learning the assembler itself, you will need
to learn a few tools and concepts to properly combine your assembler code with
the other things you do. If you plan to call assembler subroutines from a high
level language, you will need to study the interface notes provided in your
language manual. Usually, this forms an appendix of some sort. If you plan to
package your assembler routines as .COM programs you will need to learn to do
this. You should also learn to use DEBUG.
5. Read the Technical Reference, but very selectively. The most important
things to know are the header comments in the BIOS listing. Next, you will
want to learn about the RS 232 port and maybe about the video adapters.
Notice that the key thing in all five phases is being selective. It is easy
to conclude that there is too much to learn unless you can throw away what you
don't need. Most of the rest of this talk is going to deal with this very
important question of what you need and don't need to learn in each phase. In
some cases, I will have to leave you to do almost all of the learning; in
others, I will teach a few salient points, enough, I hope, to get you started.
I hope you understand that all I can do in an hour is get you started on the
way.
The Morse book might seem like a lot of book to buy for just two really
important chapters; other books devote a lot more space to the instruction set
and give you a big beautiful reference page on each instruction. And, some of
the other things in the Morse book, although interesting, really aren't very
vital and are covered too sketchily to be of any real help. The reason I like
the Morse book is that you can just read it; it has a very conversational
style, it is very lucid, it tells you what you really need to know, and a
little bit more which is by way of background; because nothing really gets
belabored too much, you can gracefully forget the things you don't use. And, I
very much recommend READING Morse rather than studying it. Get the big picture
at this point.
Now, you want to concentrate on those things which are worth fixing in
memory. After you read Morse, you should relate what you have learned to this
outline.
1. You want to fix in your mind the idea of the four segment registers CODE,
DATA, STACK, and EXTRA. This part is pretty easy to grasp. The 8086 and the
8088 use 20 bit addresses for memory, meaning that they can address up to 1
megabyte of memory. But, the registers and the address fields in all the
instructions are no more than 16 bits long. So, how to address all of that
memory? Their solution is to put together two 16 bit quantities.
In other words, any time memory is accessed, your program will supply a
sixteen bit address. Another sixteen bit address is acquired from a segment
register, left shifted four bits (one nibble) and added to it to form the real
address. You can control the values in the segment registers and thus access
any part of memory you want. But the segment registers are specialized: one
for code, one for most data accesses, one for the stack (which we'll mention
again) and one "extra" one for additional data accesses.
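A small worked illustration, sketched here rather than taken from the talk: the offset word of the interrupt 21H vector sits at real address 21H x 4 = 84H, and two different segment/offset pairs reach it. The fragment assumes it runs inside an ordinary code section; the choice of BX, CX and DX is arbitrary.

```asm
; 0000H:0084H  ->  0000H*16 + 0084H                   = 00084H
; 0008H:0004H  ->  0008H*16 + 0004H = 0080H + 0004H   = 00084H
        XOR     AX,AX           ;AX = 0 (segment registers can't take immediates)
        MOV     ES,AX           ;ES = 0000H
        MOV     BX,84H
        MOV     CX,ES:[BX]      ;word at real address 00084H
        MOV     AX,8
        MOV     ES,AX           ;ES = 0008H
        MOV     BX,4
        MOV     DX,ES:[BX]      ;the same word again, so DX = CX
```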
Most people, when they first learn about this addressing scheme become
obsessed with converting everything to real 20 bit addresses. After a while,
though, you get used to thinking in segment/offset form. You tend to get your
segment registers set up at the beginning of the program, change them as little
as possible, and think just in terms of symbolic locations in your program, as
with any assembly language.
EXAMPLE:
MOV AX,DATASEG
MOV DS,AX ;Set value of Data segment
ASSUME DS:DATASEG ;Tell assembler DS is usable
.......
MOV AX,PLACE ;Access storage symbolically by 16 bit address
In the above example, the assembler knows that no special issues are involved
because the machine generally uses the DS register to complete a normal data
reference.
If you had used ES instead of DS in the above example, the assembler would
have known what to do, also. In front of the MOV instruction which accessed
the location PLACE, it would have placed the ES segment prefix. This would
tell the machine that ES should be used, instead of DS, to complete the
address.
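For comparison, a sketch of the ES version just described, reusing DATASEG and PLACE from the example above and assuming that no ASSUME statement ties DS to DATASEG:

```asm
        MOV     AX,DATASEG
        MOV     ES,AX           ;Set value of Extra segment
        ASSUME  ES:DATASEG      ;Tell assembler ES is usable (DS is not assumed)
        .......
        MOV     AX,PLACE        ;Assembler puts the ES: prefix byte (26H) in front
```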
2. You will want to learn what other registers are available and learn their
personalities:
AX and DX are general purpose registers. They become special only when
accessing machine and system interfaces.
AX-DX can be divided in half, forming AH, AL, BH, BL, CH, CL, DH, DL.
SI and DI are strictly 16 bit. They can be used to form indexed addresses
(like BX) and they are also used to point to strings.
Most sixteen bit operations are legal (even if unusual) when performed in SI,
DI, SP, or BP.
a. 8086 and 8088 instructions can be broken up into subfields and bits with
names like R/M, MOD, S and W. These parts of the instruction modify the basic
operation in such ways as: whether it is 8 bit or 16 bit; if 16 bit, whether all
16 bits of the data are given; whether the instruction is register to register,
register to memory, or memory to register; for operands which are registers,
which register; and for operands which are memory, what base and index registers
should be used in finding the data.
There is no point in memorizing any of this detail; just distill the bottom
line, which is, what kinds of operand combinations EXIST in the instruction set
and what kinds don't. If you ask the assembler to ADD two things and the two
things are things for which there is a legal ADD instruction somewhere in the
instruction set, the assembler will find the right instruction and fill in all
the modifier fields for you.
I guess if you memorized all the opcode construction rules you might have a
crack at being able to disassemble hex dumps by eye, like you may have learned
to do somewhat with 370 assembler. I submit to you that this feat, if ever
mastered by anyone, would be in the same class as playing the "Minute Waltz" in
a minute; a curiosity only. Here is the basic matrix you should remember:
1) ADD and ADC -- addition, with or without including a carry from a previous
addition
2) SUB and SBB -- subtraction, with or without including a borrow from a
previous subtraction
3) CMP -- compare. It is useful to think of this as a subtraction with the
answer being thrown away and neither operand actually changed
4) AND, OR, XOR -- typical boolean operations
5) TEST -- like an AND, except the answer is thrown away and neither operand is
changed
6) MOV -- move data from source to target
7) LDS, LES, LEA -- some specialized forms of MOV with side effects
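A few of these at work, as a sketch; the register pairs used for the 32 bit arithmetic (DX:AX and CX:BX) are an arbitrary convention, not something the instruction set requires.

```asm
; 32 bit addition:     DX:AX = DX:AX + CX:BX
        ADD     AX,BX           ;low words; any carry out lands in CF
        ADC     DX,CX           ;high words plus that carry
; 32 bit subtraction:  DX:AX = DX:AX - CX:BX
        SUB     AX,BX           ;low words; any borrow lands in CF
        SBB     DX,CX           ;high words minus that borrow
; CMP and TEST only set the flags
        CMP     AL,'A'          ;flags as if AL - 'A' were computed; AL unchanged
        TEST    AL,80H          ;flags as if AL AND 80H were computed; AL unchanged
```

For example, adding 00000001H to 0001FFFFH: ADD leaves AX = 0000H with the carry flag set, and ADC then makes DX = 0001H + 0000H + 1 = 0002H, giving 00020000H.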
b. Ordinary one operand instructions. These can take any of the operand
forms described above. Usually, they perform the operation and leave the result
in the stated place.
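As a sketch, here are a few common one operand instructions (INC, DEC, NEG, NOT) applied to register and memory operands; COUNT is a hypothetical word of storage.

```asm
COUNT   DW      0               ;hypothetical word of storage
        ...
        INC     COUNT           ;memory operand: COUNT = COUNT + 1
        DEC     CX              ;register operand: CX = CX - 1
        NEG     AX              ;AX = 0 - AX (two's complement)
        NOT     BL              ;BL = one's complement of BL
```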
c. Now you touch on some instructions which do not follow the general
operand rules but which require the use of certain registers. The important
ones are:
They include:
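A sketch of some instructions with built-in register requirements; the values, and the choice of port 3FDH (the COM1 line status register on the PC), are only for illustration, and AGAIN is a hypothetical label.

```asm
        MOV     AX,1000
        MOV     BX,50
        MUL     BX              ;DX:AX = AX * BX = 50000 (AX is implied)
        MOV     CX,7
        DIV     CX              ;AX = DX:AX / CX = 7142, DX = remainder 6
        MOV     CL,4
        SHL     AX,CL           ;shift counts other than 1 must be in CL
        MOV     DX,3FDH         ;port numbers above 255 must be in DX
        IN      AL,DX           ;input always arrives in AL (or AX)
        MOV     CX,10
AGAIN:  LOOP    AGAIN           ;hypothetical label; LOOP decrements CX, jumps until CX = 0
```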
f. Flag instructions: CLI, STI, CLD, STD, CLC, STC. These set or clear the
interrupt (enable), direction (for string operations), and carry flags.
The addressing summary and the instruction summary given above mask a lot of
annoying little exceptions. For example, you can't POP CS, and although the R
<-- M form of LES is legal, the M <-- R form isn't etc. etc. My advice is:
a. Go for the general rules
5. A few instructions are rich enough and useful enough to warrant careful
study. Here are a few final study guidelines:
a. It is well worth the time learning to use the string instruction set
effectively. Among the most useful are:
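A minimal sketch of the string set in use, copying one area to another with REP MOVSB. It assumes DS and ES both address the section holding SOURCE and TARGET, as they do in a COM program; both names are hypothetical.

```asm
SOURCE  DB      'COPY ME'       ;hypothetical data
SRCLEN  EQU     $-SOURCE        ;length worked out by the assembler (7)
TARGET  DB      SRCLEN DUP (?)
        ...
        CLD                     ;DF = 0: SI and DI step upward
        MOV     SI,OFFSET SOURCE ;DS:SI -> source
        MOV     DI,OFFSET TARGET ;ES:DI -> target
        MOV     CX,SRCLEN       ;CX = byte count
        REP     MOVSB           ;move a byte, bump SI and DI, until CX = 0
```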
b. Similarly, if you have never written for a stack machine before, you will
need to exercise PUSH and POP and get very comfortable with them because they
are going to be good friends. If you are used to the 370, with lots of general
purpose registers, you may find yourself feeling cramped at first, with many
fewer registers and many instructions having register restrictions. But, you
have a hidden ally: you need a register and you don't want to throw away
what's in it? Just PUSH it, and when you are done, POP it back. This can lead
to abuse. Never have more than two "expedient" PUSHes in effect and never
leave something PUSHed across a major header comment or for more than 15
instructions or so. An exception is the saving and restoring of registers at
entrance to and exit from a subroutine; here, if the subroutine is long, you
should probably PUSH everything which the caller may need saved, whether you
will use the register or not, and POP it in reverse order at the end. Be aware
that CALL and INT push return address information on the stack and RET and IRET
pop it off. It is a good idea to become familiar with the structure of the
stack.
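A sketch of the save-and-restore pattern at subroutine entry and exit; WORKER is a hypothetical subroutine and the registers it saves are just an example.

```asm
        CALL    WORKER          ;pushes the return offset, then jumps
        ...
WORKER:                         ;hypothetical NEAR subroutine (label ends in colon)
        PUSH    BX              ;save registers the caller may need
        PUSH    CX
        PUSH    SI
        ;... body is free to use BX, CX and SI here ...
        POP     SI              ;restore in reverse order
        POP     CX
        POP     BX
        RET                     ;pops the return offset back into IP
```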
c. In practice, to invoke system services you will use the INT instruction.
It is quite possible to use this instruction effectively in a cookbook fashion
without knowing precisely how it works.
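A cookbook example of INT in that spirit, sketched with the BIOS keyboard service rather than DOS: function 0 of interrupt 16H waits for a keystroke.

```asm
        MOV     AH,0            ;BIOS keyboard function 0: wait for a key
        INT     16H
        ;on return: AL = ASCII code (0 for special keys), AH = scan code
```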
1) all three have the capability of being either NEAR (CS register unchanged)
or FAR (CS register changed)
3) if NEAR and DIRECT, a JMP can be SHORT (less than 128 bytes away) or LONG
In general, the third issue is not worth worrying about. On a forward jump
which is clearly VERY short, you can tell the assembler it is short and save
one byte of code:
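A sketch of such a case; NOTZERO and DONE are hypothetical labels.

```asm
        CMP     AL,0
        JNE     NOTZERO
        MOV     AL,'Y'
        JMP     SHORT DONE      ;2 bytes (EB disp8) rather than 3 (E9 disp16)
NOTZERO:                        ;hypothetical labels
        MOV     AL,'N'
DONE:
```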
e. The conditional jump set is rather confusing when studied apart from the
assembler, but you do need to get a feeling for it. The interactions of the
sign, carry, and overflow flags can get your mind stuttering pretty fast if you
worry about it too much. What it boils down to, though, is
You should understand that all conditional jumps are inherently DIRECT, NEAR,
and "short"; the "short" part means that they can't go more than 128 bytes in
either direction. Again, this is something you could easily imagine to be more
of a problem than it is. I follow this simple approach:
The latter, of course, is a jump around a jump. Some would say it is evil,
but I submit it is hard to avoid in this language.
b) also consider changing some conditional jumps to their opposite and use
the "jump around a jump" approach as shown above.
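A sketch of the jump around a jump: suppose you want JE FARAWAY, but FARAWAY, a hypothetical label, is more than 127 bytes ahead.

```asm
        CMP     AX,BX
        JNE     SKIP            ;opposite condition; SKIP is close by
        JMP     FARAWAY         ;unconditional NEAR JMP reaches anywhere in the segment
SKIP:                           ;FARAWAY is a hypothetical distant label
```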
6. Finally, in order to use the assembler effectively, you need to know the
default rules for which segment registers are used to complete addresses in
which situations.
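The main defaults, as a sketch with arbitrarily chosen registers:

```asm
        MOV     AX,[BX]         ;DS:BX  ordinary data references use DS
        MOV     AX,[BP]         ;SS:BP  addresses formed with BP use SS
        MOV     AX,ES:[BX]      ;ES:BX  a segment prefix overrides the default
        PUSH    AX              ;SS:SP  the stack always uses SS
        MOVSB                   ;reads DS:SI, writes ES:DI (the ES side can't be overridden)
;instructions themselves are always fetched from CS:IP
```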
I think the best way to learn about DOS internals is to read the technical
appendices in the manual. These are not as complete as we might wish, but they
really aren't bad; I certainly have learned a lot from them. What you don't
learn from them you might eventually learn via judicious disassembly of parts
of DOS, but that shouldn't really be necessary.
From reading the technical appendices, you learn that interrupts 20H through
27H are used to communicate with DOS. Mostly, you will use interrupt 21H, the
DOS function manager.
The function manager implements a great many services. You request the
individual services by means of a function code in the AH register. For
example, by putting a nine in the AH register and issuing interrupt 21H you
tell DOS to print a message on the console screen.
Usually, but by no means always, the DX register is used to pass data for the
service being requested. For example, on the print message service just
mentioned, you would put the 16 bit address of the message in the DX register.
The DS register is also implicitly part of this argument, in keeping with the
universal segmentation rules.
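As a sketch of the calling convention, here is function 2 (display one character), whose data goes in DL, the low half of DX; the character 'A' is an arbitrary choice.

```asm
        MOV     DL,'A'          ;character to display
        MOV     AH,2            ;function code 2: display character
        INT     21H             ;call the DOS function manager
```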
Most of the functions originally offered in DOS 1.0 were direct descendants
of CP/M functions; there is even a compatibility interface so that programs
which have been translated instruction for instruction from 8080 assembler to
8086 assembler might have a reasonable chance of running if they use only the
core CP/M function set. Among the most generally useful in this original
compatibility set are:
The next set provides no function beyond what you can get with BIOS calls or
more specialized DOS calls. However, they are preferable to BIOS calls when
portability is an issue.
00 -- terminate execution
01 -- read keyboard character
02 -- write screen character
03 -- read COM port character
04 -- write COM port character
05 -- print a character
06 -- read keyboard or write screen with no editing
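Function 06 deserves a sketch because it does double duty: with 0FFH in DL it reads the keyboard without waiting, and otherwise it writes the character in DL. The loop below, using the hypothetical label POLL, simply waits for a key.

```asm
POLL:   MOV     AH,6            ;function 06: direct console I/O
        MOV     DL,0FFH         ;0FFH means "read", not "write character 0FFH"
        INT     21H
        JZ      POLL            ;zero flag set: no key ready yet (POLL is hypothetical)
        ;AL = the character, with no echo and no editing
```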
The standard file I/O calls are inferior to the specialized DOS calls but
have the advantage of making the program easier to port to CP/M style systems.
Thus they are worth mentioning:
In addition to the CP/M compatible services, DOS also offers some specialized
services which have been available in all releases of DOS. These include:
All of the calls mentioned above which have anything to do with files make
use of a data area called the "FILE CONTROL BLOCK" (FCB). The FCB is anywhere
from 33 to 37 bytes long depending on how it is used. You are responsible for
creating an FCB and filling in the first 12 bytes, which contain a drive code,
a file name, and an extension.
When you open the FCB, the system fills in the next 20 bytes, which includes
a logical record length. The initial lrecl is always 128 bytes, to achieve
CP/M compatibility. The system also provides other useful information such as
the file size.
After you have opened the FCB, you can change the logical record length. If
you do this, your program is no longer CP/M compatible, but that doesn't make
it a bad thing to do. DOS documentation suggests you use a logical record
length of one for maximum flexibility. This is usually a good recommendation.
In general, you do not need to (and should not) modify other parts of the
FCB.
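A minimal sketch of building and opening an FCB, assuming a hypothetical file HELLO.TXT on the default drive. Function 0FH is the DOS open call, and the logical record length word sits at offset 14 of the FCB; MYFCB and NOFILE are hypothetical names.

```asm
MYFCB   DB      0               ;offset 0: drive (0 = default drive)
        DB      'HELLO   '      ;offsets 1-8: name, blank padded (hypothetical file)
        DB      'TXT'           ;offsets 9-11: extension
        DB      25 DUP (0)      ;offsets 12-36: filled in or used by DOS
        ...
        MOV     DX,OFFSET MYFCB ;DS:DX -> FCB
        MOV     AH,0FH          ;function 0FH: open file
        INT     21H
        CMP     AL,0            ;AL = 0 if opened, 0FFH if not found
        JNE     NOFILE          ;NOFILE is a hypothetical label
        MOV     WORD PTR MYFCB+14,1 ;logical record length (offset 14) = 1
NOFILE:
```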
The FCB is pretty well described in appendix E of the DOS manual. Beginning
with DOS 2.0, there is a whole new system of calls for managing files which
don't require that you build an FCB at all. These calls are quite incompatible
with CP/M and also mean that your program cannot run under older releases of
DOS. However, these calls are very nice and easy to use. They have these
characteristics:
2. The open and create calls return a 16 bit value which is simply placed in
the BX register on subsequent calls to refer to the file.
4. Any number of bytes can be transferred on a single call; no data area must
be manipulated to do this.
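A sketch of the handle style (DOS 2.0 and later), reading up to 512 bytes of a hypothetical file HELLO.TXT. Function 3DH opens, 3FH reads, and 3EH closes; FNAME, BUFFER and FAILED are hypothetical names.

```asm
FNAME   DB      'HELLO.TXT',0   ;hypothetical ASCIIZ file name, no FCB needed
BUFFER  DB      512 DUP (?)
        ...
        MOV     DX,OFFSET FNAME ;DS:DX -> file name
        MOV     AX,3D00H        ;AH = 3DH (open), AL = 0 (read only)
        INT     21H
        JC      FAILED          ;carry set: AX holds an error code
        MOV     BX,AX           ;the 16 bit handle goes in BX from here on
        MOV     AH,3FH          ;read
        MOV     CX,512          ;byte count
        MOV     DX,OFFSET BUFFER ;DS:DX -> buffer
        INT     21H             ;AX = bytes actually read (carry set on error)
        MOV     AH,3EH          ;close; BX still holds the handle
        INT     21H
FAILED:                         ;hypothetical label
```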
The "new" DOS calls also include comprehensive functions to manipulate the
new chained directory structure and to allocate and free memory.
It is my feeling that many people can teach themselves to use the assembler
by reading the MACRO Assembler manual if:
1. You have read and understood a book like Morse and thus have a feeling
for the instruction set
2. You know something about DOS services and so can communicate with the
keyboard and screen and do something marginally useful with files. In the
absence of this kind of knowledge, you can't write meaningful practice programs
and so will not progress.
3. You have access to some good examples (the ones supplied with the
assembler are not good, in my opinion). I will try to supply you with some more
relevant ones.
4. You ignore the things which are most confusing and least useful.
Some of the most confusing aspects of the assembler include the facilities for
combining segments. But, you can avoid using all but the simplest of these
facilities in many cases, even while writing quite substantial applications.
At this point, it is necessary to talk about COM programs and EXE programs.
As you probably know, DOS supports two kinds of executable files. EXE programs
are much more general, can contain many segments, and are generally built by
compilers and sometimes by the assembler. If you follow the lead given by the
samples distributed with the assembler, you will end up with EXE programs. A
COM program, in contrast, always contains just one segment, and receives
control with all four segment registers containing the same value. A COM
program, thus, executes in a simplified environment, a 64K address space. You
can go outside this address space simply by temporarily changing one segment
register, but you don't have to, and that is the thing which makes COM programs
nice and simple. Let's look at a very simple one.
The classic text on writing programs for the C language says that the first
thing you should write is a program which says
HELLO, WORLD.
when invoked. What's sauce for C is sauce for assembler, so let's start with
a HELLO program of our own. My first presentation of this will be bare bones,
not stylistically complete, but just an illustration of what an assembler
program absolutely has to have:
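A minimal sketch consistent with the discussion that follows. The section name HELLO and the labels BEGIN and MSG come from that discussion; START is an assumed name, so the talk's own listing may differ in detail.

```asm
HELLO   SEGMENT
        ASSUME  CS:HELLO,DS:HELLO
        ORG     100H            ;COM programs start at offset 100H
START:  JMP     BEGIN           ;skip over the data
MSG     DB      'HELLO, WORLD.$' ;function 9 stops at the '$'
BEGIN:  MOV     DX,OFFSET MSG   ;DS:DX -> message
        MOV     AH,9            ;DOS function 9: display string
        INT     21H
        RET                     ;back to DOS via the INT 20H at offset 0
HELLO   ENDS
        END     START           ;START is an assumed label
```

The RET works because DOS pushes a zero word on the stack before starting a COM program, and offset 0 of the program prefix holds an INT 20H instruction.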
First, let's attend to some obvious points. The macro assembler uses the
general form
     label   opcode   operands   ;comment
Unlike the 370 assembler, though, comments are NOT set off from operands by
blanks. The syntax uses blanks as delimiters within the operand field (see
line 6 of the example) and so all comments must be set off by semi-colons.
Line comments are frequently set off with a semi-colon in column 1. I use this
approach for block comments too, although there is a COMMENT statement which
can be used to introduce a block comment.
Being an old 370 type, I like to see assembler code in upper case, although
my comments are mixed case. Actually, the assembler is quite happy with mixed
case anywhere.
As with any assembler, the core of the opcode set consists of opcodes which
generate machine instructions but there are also opcodes which generate data
and ones which function as instructions to the assembler itself, sometimes
called pseudo-ops. In the example, there are five lines which generate machine
code (JMP, MOV, MOV, INT, RET), one line which generates data (DB) and five
pseudo-ops (SEGMENT, ASSUME, ORG, ENDS, and END).
Now, about labels. You will see that some labels in the example end in a
colon and some don't. This is just a bit confusing at first, but no real
mystery. If a label is attached to a piece of code (as opposed to data), then
the assembler needs to know what to do when you JMP to or CALL that label. By
convention, if the label ends in a colon, the assembler will use the NEAR form
of JMP or CALL. If the label does not end in a colon, it will use the FAR
form. In practice, you will always use the colon on any label you are jumping
to inside your program because such jumps are always NEAR; there is no reason
to use a FAR jump within a single code section. I mention this, though,
because leaving off the colon usually isn't trapped as a syntax error; instead,
it will generally cause something more abstruse to go wrong.
Machine instructions will generally take zero, one or two operands. Where
there are two operands, the one which receives the result goes on the left as
in 370 assembler.
I tried to explain this before, now maybe it will be even clearer: there are
many more 8086 machine opcodes than there are assembler opcodes to represent
them. For example, there are five kinds of JMP, four kinds of CALL, two kinds
of RET, and at least five kinds of MOV depending on how you count them. The
macro assembler makes a lot of decisions for you based on the form taken by the
operands or on attributes assigned to symbols elsewhere in your program. In
the example above, the assembler will generate the NEAR DIRECT form of JMP
because the target label BEGIN labels a piece of code instead of a piece of
data (this makes the JMP DIRECT) and ends in a colon (this makes the JMP NEAR).
The assembler will generate the immediate forms of MOV because the form OFFSET
MSG refers to immediate data and because 9 is a constant. The assembler will
generate the NEAR form of RET because that is the default and you have not told
it otherwise.
The DB (define byte) pseudo-op is an easy one: it is used to put one or more
bytes of data into storage. There is also a DW (define word) pseudo-op and a
DD (define doubleword) pseudo-op; in the PC MACRO assembler, the fact that a
label refers to a byte of storage, a word of storage, or a doubleword of
storage can be very significant in ways which we will see presently.
About that OFFSET operator, I guess this is the best way to make the point
about how the assembler decides what instruction to assemble: an analogy with
370 assembler:
PLACE DC ......
...
LA R1,PLACE
L R1,PLACE
In 370 assembler, the first instruction puts the address of label PLACE in
register 1, the second instruction puts the contents of storage at label PLACE
in register 1. Notice that two different opcodes are used. In the PC
assembler, the analogous instructions would be:
PLACE DW ......
...
MOV DX,OFFSET PLACE
MOV DX,PLACE
If PLACE is the label of a word of storage, then the second instruction will
be understood as a desire to fetch that data into DX. If X is a label, then
"OFFSET X" means "the ordinary number which represents X's offset from the
start of the segment." And, if the assembler sees an ordinary number, as
opposed to a label, it uses the instruction which is equivalent to LA.
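To make the parallel concrete, a sketch of how the two MOVs differ in the bytes they generate (xx xx stands for the offset of PLACE; the value 1234H is arbitrary):

```asm
PLACE   DW      1234H
        ...
        MOV     DX,OFFSET PLACE ;like LA: BA xx xx, DX = offset of PLACE
        MOV     DX,PLACE        ;like L:  8B 16 xx xx, DX = 1234H
```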
As for numbers, 21H is hexadecimal 21, and 00010000B is the eight bit binary
number shown.
The next elements we should point to are the SEGMENT...ENDS pair and the END
instruction. Every assembler program has to have these elements. SEGMENT
tells the assembler you are starting a section of contiguous material (code
and/or data). The symmetrically named ENDS statement tells the assembler you
are finished with a section of contiguous material. I wish they didn't use the
word SEGMENT in this context. To me, a "segment" is a hardware construct: it
is the 64K of real storage which becomes addressable by virtue of having a
particular value in a segment register. Now, it is true that the "segments"
you make with the assembler often correspond to real hardware "segments" at
execution time. But, if you look at things like the GROUP and CLASS options
supported by the linker, you will discover that this correspondence is by no
means exact. So, at risk of maybe confusing you even more, I am going to use
the more informal term "section" to refer to the area set off by means of the
SEGMENT and ENDS instructions. The sections delimited by SEGMENT...ENDS pairs
are really a lot like CSECTs and DSECTs in the 370 world.
name SEGMENT
name SEGMENT PUBLIC
name SEGMENT AT nnn
Basically, you can get away with just the three forms given above. The first
form is what you use when you are writing a single section of assembler code
which will not be combined with other pieces of code at link time. The second
form says that this assembly only contains part of the section; other parts
might be assembled separately and combined later by the linker.
I have found that one can construct reasonably large modular applications in
assembler by simply making every assembly use the same segment name and
declaring the name to be PUBLIC each time. If you read the assembler and
linker documentation, you will also be bombarded by information about more
complex options such as the GROUP statement and the use of other "combine
types" and "classes." I don't recommend getting into any of that. I will talk
more about the linker and modular construction of programs a little later. The
assembler manual also implies that a STACK segment is required. This is not
really true. There are numerous ways to assure that you have a valid stack at
execution time.
Of course, if you plan to write applications in assembler which are more than
64K in size, you will need more than what I have told you; but who is really
going to do that? Any application that large is likely to be coded in a higher
level language.
The third form of the SEGMENT statement makes the delineated section into
something like a "DSECT"; that is, it doesn't generate any code, it just
describes what is present somewhere already in the computer's memory.
Sometimes the AT value you give is meaningful. For example, the BIOS work area
is located at segment 40 hex (real address 400 hex), so a program which was
interested in mucking around in the BIOS work area might describe it with a
SEGMENT AT 40H section.
At other times, the AT value you give may be arbitrary, as when you are mapping
a repeated control block.
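For the BIOS work area case, a sketch of such a mapping; the section name BIOSAREA is hypothetical, and the equipment flag word EQUIP lives at offset 10 hex in that area.

```asm
BIOSAREA SEGMENT AT 40H         ;hypothetical name; maps 0040:0000, generates no code
        ORG     10H
EQUIP   DW      ?               ;BIOS equipment flag word at 0040:0010
BIOSAREA ENDS
        ...
        MOV     AX,BIOSAREA     ;segment value 40H
        MOV     ES,AX
        ASSUME  ES:BIOSAREA
        MOV     AX,EQUIP        ;assembled with an ES: prefix
```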
Enough about SEGMENT. The END statement is simple. It goes at the end of
every assembly. When you are assembling a subroutine, you just say
END
but when you are assembling the main routine of a program you say
END label
I guess I have explained everything in the program except that ORG pseudo-op.
ORG means the same thing as it does in many assembly languages. It tells the
assembler to move its location counter to some particular address. In this
case, we have asked the assembler to start assembling code hex 100 bytes from
the start of the section called HELLO instead of at the very beginning. This
simply reflects the way COM programs are loaded. When a COM program is loaded
by the system, the system sets up all four segment registers to address the
same 64K of storage. The first 100 hex bytes of that storage contain what is
called the program segment prefix; this area is described in appendix E of the DOS
manual. Your COM program physically begins after this. Execution begins with
the first physical byte of your program; that is why the JMP instruction is
there.
Wait a minute, you say, why the JMP instruction at all? Why not put the data
at the end? Well, in a simple program like this I probably could have gotten
away with that. However, I have the habit of putting data first and would
encourage you to do the same because of the way the assembler has of assembling
different instructions depending on the nature of the operand.
Unfortunately, sometimes the different choices of instruction which can
assemble from a single
opcode have different lengths. If the assembler has already seen the data when
it gets to the instructions it has a good chance of reserving the right number
of bytes on the first pass. If the data is at the end, the assembler may not
have enough information on the first pass to reserve the right number of bytes
for the instruction. Sometimes the assembler will complain about this,
something like "Forward reference is illegal" but at other times, it will make
some default assumption. On the second pass, if the assumption turned out to
be wrong, it will report what is called a "Phase error," a very nasty error to
track down. So get in the habit of putting data and equated symbols ahead of
code.
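The phase-error mechanism can be illustrated with a toy two-pass assembler in C. Everything here is hypothetical: instruction sizes are just numbers supplied by the caller, and the model assumes that an instruction which turns out smaller than the space reserved on pass 1 is padded out, so only an under-sized guess moves later labels. That is a simplification for illustration, not a description of MASM's internals.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy two-pass assembler; all sizes are hypothetical. */

#define MAX_LABELS 16
#define MAX_ITEMS  64

typedef struct {
    int label;          /* label defined at this item, or -1 */
    int operand;        /* label referenced by this item, or -1 */
    int size_if_known;  /* bytes needed when the operand is already defined */
    int size_if_guess;  /* bytes reserved for a forward-referenced operand */
} Item;

/* Returns true when pass 2 would report a phase error: some label ends
   up at a different offset than pass 1 gave it. Assumes n <= MAX_ITEMS
   and label indices < MAX_LABELS. */
bool has_phase_error(const Item *items, size_t n)
{
    bool seen[MAX_LABELS] = { false };
    int  first[MAX_LABELS] = { 0 };
    int  reserved[MAX_ITEMS];
    int  loc = 0;

    /* Pass 1: assign offsets, guessing sizes for forward references. */
    for (size_t i = 0; i < n; i++) {
        if (items[i].label >= 0) {
            first[items[i].label] = loc;
            seen[items[i].label] = true;
        }
        bool forward = items[i].operand >= 0 && !seen[items[i].operand];
        reserved[i] = forward ? items[i].size_if_guess : items[i].size_if_known;
        loc += reserved[i];
    }

    /* Pass 2: every symbol is known. An instruction that fits in its
       reserved space is padded out; one that needs more pushes every
       later label forward. */
    loc = 0;
    for (size_t i = 0; i < n; i++) {
        if (items[i].label >= 0 && first[items[i].label] != loc)
            return true;
        int needed = items[i].size_if_known;
        loc += needed > reserved[i] ? needed : reserved[i];
    }
    return false;
}
```

In this model, moving the data item ahead of the instruction that uses it lets pass 1 reserve the right size, so no label moves; that is the habit recommended above.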
OK. Maybe you understand the program now. Let's walk through the steps
involved in making it into a real COM file.
1. The file should be created with the name HELLO.ASM (actually the name is
arbitrary, but the extension .ASM is conventional and useful).
2. Now type:
ASM HELLO,,;
(this is just one example of invoking the assembler; it uses the small
assembler ASM and produces an object file and a listing file with the same name
as the source file. I am not going exhaustively into how to invoke the
assembler, which the manual covers pretty well. I guess this is the first
time I have mentioned that there are really two assemblers: the small assembler
ASM will run in a 64K machine and doesn't support macros. I used to use it all
the time; now that I have a bigger machine and a lot of macro libraries, I use
the full-function assembler MASM. You get both when you buy the package.)
3. If you issue DIR at this point, you will discover that you have acquired
HELLO.OBJ (the object code resulting from the assembly) and HELLO.LST (a
listing file). I guess I can digress for a second here concerning the listing
file. It contains TAB characters. I have found there are two good ways to get
it printed and one bad way. The bad way is to use LPT1: as the direct target
of the listing file or to try copying the LST file to LPT1 without first
setting the tabs on the printer. The two good ways are to either:
b. direct it to LPT1:, but first send the right escape sequence to LPT1: to set
the tabs every eight columns. I have found that on some early serial numbers
of the IBM PC printer, tabs don't work quite right, which forces you to the
first option.
4. Now type:
LINK HELLO;
(again, there are lots of linker options but this is the simplest. It takes
HELLO.OBJ and makes HELLO.EXE). HELLO.EXE? I thought we were making a COM
program, not an EXE program. Right. HELLO.EXE isn't really executable; it's
just that the linker doesn't know about COM programs. That requires another
utility. You don't have this utility if you are using DOS 1.0; you have it if
you are using DOS 1.1 or DOS 2.0. Oh, by the way, the linker will warn you
that you have no stack segment. Don't worry about it.
5. Now type:
EXE2BIN HELLO HELLO.COM
This is the final step. It produces the actual program you will execute.
Note that you have to spell out HELLO.COM; for a nominally rational but
actually perverse reason, EXE2BIN uses the default extension BIN instead of COM
for its output file. At this point, you might want to erase HELLO.EXE; it
looks a lot more useful than it is. Chances are you won't need to recreate
HELLO.COM unless you change the source and then you are going to have to redo
the whole thing.
6. Now type:
HELLO
and the program replies:
HELLO YOURSELF!!!
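Back in step 3, the listing file's TAB characters assume tab stops every eight columns. To illustrate what that setting means, here is a C sketch that expands tabs to blanks in software. The function name and its buffer convention are hypothetical, and this is not presented as either of the printing methods described in step 3.

```c
#include <stddef.h>

/* Expand TAB characters to blanks, assuming tab stops every eight
   columns. Writes at most out_size-1 characters plus a terminating
   NUL into out and returns the number of characters written. */
size_t expand_tabs(const char *in, char *out, size_t out_size)
{
    size_t col = 0;   /* 0-based column within the current line */
    size_t len = 0;
    if (out_size == 0)
        return 0;
    for (; *in != '\0' && len + 1 < out_size; in++) {
        if (*in == '\t') {
            /* emit at least one blank, then continue to the next stop */
            do {
                out[len++] = ' ';
                col++;
            } while (col % 8 != 0 && len + 1 < out_size);
        } else {
            out[len++] = *in;
            /* a new line (or carriage return) starts again at column 0 */
            col = (*in == '\n' || *in == '\r') ? 0 : col + 1;
        }
    }
    out[len] = '\0';
    return len;
}
```

A tab that starts exactly on a stop still advances a full eight columns, which is how a printer with stops every eight columns behaves.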
I started with a simple COM program because I actually think they are easier
to create than subroutines to be called from high level languages, but maybe
it's really the latter you are interested in. Even so, I think you should get
comfortable with the assembler FIRST with little exercises like the one above
and also another one which I will finish up with.
Next you are ready to look at the interface information for your particular
language. You usually find this in some sort of an appendix. For example, the
BASIC manual has Appendix C on Machine Language Subroutines. The PASCAL manual
buries the information a little more deeply: the interface to a separately
compiled routine can be found in the Chapter on Procedures and Functions, in a
subsection called Internal Calling Conventions.
Each language is slightly different, but here are what I think are some
common issues in subroutine construction.
1. NEAR versus FAR? Most of the time, your language will probably call your
assembler routine as a FAR routine. In this case, you need to make sure the
assembler will generate the right kind of return. You do this with a
PROC...ENDP statement pair. The PROC statement is probably a good idea for a
NEAR routine too, even though it is not strictly required.
With FAR linkage, it doesn't really matter what you call the segment. You
must declare the name by which you will be called in a PUBLIC pseudo-op and
also show that it is a FAR procedure. Only CS will be initialized to your
segment when you are called. Generally, the other segment registers will
continue to point to the caller's segments. With NEAR linkage, you are
executing in the same segment as the caller. Therefore, you must give the
segment a specific name as instructed by the language manual. However, you may
be able to count on all segment registers pointing to your own segment
(sometimes the situation can be more complicated, but I cannot really go into
all of the details). You should be aware that the code you write will not be
the only thing in the segment and will be physically relocated within the
segment by the linker. However, all OFFSET references will be relocated and
will be correct at execution time.
ARGS STRUC
DW 3 DUP(?) ;Saved BP and return address
ARG3 DW ?
ARG2 DW ?
ARG1 DW ?
ARGS ENDS
...........
PUSH BP ;save BP register
MOV BP,SP ;Use BP to address stack
MOV ...,[BP].ARG2 ;retrieve second argument
(etc.)
What you are doing here is using BP to address the stack, accounting for the
word where you saved the caller's BP and also for the two words (CS and IP)
which were pushed by the FAR CALL instruction.
3. How big is the stack? BASIC only gives you an eight-word stack to play
with. On the other hand, it doesn't require you to save any registers except
the segment registers. Other languages give you a liberal stack, which makes
things a lot easier. If you have to create a new stack segment for yourself,
the easiest thing is to place the stack at the end of your program, save the
caller's SS and SP, and point SS:SP at your own stack area.
Later, you can reverse these steps before returning to the caller. At the
end of your program, you reserve the space for the stack itself.
4. Make sure you save and restore those registers required by the caller.
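The offsets built into the ARGS structure shown above can be checked with a little arithmetic. This C sketch assumes 16-bit words, a routine that starts with PUSH BP / MOV BP,SP, and a caller that pushed ARG1 first and ARG3 last, as the structure's layout implies. The function names are hypothetical.

```c
/* BP-relative argument offsets, assuming PUSH BP / MOV BP,SP on entry. */

enum { WORD_BYTES = 2 };

/* Bytes between [BP] and the nearest argument: the saved BP plus the
   return address (one word for a NEAR call, CS and IP for a FAR call). */
int header_bytes(int far_call)
{
    int saved_bp = WORD_BYTES;
    int return_address = far_call ? 2 * WORD_BYTES : WORD_BYTES;
    return saved_bp + return_address;
}

/* Offset from BP of the k-th argument word, counting up from the one
   pushed last (k = 0 is ARG3 in the ARGS structure). */
int arg_offset(int k, int far_call)
{
    return header_bytes(far_call) + k * WORD_BYTES;
}
```

For a FAR call this gives ARG3 at [BP+6], ARG2 at [BP+8] and ARG1 at [BP+10], matching the 3 DUP(?) words that open the structure. For a NEAR call the return address is one word, so the nearest argument would be at [BP+4] instead.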
You can't do everything with DOS calls. You may need to learn something
about the BIOS and about the hardware itself. For this, the Technical Reference
is a very good thing to look at.
The first thing you look at in the Technical Reference, unless you are really
determined to master the whole ball of wax, is the BIOS listings presented in
Appendix A. Glory be: here is the whole 8K of ROM which deals with low-level
hardware support laid out with comments and everything. In fact, if you are
just interested in learning what BIOS can do for you, you just need to read the
header comments at the beginning of each section of the listing.
BIOS services are invoked by means of the INT instruction; the BIOS occupies
interrupts 10H through 1FH and also interrupt 5H; actually, of these seventeen
interrupts, five are used for user exit points or data pointers, leaving twelve
actual services.
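Two facts sit behind that count. First, on the 8086 an INT n instruction finds its handler through a four-byte pointer (offset word, then segment word) stored at address n * 4 in the interrupt vector table at the bottom of memory. Second, 10H through 1FH is sixteen interrupts, which with 5H makes seventeen. A C sketch of both, with hypothetical function names:

```c
#include <stdint.h>

/* Assumption: 8086 real-mode interrupt vector table at 0000:0000,
   four bytes per vector. */
uint32_t vector_address(uint8_t n)
{
    return (uint32_t)n * 4u;
}

/* The interrupts the talk attributes to BIOS: 5H and 10H-1FH. */
int is_bios_interrupt(uint8_t n)
{
    return n == 0x05 || (n >= 0x10 && n <= 0x1F);
}
```

So the vector for INT 10H lives at address 40H, and the one for INT 1FH at 7CH.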
I am not going to summarize the most useful BIOS features here; you will see
some examples in the next sample program we will look at.
The other thing you might want to get into with the Tech reference is the
description of some hardware options, particularly the asynch adapter, which
are not well supported in the BIOS. The writeup on the asynch adapter is
pretty complete.
Actually, the Tech reference itself is pretty complete and very nice as far
as it goes. One thing which is missing from the Tech reference is information
on the programmable peripheral chips on the system board, such as the 8259
interrupt controller.
To make your library absolutely complete, you should order the INTEL data
sheets for these beasts.
I should say, though, that the only one I ever found I needed to know about was
the interrupt controller. If you happen to have the 8086 Family User's Manual,
the big book put out by INTEL, which is one of the things people sometimes buy
to learn about 8086 architecture, there is an appendix there which gives an
adequate description of the 8259.
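The part of the 8259 you meet first is usually its interrupt mask. This C sketch models only the bit arithmetic, assuming the PC's arrangement: a single 8259 whose mask register (accessed through port 21H) has one bit per request line IRQ0 through IRQ7, where a 1 bit disables that line, and IRQ n is delivered as INT 8+n. No port I/O is performed, and the function names are hypothetical.

```c
#include <stdint.h>

/* Clear the mask bit for one request line (0..7) to enable it. */
uint8_t enable_irq(uint8_t mask, int irq)
{
    return (uint8_t)(mask & ~(1u << irq));
}

/* Set the mask bit for one request line (0..7) to disable it. */
uint8_t disable_irq(uint8_t mask, int irq)
{
    return (uint8_t)(mask | (1u << irq));
}

/* Interrupt number the CPU sees for a request line on the PC. */
int irq_to_int(int irq)
{
    return 8 + irq;              /* IRQ0 (timer) arrives as INT 08H */
}
```

For example, the primary asynch adapter uses IRQ4, so enabling it from a fully masked state turns FFH into EFH, and its interrupts arrive as INT 0CH.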
A final example
_______________
I leave you with a more substantial example of code which illustrates some
good elementary techniques; I won't claim its style is perfect, but I think it
is adequate. I think this is a much more useful example than what you will get
with the assembler:
PAGE 61,132
TITLE SETSCRN -- Establish correct monitor use at boot time
;
; This program is a variation on many which toggle the equipment flags
; to support the use of either video option (monochrome or color).
; The thing about this one is it prompts the user in such a way that he
; can select the use of the monitor he is currently looking at (or which
; is currently connected or turned on) without really having to know
; which is which. SETSCRN is a good program to put first in an
; AUTOEXEC.BAT file.
;
; This program is highly dependent on the hardware and BIOS of the IBM PC
; and is hardly portable, except to very exact clones. For this reason,
; BIOS calls are used in lieu of DOS function calls where both provide
; equal function.
;
OK. That's the first page of the program. Notice the PAGE statement, which
you can use to tell the assembler how to format the listing. You give it lines
per page and characters per line. I have mine set up to print on the host
lineprinter; I routinely upload my listings at 9600 baud and print them on the
host; it is faster than using the PC printer. There is also a TITLE statement.
This simply provides a nice title for each page of your listing. Now for the
second page:
You will also see illustrated the EQU instruction, which just gives a
symbolic name to a number. I don't make a fetish of giving a name to every
single number in a program. I do feel strongly, though, that interrupts and
function codes, where the number is arbitrary and the function being performed
is the thing of interest, should always be given symbolic names.
One last new element in this section is the define doubleword (DD)
instruction. A doubleword constant can refer, as in this case, to a location
in another segment. The assembler will be happy to use information at its
disposal to properly assemble it. In this case, the assembler knows that EQUIP
is offset 10H in the segment BIOSDATA, which is at 40H.
The main code section makes use of subroutines to keep the basic flow simple.
About all that's new to you in this section is the use of the BIOS interrupt
KBD to read a character from the keyboard.
The instructions LES and LDS are useful ones for dealing with doubleword
addresses. The offset is loaded into the register named as the first operand
and the segment into ES (for LES) or DS (for LDS). By telling the assembler, with an ASSUME, that
ES now addresses the BIOSDATA segment, it is able to correctly assemble the OR
and AND instructions which refer to the EQUIP byte. An ES segment prefix is
added.
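Here is a C model of what a doubleword address constant looks like in memory and what LES does with it. It assumes little-endian 8086 storage, with the offset word stored first and the segment word second; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Result of LES reg,mem: the named register gets the offset,
   ES gets the segment. */
typedef struct {
    uint16_t reg;
    uint16_t es;
} LesResult;

/* Emulate LES on the four bytes of a DD address constant. */
LesResult les(const uint8_t mem[4])
{
    LesResult r;
    r.reg = (uint16_t)(mem[0] | (mem[1] << 8));  /* low word: offset */
    r.es  = (uint16_t)(mem[2] | (mem[3] << 8));  /* high word: segment */
    return r;
}

/* Real-mode physical address: segment * 16 + offset. */
uint32_t physical(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}
```

A constant pointing at offset 10H in segment 40H is stored as the bytes 10H, 00H, 40H, 00H; after LES the register holds 10H, ES holds 40H, and the byte they address is at physical location 410H.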
To understand the action here, you simply need to know that flags in that
particular byte control how the BIOS screen service initializes the adapters.
BIOS will only work with one adapter at a time; by setting the equipment flags
to show one or the other as installed and calling BIOS screen initialization,
we achieve the desired effect.
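The flag arithmetic itself is small. This C sketch assumes the standard PC BIOS layout, in which bits 5-4 of the equipment byte give the initial video mode (01 = 40x25 color, 10 = 80x25 color, 11 = 80x25 monochrome). Choosing 80x25 for the color case and the names below are assumptions for illustration, not taken from SETSCRN.

```c
#include <stdint.h>

#define VIDEO_BITS  0x30u   /* bits 5-4 of the equipment byte */
#define VIDEO_COLOR 0x20u   /* 10: 80x25 color */
#define VIDEO_MONO  0x30u   /* 11: 80x25 monochrome */

/* Clear the old video choice (an AND) and set the new one (an OR),
   leaving every other equipment bit untouched. */
uint8_t select_monitor(uint8_t equip, int monochrome)
{
    return (uint8_t)((equip & ~VIDEO_BITS)
                     | (monochrome ? VIDEO_MONO : VIDEO_COLOR));
}
```

After the flags are changed, the BIOS screen initialization still has to be called, as described above, before the new adapter is actually in use.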