CS221 Assembly Language Fundamentals: Irvine Chapter 3: Constant Expressions
While we can write programs directly in machine code, it is not a convenient method for
larger programs. We cannot easily insert new lines of code, refer to locations by
symbolic names, or use the other niceties that make programming easier. For this
reason we will use MASM (the Microsoft Assembler) to write x86 assembly programs
for the rest of the class. MASM is an assembler that has many of the same features
that you are probably used to when working with higher-level programming
languages.
If you are installing MASM at home on your own computer, see the link from the CS221
web page on “Installing MASM” for help on getting it up and running. I will assume that
you are using TextPad as your editor, although you are free to use any editor and
debugging environment.
Constant Expressions
By default, numeric literals in MASM are in decimal. Note that this is different from
debug, which used a default of hex. In MASM you have the ability to express numbers
in a variety of formats by adding a letter on the end of the literal to indicate the base:
100 decimal
100b binary
100h hexadecimal
100q octal
0FFh hexadecimal
Note the last example. Hex constants that start with a letter must be preceded by a zero.
This is so the assembler doesn’t get the hex value confused with a symbolic identifier
(e.g., an identifier named “FFh”).
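As a quick check on the suffixes, the sketch below loads each of the literals above into a register. It assumes the Irvine32 library used later in these notes is installed; the choice of registers is arbitrary.

```asm
INCLUDE Irvine32.inc
.code
main PROC
    mov eax, 100      ; decimal 100
    mov ebx, 100b     ; binary 100 = 4 decimal
    mov ecx, 100h     ; hex 100    = 256 decimal
    mov edx, 100q     ; octal 100  = 64 decimal
    mov esi, 0FFh     ; hex FF     = 255 decimal (leading zero required)
    call DumpRegs     ; shows registers in hex, e.g. EAX=00000064
    exit
main ENDP
END main
```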
A constant expression combines numeric literals with arithmetic operators. For example:
100 * 2
-3 / 4
1+3
10 mod 6
These expressions are evaluated at assembly time, not at runtime. This means in the
example of 1+3, our assembled program will contain the number 4. The assembler does
the math, not the program during runtime.
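To make the point concrete, here is a minimal sketch (again assuming the Irvine32 library) in which every operand is a constant expression. The comments show the value the assembler places in the instruction.

```asm
INCLUDE Irvine32.inc
.code
main PROC
    mov eax, 1 + 3       ; assembled as mov eax, 4
    mov ebx, 10 mod 6    ; assembled as mov ebx, 4
    mov ecx, 100 * 2     ; assembled as mov ecx, 200
    call DumpRegs        ; no arithmetic instructions were executed
    exit
main ENDP
END main
```

No ADD, MUL, or DIV instruction appears in the assembled program; only the results do.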
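A symbolic constant is created with the equal-sign directive, which gives a name to an
integer expression:
rows = 10 * 10
max = 100
A real-number value such as 3.14159 cannot be assigned with the equal sign; use EQU
(see Other Directives) for that.
Although these look like variables, they are not the same! They are constant
expressions and may be redefined, but they cannot be used as storage the way a
variable can.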
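The sketch below shows a symbol being defined, used, and then redefined. It assumes the Irvine32 library; the second value, 25, is made up for illustration.

```asm
INCLUDE Irvine32.inc
rows = 10 * 10           ; the assembler computes 100
.code
main PROC
    mov eax, rows        ; assembled as mov eax, 100
rows = 25                ; redefinition: later uses see the new value
    mov ebx, rows        ; assembled as mov ebx, 25
    call DumpRegs
    exit
main ENDP
END main
```

The symbol never occupies memory at run time. Each use is replaced by its current value while the program is being assembled.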
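A constant such as somenum can be redefined by assigning it a new value with another =
statement. Because it names a value rather than a storage location, however, it cannot
be the destination of an instruction. The following code would be invalid: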
somenum = 0FFh
MOV somenum, ah ; INVALID
Character strings in ASCII are written by enclosing the text in either single or double
quotation marks. The following are all valid strings. Note embedded quotes:
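Some illustrative strings, wrapped in a small program so they can be assembled and displayed. The variable names and the text are invented for this sketch; WriteString and Crlf are assumed to come from the Irvine32 library.

```asm
INCLUDE Irvine32.inc
.data
plain   BYTE "Hello", 0                    ; double quotes
letter  BYTE 'A', 0                        ; single quotes
inner1  BYTE "This isn't a test", 0        ; single quote inside double quotes
inner2  BYTE 'Say "Goodnight," Gracie', 0  ; double quotes inside single quotes
.code
main PROC
    mov edx, OFFSET inner1
    call WriteString         ; displays the text up to the 0 byte
    call Crlf
    mov edx, OFFSET inner2
    call WriteString
    call Crlf
    exit
main ENDP
END main
```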
Just as with high level languages, we have a list of words called reserved words that have
special meaning. You are not allowed to use reserved words for your identifiers. A
sample of reserved words is below:
Statements
A statement consists of a name, mnemonic, operands, and comment. There are two
types of statements, instructions and directives. Instructions are executable statements
that include a mnemonic op code. Directives are statements that provide information to
the assembler, but do not include executable op codes.
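The short sketch below marks which lines are directives and which are instructions. The names total and step1 are invented for illustration, and the Irvine32 library is assumed.

```asm
INCLUDE Irvine32.inc        ; directive: read definitions from another file
.data                       ; directive: begin the data segment
total DWORD 0               ; directive: reserve four bytes named total
.code                       ; directive: begin the code segment
main PROC                   ; directive: start of a procedure
step1:  mov eax, 5          ; instruction: name, mnemonic, operands, comment
        add eax, 7          ; instruction: EAX = 12
        mov total, eax      ; instruction: store 12 in total
        call DumpRegs
        exit
main ENDP                   ; directive: end of the procedure
END main                    ; directive: end of the program
```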
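A statement normally occupies a single line. It can be continued onto the next line by
ending the line with a backslash (\):
somenum = \
55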
Identifiers
A variable is a location in the program’s data area that has been assigned a name. Here
is an example that defines a byte named “count1” and initializes its value to 50:
count1 db 50
A label is a name that appears in the code area of a program. A label serves as a
placemarker when a program needs to jump or loop back to some other instruction.
Rather than using line numbers or fixed addresses that might change when instructions
are added or removed, a label stays attached to its instruction, and the address it
stands for is recalculated each time the program is assembled. The following is an
example of a label:
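A minimal sketch of a label used as a loop target. The label name again and the loop itself are invented for illustration; the Irvine32 library is assumed.

```asm
INCLUDE Irvine32.inc
.code
main PROC
    mov ecx, 5          ; loop counter
    mov eax, 0          ; running sum
again:                  ; label: marks the top of the loop
    add eax, ecx        ; add the counter to the sum
    dec ecx             ; count down; sets the zero flag when ECX reaches 0
    jnz again           ; jump back to the label while ECX is not zero
    call DumpRegs       ; EAX = 0000000F (5+4+3+2+1 = 15)
    exit
main ENDP
END main
```

If instructions are inserted before the loop, the jump still goes to the right place because the assembler works out the address of again each time.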
Comments
The ; character begins a comment: everything after the semicolon on that line is ignored
by the assembler. To comment out an entire block, use:
COMMENT !
This line is a comment
So is this line
…
!
Sample Program
Here is a sample program that adds and subtracts numbers from chapter 3 of Irvine:
The line numbers have been added for your reference only.
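A listing consistent with the line-by-line notes that follow might look like the sketch below. The title text, the comment, and the three operand values are assumptions, chosen so that EAX ends at 30000h. Line 1 is the TITLE line, line 2 is blank, and line 3 is the comment; the trailing comments mark the remaining line numbers.

```asm
TITLE Add and Subtract (AddSub.asm)

; This program adds and subtracts 32-bit integers.
INCLUDE Irvine32.inc        ; line 4
.code                       ; line 5
main PROC                   ; line 6
    mov eax, 10000h         ; line 7:  EAX = 10000h
    add eax, 40000h         ; line 8:  EAX = 50000h
    sub eax, 20000h         ; line 9:  EAX = 30000h
    call DumpRegs           ; line 10
    exit                    ; line 11
main ENDP                   ; line 12
END main                    ; line 13
```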
When this program runs it will display the contents of the extended registers. EAX
should contain 30000h.
Line 1 is the title directive and prints the specified title at the top of the listing to identify
the program. It is optional and not necessary to include on all programs.
Line 4 is an include directive. It tells the assembler to copy definitions from the file
Irvine32.inc. This file contains macros and useful procedures written by the author of
your textbook to perform common tasks such as displaying the contents of registers,
performing file I/O, etc.
Line 5 is the code directive. It marks where the code segment begins in memory.
Line 6 declares a procedure called main. PROC marks the beginning of a procedure.
The format is to give the procedure name first, followed by PROC.
Lines 7-11 are the body of the code for the main procedure. We have already discussed
the MOV instruction. The ADD instruction adds the operand to the EAX register and
puts the result into EAX. The SUB instruction subtracts the operand from the EAX
register and puts the result into EAX.
Line 10 invokes the DumpRegs procedure. The CALL statement calls a procedure. In
this case, the DumpRegs procedure exists in the Irvine32 library – that is why the source
code is not in our program. This procedure will be linked into the executable.
Line 11 invokes a macro called Exit in the Irvine32 library. It provides a simple way to
end a program in Windows by invoking a Windows function that halts the program.
Line 12 marks the end of the main procedure. The format is to give the procedure name
first, followed by ENDP.
Line 13 marks the end of the program. The optional word “main” after END indicates the
location of the program entry point.
For your programs, you can use this program as your basic template for starting out with
assembly programming.
Alternate version of AddSub
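The alternate version does not include Irvine32.inc, so it must supply the processor, memory model, stack, and procedure prototypes itself. The sketch below is arranged to match the line numbers in the notes that follow; the title, the comment, the .386 and .MODEL lines, and the operand values are assumptions. Line 1 is the TITLE line, line 2 is blank, and line 3 is the comment.

```asm
TITLE Add and Subtract (AddSubAlt.asm)

; This program adds and subtracts 32-bit integers.
.386                                    ; line 4
.MODEL flat, stdcall                    ; line 5
.STACK 4096                             ; line 6
ExitProcess PROTO, dwExitCode:DWORD     ; line 7
DumpRegs PROTO                          ; line 8
.code                                   ; line 9
main PROC                               ; line 10
    mov eax, 10000h                     ; line 11
    add eax, 40000h                     ; line 12
    sub eax, 20000h                     ; line 13: EAX = 30000h
    call DumpRegs                       ; line 14
    INVOKE ExitProcess, 0               ; line 15
main ENDP                               ; line 16
END main                                ; line 17
```

The program must still be linked with the Irvine32 library and the Windows kernel library so that DumpRegs and ExitProcess can be found.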
Line 6 is the stack directive. It allocates 4096 bytes of stack space out of our data
segment.
Lines 7-8 declare the two external procedures we will be using, ExitProcess and
DumpRegs. ExitProcess is a Windows function that halts the current process, while
DumpRegs is the Irvine32 procedure that displays the register contents. dwExitCode is
the parameter that ExitProcess takes.
Line 15 invokes the ExitProcess function, passing it a return code of zero. INVOKE is an
assembler directive that calls a procedure or function.
Real mode sample program
If we are writing a program that is going to run in real mode then we have a bit of extra
work to do. We must initialize our segments and remember that offsets (addresses) of
code and data are 16 bits rather than 32 bits.
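A real-mode program along the lines described in the notes that follow might look like this sketch. It assumes the 16-bit Irvine library (Irvine16.inc) and a 16-bit linker. The title and comment are invented, and the layout may not match the line numbers cited below exactly.

```asm
TITLE Hello World Program (Hello.asm)

; Displays a null-terminated string in real mode.

INCLUDE Irvine16.inc

.data
message BYTE "Hello, world!", 0

.code
main PROC
    mov ax, @data           ; segment address of the data segment
    mov ds, ax              ; DS must be loaded through a register

    mov dx, OFFSET message  ; DX = 16-bit offset of the string
    call WriteString        ; displays bytes until the 0 terminator

    exit
main ENDP
END main
```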
Line 5 is a directive that indicates we are referencing the 16 bit Irvine library.
More specifically, this declares the small memory model, a stack of 4096 bytes, and uses
the .386 processor directive.
Line 8 declares a variable called “message” within the data segment. The “BYTE”
means we will declare this variable in chunks of bytes. We can also use “DB” in place of
“BYTE” for “Define Byte”. This line allocates a block of memory to hold the string
containing “Hello, world!” along with a byte containing the value 0 (NULL), which is
the delimiter that indicates where the string terminates. If we leave this off, the string
procedures will think everything in memory past this string is part of the string, until
they happen to hit a zero.
Lines 10-11 move the address of the program’s data segment into the DS register. This is
needed so we’ll reference the correct offset for our string (remember we are in Real
Mode! Addresses are created by combining an offset with a segment register).
Line 13 with “offset message” moves the address of the message variable to the DX
register.
Line 14, the WriteString procedure, assumes that the address of the string to write is
stored in the DX register. This routine displays all data as ASCII to the screen until the
null character is reached.
Assembling a Program
Assembling a program is similar to compiling a program. There are two stages. First,
the source code is run through the assembler to produce an object file. An object file
contains assembled machine code, but only for an individual module. Object files are
sometimes collected into libraries; for example, Irvine.lib is a library containing
commonly used subroutines. Next, the object files are linked to produce an executable
program. The executable program is then loaded and run by the DOS/Windows loader.
Other files that may be produced along the way are MAP and LISTING files. The listing
files are optionally generated during assembly and contain the source code and translated
machine code in a printable format. The map files are generated during linking and
contain information about the symbols, segments, and their respective addresses.
In the hello world program, we defined a string variable named “message”. To do this,
we used the BYTE directive. Let’s look at the data allocation directives in more detail.
These directives determine how much storage to allocate based on some predefined
types.
The BYTE, WORD, and DWORD directives will be the ones we use most commonly.
DB allocates storage for one or more 8-bit values. The syntax is as follows:
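In general, a byte definition is an optional name, then the directive (DB or BYTE), then one or more initializers separated by commas. The sketch below shows a few such definitions; the names and values are invented, and the Irvine32 library is assumed.

```asm
INCLUDE Irvine32.inc
.data
letter  DB   'A'           ; one byte holding the ASCII code 41h
small   BYTE 0             ; smallest unsigned byte value
large   BYTE 255           ; largest unsigned byte value
pair    BYTE 1, 2          ; two bytes, one per initializer
.code
main PROC
    mov eax, 0
    mov al, letter         ; AL = 41h
    mov ebx, 0
    mov bl, large          ; BL = 0FFh
    call DumpRegs
    exit
main ENDP
END main
```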
A variable’s initial contents may be left undefined by using a question mark for the
initializer:
char1 byte ?
When multiple initializers are used, the data is stored sequentially in memory. Consider:
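A definition consistent with the description that follows is sketched here. The fourth value, 40, is an assumption; the Irvine32 library is assumed as well.

```asm
INCLUDE Irvine32.inc
.data
numlist BYTE 10, 20, 30, 40   ; four consecutive bytes (40 is assumed)
.code
main PROC
    mov eax, 0
    mov al, numlist           ; AL = 10, the byte at numlist
    mov ebx, 0
    mov bl, numlist + 1       ; BL = 20, one byte further on
    mov ecx, 0
    mov cl, numlist + 2       ; CL = 30
    call DumpRegs             ; EAX=0000000A EBX=00000014 ECX=0000001E
    exit
main ENDP
END main
```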
If numlist is stored at memory location 0000, then the value 10 is stored at location 0000,
the value 20 is stored at location 0001, the value 30 is stored at 0002, etc.
If you have a very long string, you can continue on multiple lines:
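One way to do this, sketched below, is to follow the named line with further BYTE lines that have no name. The bytes are stored one after another, so the pieces form a single string. The text is invented, and WriteString is assumed to come from the Irvine32 library.

```asm
INCLUDE Irvine32.inc
.data
longMsg BYTE "This is a long string that "
        BYTE "continues on a second line "
        BYTE "and ends on a third.", 0      ; only the last piece has the 0
.code
main PROC
    mov edx, OFFSET longMsg
    call WriteString          ; displays all three pieces as one string
    call Crlf
    exit
main ENDP
END main
```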
A DWORD is large enough to store the offset (memory address) of some other label or
variable. To do this we can use the name of the variable as the initializer:
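A minimal sketch of a DWORD that holds the offset of another variable. The names numlist and pNumlist are illustrative, and a 32-bit program using the Irvine32 library is assumed.

```asm
INCLUDE Irvine32.inc
.data
numlist  BYTE  10, 20, 30, 40
pNumlist DWORD numlist          ; holds the 32-bit offset of numlist
.code
main PROC
    mov esi, pNumlist           ; ESI = address of numlist
    movzx eax, BYTE PTR [esi]   ; EAX = 10, the first byte of numlist
    call DumpRegs
    exit
main ENDP
END main
```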
One final operator used in defining data storage is the DUP operator. DUP appears after
a storage directive, such as DB, and with it you can duplicate one or more values. It is
most often used when allocating space for a string or array:
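The sketch below shows a few DUP forms. The names and sizes are invented; the Irvine32 library is assumed.

```asm
INCLUDE Irvine32.inc
.data
buffer  BYTE 20 DUP(0)       ; 20 bytes, each initialized to zero
scratch BYTE 10 DUP(?)       ; 10 bytes, left uninitialized
stars   BYTE 5 DUP("*"), 0   ; five asterisks followed by a 0 terminator
.code
main PROC
    mov edx, OFFSET stars
    call WriteString         ; displays *****
    call Crlf
    exit
main ENDP
END main
```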
The other data storage directives work similarly, but allocate more space. For example
we could use:
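For instance, WORD reserves 16 bits per value and DWORD reserves 32 bits per value. The names and values in this sketch are invented; the Irvine32 library is assumed.

```asm
INCLUDE Irvine32.inc
.data
wordVal  WORD  1234h          ; one 16-bit value
wordList WORD  1, 2, 3        ; three 16-bit values, 6 bytes in all
bigVal   DWORD 12345678h      ; one 32-bit value
bigList  DWORD 4 DUP(0)       ; four 32-bit zeros, 16 bytes in all
.code
main PROC
    mov eax, 0
    mov ax, wordVal           ; AX = 1234h
    mov ebx, bigVal           ; EBX = 12345678h
    call DumpRegs
    exit
main ENDP
END main
```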
Other Directives
EQU – This assigns a symbolic name to a string or numeric constant. Unlike using the
equal sign, a symbol defined with EQU may not be redefined later:
Pi EQU 3.14159
Max EQU 10000
We can now use Pi or Max like constants. The assembler will complain if we try to
change either one to something else later.
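A small sketch of an EQU constant in use, assuming the Irvine32 library. The commented-out line shows the kind of redefinition that the assembler rejects.

```asm
INCLUDE Irvine32.inc
Max EQU 10000
.code
main PROC
    mov eax, Max          ; assembled as mov eax, 10000
    ; Max EQU 20000       ; if uncommented, assembly fails: Max is already defined
    call DumpRegs         ; EAX=00002710 (10000 in hex)
    exit
main ENDP
END main
```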
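move textequ <mov> ; assumed definition: MOVE is replaced by the text MOV
MyPointer textequ <offset myString> ; assumed definition: a name for the string's offset
.data
myString db "A String", 0
.code
move bx, MyPointer ; same as MOV BX, offset myString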
In this case, we redefined the MOV instruction to MOVE and MyPointer as “offset
myString” (which gives the offset of a variable).