ASM To C Translation Table
By: Enzo P.
Version: 5
Summary
1. Introduction
2. Why Translate From Assembly to C?
3. C Calling Conventions
4. ASM to C - Primitive Data Types Equivalency Table
5. ASM to C - Complex Data Types and Structures Equivalency Table
6. ASM to C - Instruction Set Equivalency Table
6.1. CPU - 8086 + FPU
6.2. CPU - 80186 + FPU
6.3. CPU - 80286 + FPU
6.4. CPU - 80386 + FPU
6.5. CPU - 80486 + FPU
6.6. CPU - Pentium + MMX + FPU
6.7. CPU - Pentium Pro + FPU
6.8. CPU - AMD + 3DNow! + FPU
7. Details about the translation of FPU ASM code to C
8. CPU Manual Reference
8.1. From Intel Manual
9. Conclusion
10. Bibliography
Introduction
Some of the information in the table is taken from the Intel and/or AMD manuals [and, without any doubt, such information is subject to their copyright].
When I started this work, I noticed a problem: some instructions are not explained clearly enough in the Intel manual to allow a 1:1 translation to C/C++, or they have no equivalent in the C/C++ language at all. This means that we need to interpret some instructions ourselves, which is quite likely to produce inconsistencies and defective translations. At best, the most we can do for instructions that have no C equivalent is to use inline ASM, which can make the source code very cryptic. Without doubt, for some ASM instructions nothing can be done other than using inline ASM.
Here, I plan to define and use my own ASM to C translation strategy, which uses the equivalency table to define simple code replacements, rather than trying to interpret every single instruction to build structures or anything similar.
The idea of this strategy is very simple, and it seems to be powerful: it is the opposite of what a compiler does. It recognizes the instructions that are not complex and builds basic blocks of code, or "primitives"; then, depending on the composed instructions, the basic blocks are merged to build complex blocks of code, or "structures".
So, the idea is to build a map of code blocks. As the translation proceeds, each translated section is mapped according to the type of operation performed in it, recognizing the code section by section, starting with the sections that seem to contain basic blocks of code.
For instructions that are not complex and can simply be replaced 1:1 by a C statement, I do not plan to do any kind of interpretation; I simply replace them with their equivalents. Such a section is then mapped as a simple replacement.
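A minimal sketch of the simple replacement step, assuming the ASM has already been decoded into a mnemonic and two operand strings, and that registers are modeled as C variables with the same names. The names `asm_to_c_rule`, `simple_rules`, and `translate_simple` are hypothetical, and flag side effects are ignored, so this only illustrates the table lookup idea:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical rule: an ASM mnemonic and a printf-style template for its
   1:1 C replacement. The first %s is the destination, the second the source. */
struct asm_to_c_rule {
    const char *mnemonic;
    const char *c_template;
};

static const struct asm_to_c_rule simple_rules[] = {
    { "MOV", "%s = %s;"  },
    { "ADD", "%s += %s;" },
    { "SUB", "%s -= %s;" },
    { "AND", "%s &= %s;" },
    { "OR",  "%s |= %s;" },
    { "XOR", "%s ^= %s;" },
};

/* Writes the C replacement of "mnemonic dest, src" into out and returns 1.
   Returns 0 when the mnemonic is not a simple 1:1 case, so the caller can
   hand it to pattern recognition instead. Flag effects are ignored. */
static int translate_simple(const char *mnemonic, const char *dest,
                            const char *src, char *out, size_t out_size)
{
    size_t i;
    for (i = 0; i < sizeof simple_rules / sizeof simple_rules[0]; i++) {
        if (strcmp(simple_rules[i].mnemonic, mnemonic) == 0) {
            snprintf(out, out_size, simple_rules[i].c_template, dest, src);
            return 1;
        }
    }
    return 0;
}
```

For example, `translate_simple("ADD", "ax", "bx", buf, sizeof buf)` writes `ax += bx;`, while an instruction such as DAA returns 0 because it is not a plain 1:1 replacement.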
For composed instructions, that is, instructions that depend on other instructions to build a logical structure, I plan to find patterns that allow them to be recognized in the simplest possible way, without much code interpretation. When such a structure depends on code that was already translated to C, the idea is to do simple "code merging", joining the already translated code to the structure being built. Such a section is then mapped as a complex recognition, with or without code merging, pattern matching, or anything similar.
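As an illustration of pattern recognition for composed instructions, the sketch below merges the common pair `CMP a, b` followed by `Jcc label` into a single C `if` statement. It assumes the operands are already C variable names declared with signed types (JL, JLE, JG, and JGE test signed conditions); `struct insn`, `jcc_operator`, and `match_cmp_jcc` are hypothetical names:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical decoded instruction: mnemonic plus up to two operands. */
struct insn {
    const char *mnemonic;
    const char *op1;
    const char *op2;
};

/* Returns the C comparison operator tested by a conditional jump that
   follows CMP, or NULL if this sketch does not know the jump. JL/JLE/JG/JGE
   are signed conditions, so the operands must be signed C variables. */
static const char *jcc_operator(const char *jcc)
{
    if (strcmp(jcc, "JE") == 0 || strcmp(jcc, "JZ") == 0)   return "==";
    if (strcmp(jcc, "JNE") == 0 || strcmp(jcc, "JNZ") == 0) return "!=";
    if (strcmp(jcc, "JL") == 0)  return "<";
    if (strcmp(jcc, "JLE") == 0) return "<=";
    if (strcmp(jcc, "JG") == 0)  return ">";
    if (strcmp(jcc, "JGE") == 0) return ">=";
    return NULL;
}

/* Recognizes the composed pattern "CMP a, b" + "Jcc label" and merges it
   into a single C statement in out. Returns 1 on a match, 0 otherwise. */
static int match_cmp_jcc(const struct insn *first, const struct insn *second,
                         char *out, size_t out_size)
{
    const char *op;

    if (strcmp(first->mnemonic, "CMP") != 0)
        return 0;
    op = jcc_operator(second->mnemonic);
    if (op == NULL)
        return 0;
    snprintf(out, out_size, "if (%s %s %s) goto %s;",
             first->op1, op, first->op2, second->op1);
    return 1;
}
```

Given `CMP ax, bx` and `JL done`, this produces `if (ax < bx) goto done;`. A section recognized this way would be mapped as a complex recognition; a pair that does not match is left for other patterns.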
Why Translate From Assembly to C?
C Calling Conventions
ASM to C - Primitive Data Types Equivalency Table
Typical limits of integral types

Implicit specifier(s)  Explicit specifier      Bits      Bytes   ASM type             Minimum value               Maximum value
signed char            same                    8         1       Byte                 −128                        +127
unsigned char          same                    8         1       Byte                 0                           255
char                   one of the above        8         1       Byte                 −128 or 0                   +127 or 255
short                  signed short int        16        2       Word                 −32,768                     +32,767
unsigned short         unsigned short int      16        2       Word                 0                           65,535
int                    signed int              16 or 32  2 or 4  Word or Double Word  −32,768 or −2,147,483,648   +32,767 or +2,147,483,647
unsigned               unsigned int            16 or 32  2 or 4  Word or Double Word  0                           65,535 or 4,294,967,295
long                   signed long int         32        4       Double Word          −2,147,483,648              +2,147,483,647
unsigned long          unsigned long int       32        4       Double Word          0                           4,294,967,295
long long[1]           signed long long int    64        8       Quad Word            −9,223,372,036,854,775,808  +9,223,372,036,854,775,807
unsigned long long[1]  unsigned long long int  64        8       Quad Word            0                           18,446,744,073,709,551,615
The size and limits of the plain int type (without the short, long, or long long modifiers) vary much more than the other integral
types among C implementations. The Single UNIX Specification specifies that the int type must be at least 32 bits, but the ISO
C standard only requires 16 bits. Refer to limits.h for guaranteed constraints on these data types. On most existing
implementations, two of the five integral types have the same bit widths.
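Because the width of plain int varies, one option (an assumption of this sketch, not a rule of the table) is to translate ASM operands through the exact-width types of `<stdint.h>`, which C99 makes optional but which common compilers provide. The aliases `asm_byte` through `asm_qword` and the function `int_width_bits` are hypothetical names:

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical aliases that pin each ASM data size from the table to an
   exact C width, so a translation does not depend on how wide int is. */
typedef uint8_t  asm_byte;   /* Byte:         8 bits */
typedef uint16_t asm_word;   /* Word:        16 bits */
typedef uint32_t asm_dword;  /* Double Word: 32 bits */
typedef uint64_t asm_qword;  /* Quad Word:   64 bits */

/* Storage size in bits of plain int on the compiling implementation
   (16 on many 16-bit DOS compilers, 32 on most current ones). */
static int int_width_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}
```

A value stored in an `asm_word` also wraps around at 16 bits the way a 16-bit register does: `(asm_word)(0xFFFF + 1)` is 0 on every implementation that provides `uint16_t`.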
Integral type literal constants may be represented in one of two ways, by an integer type number, or by a single character
surrounded by single quotes. Integers may be represented in three bases: decimal (48 or -293), octal with a "0" prefix (0177), or
hexadecimal with a "0x" prefix (0x3FE). A character in single quotes ('F'), called a "character constant", represents the value of
that character in the execution character set (often ASCII). In C, character constants have type int (in C++, they have type
char).
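The literal forms described above can be checked directly in C. This short sketch reuses the paragraph's own examples and assumes an ASCII execution character set for the value of 'F'; the enum constant names and `char_constant_size` are hypothetical:

```c
/* The literals from the paragraph above, with their decimal values. */
enum {
    OCT_LITERAL = 0177,   /* octal: 1*64 + 7*8 + 7 = 127 */
    HEX_LITERAL = 0x3FE,  /* hexadecimal: 3*256 + 15*16 + 14 = 1022 (MASM-style 3FEH) */
    CHR_LITERAL = 'F'     /* 70 when the execution character set is ASCII */
};

/* A character constant has type int in C, so this returns sizeof(int);
   the same expression compiled as C++ yields 1, because 'F' is a char there. */
static int char_constant_size(void)
{
    return (int)sizeof('F');
}
```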
ASM to C - Complex Data Types and Structures Equivalency Table
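ASM to C - Instruction Set Equivalency Table
CPU - 8086 + FPU
AAA   ASCII Adjust for Addition [ASCII Adjust After Addition]   Opcode: 37
        IF ((AL AND 0FH) > 9) OR (AF = 1) THEN
            AL ← (AL + 6);
            AH ← AH + 1;
            AF ← 1;
            CF ← 1;
        ELSE
            AF ← 0;
            CF ← 0;
        FI;
        AL ← AL AND 0FH;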
AAD   ASCII Adjust for Division [ASCII Adjust AX Before Division]   Opcode: D5 0A
        tempAL ← AL;
        tempAH ← AH;
        AL ← (tempAL + (tempAH ∗ imm8)) AND FFH; (* imm8 is set to 0AH for the AAD mnemonic *)
        AH ← 0;
        Note: The immediate value (imm8) is taken from the second byte of the instruction.
AAM   ASCII Adjust for Multiplication [ASCII Adjust AX After Multiplication]   Opcode: D4 0A
        tempAL ← AL;
        AH ← tempAL / imm8; (* imm8 is set to 0AH for the AAM mnemonic *)
        AL ← tempAL MOD imm8;
        Note: The immediate value (imm8) is taken from the second byte of the instruction.
AAS   ASCII Adjust for Subtraction [ASCII Adjust AL After Subtraction]   Opcode: 3F
        IF ((AL AND 0FH) > 9) OR (AF = 1) THEN
            AL ← AL − 6;
            AH ← AH − 1;
            AF ← 1;
            CF ← 1;
        ELSE
            CF ← 0;
            AF ← 0;
        FI;
        AL ← AL AND 0FH;
ADC   Add With Carry
        DEST ← DEST + SRC + CF;
ADD   Arithmetic Addition
        DEST ← DEST + SRC;
AND   Logical And
        DEST ← DEST AND SRC;
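DAA   Decimal Adjust AL After Addition   Opcode: 27
        IF ((AL AND 0FH) > 9) OR (AF = 1) THEN
            AL ← AL + 6;
            CF ← CF OR CarryFromLastAddition; (* CF OR carry from AL ← AL + 6 *)
            AF ← 1;
        ELSE
            AF ← 0;
        FI;
        IF ((AL AND F0H) > 90H) OR (CF = 1) THEN
            AL ← AL + 60H;
            CF ← 1;
        ELSE
            CF ← 0;
        FI;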
DAS   Decimal Adjust AL After Subtraction   Opcode: 2F
        IF ((AL AND 0FH) > 9) OR (AF = 1) THEN
            AL ← AL − 6;
            CF ← CF OR BorrowFromLastSubtraction; (* CF OR borrow from AL ← AL − 6 *)
            AF ← 1;
        ELSE
            AF ← 0;
        FI;
        IF (AL > 9FH) OR (CF = 1) THEN
            AL ← AL − 60H;
            CF ← 1;
        ELSE
            CF ← 0;
        FI;
DEC   Decrement by 1
        DEST ← DEST − 1;
DIV   Unsigned Divide
        temp ← AX / SRC;
        AL ← temp;
        AH ← AX MOD SRC;
IDIV  Signed Divide
        temp ← AX / SRC; (* signed division *)
        AL ← temp;
        AH ← AX SignedModulus SRC;
IMUL  Signed Integer Multiply
        IF (NumberOfOperands = 1) THEN
        IF (OperandSize = 8) THEN
        AX ← AL ∗ SRC (* signed multiplication *)
        ELSE
        CF = 1; OF = 1;
        FI;
        FI;
        CF = 1; OF = 1;
        ELSE
        CF = 0; OF = 0;
        FI;
        ELSE (* NumberOfOperands = 3 *)
        CF = 1; OF = 1;
        ELSE
        CF = 0; OF = 0;
        FI;
        FI;
IN    Input Byte or Word From Port
        IF ((PE = 1) AND ((CPL > IOPL) OR (VM = 1))) THEN
            #GP(0);
        FI;
        FI;
INC   Increment by 1
        DEST ← DEST + 1;
INT   Call to Interrupt
        Composed instruction. Note: Used with the stack and system calls.
INT03 Call to Interrupt
        Composed instruction. Note: Used with the stack and system calls.
INT3  Call to Interrupt
        Composed instruction. Note: Used with the stack and system calls.
INTO  Call to Interrupt on Overflow
        Composed instruction. Note: Used with the stack and system calls.
IRET  Return From Interrupt   Opcode: CF
        Composed instruction. Note: Used with the stack and system calls.
IRETW Return From Interrupt
        Composed instruction. Note: Used with the stack and system calls.
MUL   Unsigned Multiply
        AX ← AL ∗ SRC
        Or
        DX:AX ← AX ∗ SRC
NEG   Two's Complement Negation
        IF DEST = 0 THEN
            CF ← 0;
        ELSE
            CF ← 1;
        FI;
        DEST ← −(DEST);
NOP   No Operation   Opcode: 90
        ???
NOT   One's Complement Negation (Logical NOT)
        DEST ← NOT DEST;
#GP(0);
FI;
FI;
POP   Pop Word off Stack [POP CS works only on the 8086/8088]
        DEST ← SS:SP; (* copy a word *)
        SP ← SP + 2;
        Note: Used in conjunction with call structures and the stack.
POPF  Pop Flags off Stack [Pop data into flags register]
        ???
PUSH  Push Word onto Stack
        SP ← SP − 2;
        SS:SP ← SRC; (* push word *)
        Note: Used in conjunction with call structures and the stack.
PUSHF Push Flags onto Stack [Push flags onto stack]
        ???
        Note: Can be used in a LOOP construct that takes some action based on the setting of the status flags before the next comparison is made.
SCASW Scan String (Word) [Compare word string]
        Note: Can be preceded by the REP prefix for block comparisons of CX words. Can also be used in a LOOP construct that takes some action based on the setting of the status flags before the next comparison is made.
SHL   Shift Logical Left [Shift left (unsigned shift left)]
SHR   Shift Logical Right [Shift right (unsigned shift right)]
STC   Set Carry
STD   Set Direction Flag
STI   Set Interrupt Flag (Enable Interrupts)
STOSB Store String (Byte) [Store byte in string]
        Note: Can be preceded by the REP prefix for block stores of CX bytes. More often, it is used within a LOOP construct, because the data must be moved into AL before it can be stored.
STOSW Store String (Word) [Store word in string]
        Note: Can be preceded by the REP prefix for block stores of CX words. More often, it is used within a LOOP construct, because the data must be moved into AX before it can be stored.
SUB Subtract
FABS
FADD
FADDP
FBLD
FBSTP
FCHS
FCLEX
FCOM
FCOMP
FCOMPP
FDECSTP
FDISI
FDIV
FDIVP
FDIVR
FDIVRP
FENI
FFREE
FIADD
FICOM
FICOMP
FIDIV
FIDIVR
FILD
FIMUL
FINCSTP
FINIT
FIST
FISTP
FISUB
FISUBR
FLD
FLD1
FLDCW
FLDENV
FLDENVW
FLDL2E
FLDL2T
FLDLG2
FLDLN2
FLDPI
FLDZ
FMUL
FMULP
FNCLEX
FNDISI
FNENI
FNINIT
FNOP
FNSAVE
FNSAVEW
FNSTCW
FNSTENV
FNSTENVW
FNSTSW
FPATAN
FPREM
FPTAN
FRNDINT
FRSTOR
FRSTORW
FSAVE
FSAVEW
FSCALE
FSQRT
FST
FSTCW
FSTENV
FSTENVW
FSTP
FSTSW
FSUB
FSUBP
FSUBR
FSUBRP
FTST
FXAM
FXCH
FXTRACT
FYL2X
FYL2XP1
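A hedged sketch of how a few of the pseudo codes in the table (AAS, AAM, AAD, DIV, and ADC) could become plain C. It assumes the 8086 state is modeled by a hypothetical `struct cpu8086` holding only AL, AH, AF, and CF, so other flags are not updated; the function names are illustrative, and the divide error (#DE) is reported by a return value instead of being raised:

```c
#include <stdint.h>

/* Hypothetical model of the 8086 state touched by these instructions. */
struct cpu8086 {
    uint8_t al, ah;
    int af, cf;               /* auxiliary carry and carry flags (0 or 1) */
};

/* AAS: ASCII adjust AL after subtraction. */
static void aas(struct cpu8086 *c)
{
    if ((c->al & 0x0F) > 9 || c->af) {
        c->al = (uint8_t)(c->al - 6);
        c->ah = (uint8_t)(c->ah - 1);
        c->af = 1;
        c->cf = 1;
    } else {
        c->cf = 0;
        c->af = 0;
    }
    c->al &= 0x0F;
}

/* AAM imm8: the plain AAM mnemonic uses imm8 = 0AH. imm8 = 0 would make
   the CPU raise a divide error; callers must not pass 0 here. */
static void aam(struct cpu8086 *c, uint8_t imm8)
{
    uint8_t temp_al = c->al;
    c->ah = (uint8_t)(temp_al / imm8);
    c->al = (uint8_t)(temp_al % imm8);
}

/* AAD imm8: the plain AAD mnemonic uses imm8 = 0AH. */
static void aad(struct cpu8086 *c, uint8_t imm8)
{
    uint8_t temp_al = c->al;
    uint8_t temp_ah = c->ah;
    c->al = (uint8_t)((temp_al + temp_ah * imm8) & 0xFF);
    c->ah = 0;
}

/* DIV with an 8-bit source: AL <- AX / SRC, AH <- AX MOD SRC.
   Returns 0 where the CPU would raise #DE (SRC = 0 or quotient > FFH). */
static int div8(struct cpu8086 *c, uint8_t src)
{
    uint16_t ax = (uint16_t)((c->ah << 8) | c->al);

    if (src == 0 || ax / src > 0xFF)
        return 0;
    c->al = (uint8_t)(ax / src);
    c->ah = (uint8_t)(ax % src);
    return 1;
}

/* ADC on 16-bit operands: DEST <- DEST + SRC + CF, carry out stored in CF. */
static uint16_t adc16(uint16_t dest, uint16_t src, int *cf)
{
    uint32_t sum = (uint32_t)dest + src + (*cf ? 1u : 0u);

    *cf = sum > 0xFFFFu;
    return (uint16_t)sum;
}
```

For example, with AX = 0100H, `div8(&c, 3)` leaves AL = 85 and AH = 1, and running `aam` then `aad` with imm8 = 10 turns AL = 47 into AH:AL = 4:7 and back again.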
• ZeroExtend(value): Returns a value zero-extended to the operand-size attribute of the instruction. For example, if the operand-size attribute is 32, zero extending a byte value of −10 converts the byte from F6H to a doubleword value of 000000F6H. If the value passed to the ZeroExtend function and the operand-size attribute are the same size, ZeroExtend returns the value unaltered.
• SignExtend(value): Returns a value sign-extended to the operand-size attribute of the instruction. For example, if the operand-size attribute is 32, sign extending a byte containing the value −10 converts the byte from F6H to a doubleword value of FFFFFFF6H. If the value passed to the SignExtend function and the operand-size attribute are the same size, SignExtend returns the value unaltered.
• SaturateSignedWordToSignedByte: Converts a signed 16-bit value to a signed 8-bit value. If the signed 16-bit value is less than −128, it is represented by the saturated value −128 (80H); if it is greater than 127, it is represented by the saturated value 127 (7FH).
• SaturateSignedDwordToSignedWord: Converts a signed 32-bit value to a signed 16-bit value. If the signed 32-bit value is less than −32768, it is represented by the saturated value −32768 (8000H); if it is greater than 32767, it is represented by the saturated value 32767 (7FFFH).
• SaturateSignedWordToUnsignedByte: Converts a signed 16-bit value to an unsigned 8-bit value. If the signed 16-bit value is less than zero, it is represented by the saturated value zero (00H); if it is greater than 255, it is represented by the saturated value 255 (FFH).
• SaturateToSignedByte: Represents the result of an operation as a signed 8-bit value. If the result is less than −128, it is represented by the saturated value −128 (80H); if it is greater than 127, it is represented by the saturated value 127 (7FH).
• SaturateToSignedWord: Represents the result of an operation as a signed 16-bit value. If the result is less than −32768, it is represented by the saturated value −32768 (8000H); if it is greater than 32767, it is represented by the saturated value 32767 (7FFFH).
• SaturateToUnsignedByte: Represents the result of an operation as an unsigned 8-bit value. If the result is less than zero, it is represented by the saturated value zero (00H); if it is greater than 255, it is represented by the saturated value 255 (FFH).
• SaturateToUnsignedWord: Represents the result of an operation as an unsigned 16-bit value. If the result is less than zero, it is represented by the saturated value zero (0000H); if it is greater than 65535, it is represented by the saturated value 65535 (FFFFH).
• LowOrderWord(DEST * SRC): Multiplies a word operand by a word operand and stores the least significant word of the doubleword result in the destination operand.
• HighOrderWord(DEST * SRC): Multiplies a word operand by a word operand and stores the most significant word of the doubleword result in the destination operand.
• Push(value): Pushes a value onto the stack. The number of bytes pushed is determined by the operand-size attribute of the instruction. Refer to the “Operation” section of the PUSH instruction (Push Word or Doubleword Onto the Stack) in the Intel manual for more information on the push operation.
• Pop(): Removes the value from the top of the stack and returns it. The statement EAX ← Pop(); assigns to EAX the 32-bit value from the top of the stack. Pop returns either a word or a doubleword, depending on the operand-size attribute. Refer to the “Operation” section of the POP instruction (Pop a Value from the Stack) in the Intel manual for more information on the pop operation.
• PopRegisterStack: Marks the FPU ST(0) register as empty and increments the FPU register stack pointer (TOP) by 1.
• Switch-Tasks: Performs a task switch.
• Bit(BitBase, BitOffset): Returns the value of a bit within a bit string, which is a sequence of bits in memory or a register. Bits are numbered from low-order to high-order within registers and within memory bytes. If the base operand is a register, the offset can be in the range 0..31. This offset addresses a bit within the indicated register.
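Several of these helper functions map directly to small C functions. The sketch below assumes a 32-bit operand size for ZeroExtend and SignExtend, and treats the LowOrderWord and HighOrderWord operands as signed 16-bit words (the definitions above do not state the signedness). All function names are hypothetical; SaturateSignedWordToSignedByte and SaturateSignedDwordToSignedWord behave like the `saturate_to_signed_*` helpers applied to a wider input:

```c
#include <stddef.h>
#include <stdint.h>

/* ZeroExtend of a byte to a 32-bit operand size. */
static uint32_t zero_extend8(uint8_t v)
{
    return (uint32_t)v;
}

/* SignExtend of a byte to a 32-bit operand size: copies bit 7 into
   bits 8..31 without relying on signed conversions. */
static uint32_t sign_extend8(uint8_t v)
{
    return (v & 0x80u) ? ((uint32_t)v | 0xFFFFFF00u) : (uint32_t)v;
}

/* SaturateToSignedByte / SaturateToSignedWord. */
static int8_t saturate_to_signed_byte(int32_t x)
{
    if (x < -128) return -128;          /* 80H */
    if (x > 127)  return 127;           /* 7FH */
    return (int8_t)x;
}

static int16_t saturate_to_signed_word(int32_t x)
{
    if (x < -32768) return -32768;      /* 8000H */
    if (x > 32767)  return 32767;       /* 7FFFH */
    return (int16_t)x;
}

/* SaturateToUnsignedByte / SaturateToUnsignedWord. */
static uint8_t saturate_to_unsigned_byte(int32_t x)
{
    if (x < 0)   return 0;              /* 00H */
    if (x > 255) return 255;            /* FFH */
    return (uint8_t)x;
}

static uint16_t saturate_to_unsigned_word(int32_t x)
{
    if (x < 0)     return 0;            /* 0000H */
    if (x > 65535) return 65535;        /* FFFFH */
    return (uint16_t)x;
}

/* LowOrderWord / HighOrderWord of a signed 16 x 16 -> 32-bit product. */
static uint16_t low_order_word(int16_t dest, int16_t src)
{
    uint32_t product = (uint32_t)((int32_t)dest * src);
    return (uint16_t)(product & 0xFFFFu);
}

static uint16_t high_order_word(int16_t dest, int16_t src)
{
    uint32_t product = (uint32_t)((int32_t)dest * src);
    return (uint16_t)(product >> 16);
}

/* Bit(BitBase, BitOffset) for a 32-bit register: the offset is taken
   modulo 32, so the shift is always defined. */
static int bit_of_register(uint32_t base, unsigned offset)
{
    return (int)((base >> (offset & 31u)) & 1u);
}

/* Bit(BitBase, BitOffset) for a bit string in memory, with bits numbered
   from the low-order bit of each byte. */
static int bit_of_memory(const uint8_t *base, size_t offset)
{
    return (base[offset / 8] >> (offset % 8)) & 1;
}
```

The byte value F6H (−10) from the ZeroExtend and SignExtend examples gives 000000F6H and FFFFFFF6H respectively.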
Conclusion
Without doubt, there is a lot of work still to do here, and no matter how much work one person puts into it, unless it is done together with many specialized, prepared, and motivated people, this work will not yield much, because of the amount of work it requires. Still, it is a good subject to research, and a delightful one for those who enjoy it.
For now, this is just a table of instructions that standardizes and defines the meaning of each instruction; where possible, I plan to define the pseudo code and the equivalent C code for all of them.
Anyone who wants to contribute any kind of information is welcome. At the current stage of the table, we mostly need the pseudo code and possible solutions [mainly for instructions that do not have a clear or direct representation in C] to represent the instructions in plain and clear C code. We also need reference C code whose compiled ASM output can be analyzed to find code patterns and backward code representations. A backward code representation is obtained by compiling C code that tries to generate one specific ASM instruction and checking which C statement generated that instruction; in this way you can define one ASM to C equivalency, and so translate the ASM code back to C.
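To try a backward code representation, one can compile small reference functions to assembly and read the result; with GCC, for example, `gcc -O2 -S -masm=intel ref.c` writes Intel-syntax assembly to `ref.s` (the file name is only an example). The functions below are hypothetical reference codes, and the instructions named in the comments are what x86 compilers typically emit, not a guarantee, so the generated listing should always be checked:

```c
#include <stdint.h>

/* Hypothetical reference functions for finding ASM to C equivalences.
   They have external linkage so the compiler keeps them in its output. */

/* Unsigned right shift: the usual source of SHR. The mask keeps the
   shift count below 32, where C defines the result. */
uint32_t ref_shr(uint32_t x, unsigned n)
{
    return x >> (n & 31u);
}

/* Left shift: usually SHL (a compiler may also choose ADD or LEA). */
uint32_t ref_shl(uint32_t x, unsigned n)
{
    return x << (n & 31u);
}

/* Rotate-left idiom that GCC and Clang commonly turn into ROL. */
uint32_t ref_rol(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Quotient and remainder of the same operands: typically one DIV,
   whose quotient and remainder registers are both used. */
uint32_t ref_divmod(uint32_t a, uint32_t b, uint32_t *rem)
{
    *rem = a % b;
    return a / b;
}
```

If the listing for `ref_rol` shows a single ROL, the pair "ROL ↔ `(x << n) | (x >> ((32 - n) & 31))`" can be recorded as one ASM to C equivalency.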
Bibliography