CMPS290 Class Notes Chap 03


CHAPTER 3

Arithmetic for Computers


3.1 Introduction 188
3.2 Addition and Subtraction 188
3.3 Multiplication 193
3.4 Division 199
3.5 Floating Point 206
3.6 Parallelism and Computer Arithmetic: Subword Parallelism 232
3.7 Real Stuff: Streaming SIMD Extensions and Advanced Vector
Extensions in x86 234
3.8 Going Faster: Subword Parallelism and Matrix Multiply 235
3.9 Fallacies and Pitfalls 237
3.10 Concluding Remarks 241
3.11 Historical Perspective and Further Reading 245
3.12 Self-Study 245
3.13 Exercises 248

CMPS290 Class Notes (Chap03) Page 1 / 20 by Kuo-pao Yang


3.1 Introduction 188

• Operations on integers
o Addition and subtraction
o Multiplication and division
o Dealing with overflow
• Floating-point real numbers
o Representation and operations

3.2 Addition and Subtraction 188

• Example: 7 + 6 Binary Addition

o Figure 3.1 shows the sums and carries. The carries are shown in parentheses, with
the arrows showing how they are passed.

FIGURE 3.1 Binary addition, showing carries from right to left. The rightmost bit adds 1 to 0,
resulting in the sum of this bit being 1 and the carry out from this bit being 0. Hence, the operation for
the second digit to the right is 0 + 1 + 1. This generates a 0 for this sum bit and a carry out of 1. The
third digit is the sum of 1 + 1 + 1, resulting in a carry out of 1 and a sum bit of 1. The fourth bit is 1 + 0
+ 0, yielding a 1 sum and no carry.
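The bit-by-bit process in Figure 3.1 can be sketched in a few lines of Python. This is only an illustration: the function name `add_bits` and the 4-bit default width are assumptions made for the example, not part of the notes.

```python
def add_bits(a, b, width=4):
    """Ripple-carry addition, one bit position at a time, right to left."""
    carry = 0
    result = 0
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        total = bit_a + bit_b + carry
        result |= (total & 1) << i   # sum bit for this position
        carry = total >> 1           # carry out, passed to the next position
    return result, carry

# 7 + 6 with 4-bit operands: 0111 + 0110
result, carry_out = add_bits(0b0111, 0b0110)
print(format(result, "04b"), carry_out)
```

The returned carry is the carry out of the leftmost bit; it is 0 for 7 + 6, matching the "no carry" in the caption.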

• Example: 7 - 6 Binary Subtraction

o Remember that c – a = c + (–a): we subtract by negating the second operand and
then adding. Therefore, 7 – 6 = 7 + (–6)
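A minimal sketch of subtraction by negation, assuming two's complement operands held in a fixed-width register. The helper names `negate` and `subtract` are hypothetical, chosen only for this illustration.

```python
def negate(x, width=32):
    """Two's complement negation: invert every bit, then add 1."""
    mask = (1 << width) - 1
    return (~x + 1) & mask

def subtract(c, a, width=32):
    """Compute c - a as c + (-a), keeping only `width` bits."""
    mask = (1 << width) - 1
    return (c + negate(a, width)) & mask

print(format(negate(6), "032b"))  # bit pattern of -6
print(subtract(7, 6))             # 7 + (-6)
```

Masking to `width` bits plays the role of the fixed register size: the carry out of the top bit is simply discarded.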



• Figure 3.2 shows the combination of operations, operands, and results that indicate an
overflow.

FIGURE 3.2 Overflow conditions for addition and subtraction.
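The overflow conditions can be expressed as a sign check. The sketch below assumes both operands already fit in the signed range for the given width; `to_signed`, `add_overflows`, and `sub_overflows` are illustrative names, not MIPS or library functions.

```python
def to_signed(x, width=32):
    """Interpret the low `width` bits of x as a two's complement number."""
    x &= (1 << width) - 1
    return x - (1 << width) if x >> (width - 1) else x

def add_overflows(a, b, width=32):
    """A + B overflows when both operands share a sign and the result does not."""
    result = to_signed(a + b, width)
    return (a >= 0) == (b >= 0) and (result >= 0) != (a >= 0)

def sub_overflows(a, b, width=32):
    """A - B overflows when the operand signs differ and the result sign differs from A."""
    result = to_signed(a - b, width)
    return (a >= 0) != (b >= 0) and (result >= 0) != (a >= 0)

print(add_overflows(2**31 - 1, 1))   # largest positive + 1
print(sub_overflows(7, 6))
```

Adding operands of opposite sign, or subtracting operands of the same sign, can never overflow, which is why those cases return False immediately.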

• Dealing with Overflow


o Some languages (e.g., C and Java) ignore overflow
▪ Use MIPS addu, addiu, subu instructions
▪ Because C ignores overflows, the MIPS C compiler will always generate the
unsigned versions of the arithmetic instructions (addu, addiu, and subu), no matter
what the type of the variables.
o Other languages (e.g., Ada and Fortran) require raising an exception
▪ Use MIPS add, addi, sub instructions
▪ Languages like Ada and Fortran require the program be notified. The
programmer or the programming environment must then decide what to do
when overflow occurs.
▪ On overflow, invoke exception handler
• Save PC in exception program counter (EPC) register
• Jump to predefined handler address
• mfc0 (move from coprocessor 0 register) instruction can retrieve EPC
value, to return after corrective action



3.3 Multiplication 193

• Multiplying an n-bit multiplicand by an m-bit multiplier yields a product that is
up to n + m bits long.

Sequential Version of the Multiplication Algorithm and Hardware

• The 64-bit Product register is initialized to 0


• It is clear that we will need to move the multiplicand left one digit each step, as it
may be added to the intermediate products.
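• Each iteration has three basic steps (test and add, shift left, shift right), and
there are 32 iterations. If each step took a clock cycle, this algorithm would
require almost 100 clock cycles to multiply two 32-bit numbers.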

FIGURE 3.3 First version of the multiplication hardware. The Multiplicand register, ALU, and
Product register are all 64 bits wide, with only the Multiplier register containing 32 bits. (Appendix B
describes ALUs.) The 32-bit multiplicand starts in the right half of the Multiplicand register and is
shifted left 1 bit on each step. The multiplier is shifted in the opposite direction at each step. The
algorithm starts with the product initialized to 0. Control decides when to shift the Multiplicand and
Multiplier registers and when to write new values into the Product register.



FIGURE 3.4 The first multiplication algorithm, using the hardware shown in Figure 3.3. If the least
significant bit of the multiplier is 1, add the multiplicand to the product. If not, go to the next step. Shift
the multiplicand left and the multiplier right in the next two steps. These three steps are repeated 32
times.

• Example: (A Multiply Algorithm) Use 4-bit numbers to multiply \(2_{\text{ten}} \times 3_{\text{ten}}\), or
\(0010_{\text{two}} \times 0011_{\text{two}}\)

FIGURE 3.5 Multiply example using algorithm in Figure 3.4. The bit examined to determine the next
step is circled in color.
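The three steps of Figure 3.4 map directly onto a short loop. This is a sketch of the sequential algorithm for unsigned operands, with the register widths modeled by a `width` parameter; the function name `multiply` is an assumption for the example.

```python
def multiply(multiplicand, multiplier, width=32):
    """Shift-and-add multiplication of two unsigned `width`-bit numbers."""
    product = 0                        # Product register (2 * width bits), initialized to 0
    for _ in range(width):
        if multiplier & 1:             # step 1: test the least significant multiplier bit
            product += multiplicand    # step 1a: add multiplicand to product
        multiplicand <<= 1             # step 2: shift Multiplicand register left 1 bit
        multiplier >>= 1               # step 3: shift Multiplier register right 1 bit
    return product & ((1 << (2 * width)) - 1)

# The 4-bit example: 0010 x 0011
print(format(multiply(0b0010, 0b0011, width=4), "08b"))
```

With `width=4` the loop runs four times and the product occupies 8 bits, mirroring the 4-bit trace in Figure 3.5.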



Signed Multiplication
• Signed Multiplication: Convert the multiplier and multiplicand to positive numbers
and remember the original signs.
• The algorithm should then be run for 31 iterations, leaving the signs out of the
calculation; the product is negated only if the original signs differ.

Faster Multiplication

• Faster multiplications are possible by essentially providing one 32-bit adder for each
bit of the multiplier: one input is the multiplicand ANDed with a multiplier bit, and
the other is the output of a prior adder.
• Uses multiple adders: Cost / Performance tradeoff

FIGURE 3.7 Fast multiplication hardware. Rather than use a single 32-bit adder 31 times, this
hardware “unrolls the loop” to use 31 adders and then organizes them to minimize delay.

Multiply in MIPS

• MIPS provides a separate pair of 32-bit registers, called Hi and Lo, to contain the
64-bit product.
o HI: most-significant 32 bits
o LO: least-significant 32 bits
• MIPS Instructions
o mult rs, rt / multu rs, rt
▪ 64-bit product in HI / LO
o mfhi rd / mflo rd
▪ Move from HI / LO to rd
▪ Can test HI value to see if product overflows 32 bits
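How the 64-bit product is split across HI and LO, and how HI can be tested for overflow, can be modeled as follows. This is a sketch for the signed case (mult) only; the Python functions `mult` and `fits_in_32_bits` are stand-ins invented for illustration, not real MIPS tooling.

```python
def mult(rs, rt):
    """Model signed mult: return (HI, LO) as unsigned 32-bit halves of the 64-bit product."""
    product = (rs * rt) & 0xFFFFFFFFFFFFFFFF
    return product >> 32, product & 0xFFFFFFFF

def fits_in_32_bits(hi, lo):
    """A signed product fits in 32 bits only if HI is the sign extension of LO."""
    sign_extension = 0xFFFFFFFF if lo >> 31 else 0
    return hi == sign_extension

hi, lo = mult(0x10000, 0x10000)       # 2**16 * 2**16 = 2**32
print(hex(hi), hex(lo), fits_in_32_bits(hi, lo))
```

For the unsigned variant (multu) the corresponding test would simply be that HI equals 0.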

Summary
• Multiplication hardware simply shifts and adds.
• Compilers even use shift instructions for multiplication by powers of 2.
• With much more hardware we can do the adds in parallel, and do them much faster.



3.4 Division 199

• Divide’s two operands, called the dividend and divisor, and result, called the quotient,
are accompanied by a second result, called the remainder.
• Here is another way to express the relationship between the components:
Dividend = Quotient X Divisor + Remainder

A Division Algorithm and Hardware


• In Figure 3.8
o Quotient: the 32-bit Quotient register is set to 0 and shifted left 1 bit each step
o Divisor: Each iteration of the algorithm needs to move the divisor to the right one
digit, so we start with divisor placed in the left half of the 64-bit Divisor register
and shift it right 1 bit each step to align it with dividend.
o Remainder register is initialized with the dividend.

FIGURE 3.8 First version of the division hardware. The Divisor register, ALU, and Remainder
register are all 64 bits wide, with only the Quotient register being 32 bits. The 32-bit divisor starts in the
left half of the Divisor register and is shifted right 1 bit each iteration. The remainder is initialized with
the dividend. Control decides when to shift the Divisor and Quotient registers and when to write the
new value into the Remainder register.

• In Figure 3.9, A division algorithm


o It must first subtract the divisor in step 1
o If the result is non-negative, the divisor was smaller than or equal to the dividend,
so we generate a 1 in the quotient (step 2a).
o If the result is negative, the next step is to restore the original value by adding the
divisor back to the remainder and generate a 0 in the quotient (step 2b).
o The divisor is shifted right and then we iterate again.



FIGURE 3.9 A division algorithm, using the hardware in Figure 3.8. If the remainder is positive, the
divisor did go into the dividend, so step 2a generates a 1 in the quotient. A negative remainder after step
1 means that the divisor did not go into the dividend, so step 2b generates a 0 in the quotient and adds
the divisor to the remainder, thereby reversing the subtraction of step 1. The final shift, in step 3, aligns
the divisor properly, relative to the dividend for the next iteration. These steps are repeated 33 times.

• Example: (A Divide Algorithm) Using a 4-bit version of the algorithm to save pages,
let’s try dividing 7 by 2 or 0000 0111 by 0010

FIGURE 3.10 Division example using the algorithm in Figure 3.9. The bit examined to determine the
next step is circled in color.
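The restoring division of Figure 3.9 can be sketched in the same style. The code assumes unsigned operands and a nonzero divisor; `divide` is a hypothetical name used only here.

```python
def divide(dividend, divisor, width=32):
    """Restoring division of unsigned `width`-bit numbers; returns (quotient, remainder)."""
    remainder = dividend               # Remainder register is initialized with the dividend
    divisor <<= width                  # divisor starts in the left half of the Divisor register
    quotient = 0
    for _ in range(width + 1):         # width + 1 repetitions (33 for 32-bit operands)
        remainder -= divisor           # step 1: subtract the divisor
        if remainder >= 0:
            quotient = (quotient << 1) | 1   # step 2a: shift in a 1
        else:
            remainder += divisor       # step 2b: restore the remainder
            quotient <<= 1             #          and shift in a 0
        divisor >>= 1                  # step 3: shift the Divisor register right 1 bit
    return quotient & ((1 << width) - 1), remainder

# The 4-bit example: 0000 0111 divided by 0010
print(divide(0b0111, 0b0010, width=4))
```

The result satisfies Dividend = Quotient × Divisor + Remainder; for the 4-bit example the loop runs five times.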



Signed Division
• The rule: the dividend and remainder must have the same signs, no matter what the
signs of the divisor and quotient.
• Signed division algorithm: Negates the quotient if signs of the operands are opposite
and makes the sign of the nonzero remainder match the dividend
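The sign rule can be checked with a small sketch that divides the magnitudes and then fixes up the signs. The function name `signed_divide` is an assumption, and the divisor is assumed to be nonzero.

```python
def signed_divide(dividend, divisor):
    """Divide magnitudes, then apply the sign rules for quotient and remainder."""
    quotient = abs(dividend) // abs(divisor)
    remainder = abs(dividend) % abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient           # operand signs are opposite: negate the quotient
    if dividend < 0:
        remainder = -remainder         # remainder takes the sign of the dividend
    return quotient, remainder

for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    print(a, b, signed_divide(a, b))
```

Note that Python's built-in `//` and `%` round toward negative infinity on negative operands, so they do not follow this rule directly; that is why the sketch works on absolute values first.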

Divide in MIPS

• MIPS uses the same pair of 32-bit Hi and Lo registers for both multiply
and divide.
o Hi: 32-bit remainder
o Lo: 32-bit quotient
• MIPS Instructions
o div rs, rt / divu rs, rt
▪ Quotient in LO, remainder in HI
o mfhi rd / mflo rd
▪ Move from HI / LO to rd

Summary

• The common hardware support for multiply and divide allows MIPS to provide a
single pair of 32-bit registers (Hi and Lo) that are used for both multiply and divide.



• Figure 3.12 summarizes the enhancements to the MIPS architecture.

FIGURE 3.12 MIPS core architecture. The memory and registers of the MIPS architecture are not
included for space reasons, but this section added the Hi and Lo registers to support multiply and divide.
MIPS machine language is listed in the MIPS Reference Data Card at the front of this book.



3.5 Floating Point 206

• Representation for non-integral numbers


• Including very small and very large numbers
• Scientific notation: A single digit to the left of the decimal point. A number in
scientific notation that has no leading 0s is called a normalized number.
o Normalized: \(-2.34 \times 10^{56}\)
o Not normalized: \(+0.002 \times 10^{-4}\) and \(+987.02 \times 10^{9}\)
• The programming language C uses the data type names float and double
• Just as in scientific notation, numbers are represented as a single nonzero digit to the
left of the binary point. In binary, the form is:

\(\pm 1.xxxxxxx_{\text{two}} \times 2^{yyyy}\)

Floating-Point Representation

• IEEE 754 Floating Point Standard


o Single precision floating point (32-bit)
o Double precision floating point (64-bit)
• In general, floating-point numbers are of the form

\(x = (-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}\)

o S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
o Normalized significand: 1.0 ≤ |significand| < 2.0
▪ Always has a leading pre-binary-point 1 bit, so no need to represent it
explicitly (hidden bit)
▪ Significand is Fraction with the “1.” restored
o Exponent: excess representation, stored as actual exponent + Bias
▪ The stored Exponent field is unsigned
▪ Single: Bias = 127; Double: Bias = 1023

Single precision floating point (32-bit): 1 sign bit, 8 exponent bits, 23 fraction bits

Double precision floating point (64-bit): 1 sign bit, 11 exponent bits, 52 fraction bits



• Single Precision Range
o Exponents 0000 0000 and 1111 1111 reserved
o Smallest value
▪ Exponent: 0000 0001 ⇒ actual exponent = 1 – 127 = –126
▪ Fraction: 000…00 (23 bits) ⇒ significand = 1.0
▪ \(\pm 1.0 \times 2^{-126} \approx \pm 1.2 \times 10^{-38}\)
o Largest value
▪ Exponent: 1111 1110 ⇒ actual exponent = 254 – 127 = +127
▪ Fraction: 111…11 (23 bits) ⇒ significand ≈ 2.0
▪ \(\pm 2.0 \times 2^{+127} \approx \pm 3.4 \times 10^{+38}\)
• Double Precision Range
o Exponents 000 0000 0000 and 111 1111 1111 reserved
o Smallest value
▪ Exponent: 000 0000 0001 ⇒ actual exponent = 1 – 1023 = –1022
▪ Fraction: 000…00 (52 bits) ⇒ significand = 1.0
▪ \(\pm 1.0 \times 2^{-1022} \approx \pm 2.2 \times 10^{-308}\)
o Largest value
▪ Exponent: 111 1111 1110 ⇒ actual exponent = 2046 – 1023 = +1023
▪ Fraction: 111…11 (52 bits) ⇒ significand ≈ 2.0
▪ \(\pm 2.0 \times 2^{+1023} \approx \pm 1.8 \times 10^{+308}\)
• IEEE 754 makes the leading 1-bit of normalized binary numbers implicit. Hence, the
significand is actually 24 bits long in single precision (implied 1 and a 23-bit fraction),
and 53 bits long in double precision (1 + 52).

• Figure 3.13 shows IEEE 754 encoding of single and double precision numbers

FIGURE 3.13 IEEE 754 encoding of floating-point numbers. A separate sign bit determines the sign.
Denormalized numbers are described in the Elaboration on page 232. This information is also found in
Column 4 of the MIPS Reference Data Card at the front of this book.
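The range limits for normalized numbers can be checked numerically. The sketch below builds the single precision bit patterns by hand and reinterprets them with Python's standard `struct` module; `bits_to_float` is a helper name invented for the example. For double precision it relies on `sys.float_info`, assuming Python's float is an IEEE 754 double (true on common platforms).

```python
import struct
import sys

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

smallest_single = bits_to_float(0x00800000)  # exponent 0000 0001, fraction all 0s
largest_single = bits_to_float(0x7F7FFFFF)   # exponent 1111 1110, fraction all 1s

print(smallest_single, largest_single)
print(sys.float_info.min, sys.float_info.max)  # double precision limits
```

The largest significand is 2 − 2^−23 in single precision (2 − 2^−52 in double), which is why the notes write it as approximately 2.0.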



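• Example: (Floating-Point Representation) Show the IEEE 754 binary representation
of the number –0.75 in single and double precision
o The number –0.75 is also
\(-3_{\text{ten}} / 4_{\text{ten}}\) or \(-3_{\text{ten}} / 2^{2}_{\text{ten}}\)
o It is also represented by the binary fraction
\(-11_{\text{two}} / 2^{2}_{\text{ten}}\) or \(-0.11_{\text{two}}\)
o In scientific notation, the value is
\(-1.1_{\text{two}} \times 2^{-1}\)
o The general form for single and double precision numbers is
\(x = (-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}\)
o Therefore, the value is
\(-0.75 = (-1)^{1} \times 1.1_{\text{two}} \times 2^{-1}\)
o S = 1
o Fraction = \(1000\ldots00_{\text{two}}\)
o Exponent = –1 + Bias
▪ Single: –1 + 127 = 126 = \(0111\,1110_{\text{two}}\)
▪ Double: –1 + 1023 = 1022 = \(011\,1111\,1110_{\text{two}}\)
o Single precision binary representation of –0.75 is:
1 | 0111 1110 | 1000 0000 0000 0000 0000 000 (sign | 8-bit exponent | 23-bit fraction)
o Double precision binary representation of –0.75 is:
1 | 011 1111 1110 | 1000 0000 … 0000 (sign | 11-bit exponent | 52-bit fraction)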
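The encoding of –0.75 can be verified with Python's standard `struct` module, which packs a value into its IEEE 754 byte representation. The helper `fields` is a hypothetical name for this example; it splits a bit pattern into sign, biased exponent, and fraction.

```python
import struct

def fields(bits, exp_bits, frac_bits):
    """Split an IEEE 754 bit pattern into (sign, biased exponent, fraction)."""
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    return sign, exponent, fraction

single = struct.unpack(">I", struct.pack(">f", -0.75))[0]
double = struct.unpack(">Q", struct.pack(">d", -0.75))[0]

print(format(single, "032b"), fields(single, 8, 23))
print(format(double, "064b"), fields(double, 11, 52))
```

The biased exponents come out as 126 and 1022, and in both formats only the leading fraction bit is set.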



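• Example: (Converting Binary to Decimal Floating Point) What decimal number is
represented by this single precision float:

1 | 1000 0001 | 0100 0000 0000 0000 0000 000 (sign | 8-bit exponent | 23-bit fraction)

o The general form for single and double precision numbers is
\(x = (-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}\)
▪ S = 1
▪ Fraction = \(01000\ldots00_{\text{two}}\)
▪ Exponent = \(1000\,0001_{\text{two}}\) = 129
o \(x = (-1)^{1} \times 1.01_{\text{two}} \times 2^{(129 - 127)}\)
\(= (-1) \times 1.25_{\text{ten}} \times 2^{2}\)
\(= -5.0\)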
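The decoding steps of this example follow the general formula directly. A minimal sketch, valid only for normalized single precision patterns (exponent field neither all 0s nor all 1s); `decode_single` is an illustrative name.

```python
def decode_single(bits):
    """Evaluate (-1)^S * (1 + Fraction) * 2^(Exponent - 127) for a normalized pattern."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = (bits & 0x7FFFFF) / 2**23   # fraction field as a value in [0, 1)
    return (-1) ** sign * (1 + fraction) * 2.0 ** (exponent - 127)

# S = 1, Exponent = 1000 0001, Fraction = 0100...0
print(decode_single(0b1_10000001_01000000000000000000000))
```

The same function applied to the pattern 1 | 0111 1110 | 1000…0 gives –0.75.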



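• Denormalized Number
o Exponent = 000...0 ⇒ hidden bit is 0

\(x = (-1)^{S} \times (0 + \text{Fraction}) \times 2^{(1 - \text{Bias})}\)

o The exponent is fixed at 1 – Bias (–126 in single precision), the same as the
smallest normalized exponent, so the denormalized range joins the normalized
range without a gap
o Smaller than normal numbers
▪ allow for gradual underflow, with diminishing precision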
• Infinities and NaNs
o Exponent = 111...1, Fraction = 000...0
▪ ±Infinity
▪ Can be used in subsequent calculations, avoiding need for overflow check
o Exponent = 111...1, Fraction ≠ 000...0
▪ Not-a-Number (NaN)
▪ Indicates illegal or undefined result (e.g., 0.0 / 0.0)
▪ Can be used in subsequent calculations
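The reserved exponent patterns can be summarized as a small classifier for single precision bit patterns. This is a sketch; `classify_single` is a made-up name, and the "zero" case (exponent and fraction both all 0s) is added from the IEEE 754 encoding even though it is not listed in the bullets above.

```python
def classify_single(bits):
    """Classify a 32-bit IEEE 754 pattern by its exponent and fraction fields."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:                       # exponent = 111...1
        return "infinity" if fraction == 0 else "NaN"
    if exponent == 0:                          # exponent = 000...0
        return "zero" if fraction == 0 else "denormalized"
    return "normalized"

for pattern in (0x7F800000, 0xFF800000, 0x7FC00000, 0x00000001, 0x3F800000):
    print(format(pattern, "08x"), classify_single(pattern))
```

The sign bit is ignored on purpose: it distinguishes +Infinity from –Infinity but does not change the class.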



Floating-Point Addition

• Example: (Binary Floating-Point Addition) Add the numbers 0.5 and –0.4375 in binary.
o Now consider a 4-digit binary example
▪ \(1.000_{\text{two}} \times 2^{-1} + (-1.110_{\text{two}} \times 2^{-2})\) (0.5 + –0.4375)
o 1. Align binary points
▪ Shift the number with the smaller exponent right until its exponent matches
the larger exponent
▪ \(1.000_{\text{two}} \times 2^{-1} + (-0.111_{\text{two}} \times 2^{-1})\)
o 2. Add significands
▪ \(1.000_{\text{two}} \times 2^{-1} + (-0.111_{\text{two}} \times 2^{-1}) = 0.001_{\text{two}} \times 2^{-1}\)
o 3. Normalize result & check for over/underflow
▪ \(1.000_{\text{two}} \times 2^{-4}\), with no overflow / underflow
o 4. Round and renormalize if necessary
▪ \(1.000_{\text{two}} \times 2^{-4}\) (no change) = 0.0625

FIGURE 3.14 Floating-point addition. The normal path is to execute steps 3 and 4 once, but if
rounding causes the sum to be unnormalized, we must repeat step 3.
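The four addition steps can be traced in code with a toy format that stores the significand as a signed integer scaled by 2^3 (so 1.000 is 8 and 1.110 is 14). This is a simplified sketch under stated assumptions: there are no guard or round bits, shifted-out bits are truncated, and overflow/underflow checks are omitted. `fp_add` and `to_value` are names invented for the example.

```python
def fp_add(sig_a, exp_a, sig_b, exp_b, frac_bits=3):
    """Add two values given as (signed significand * 2**frac_bits, exponent)."""
    # Step 1: align by shifting the operand with the smaller exponent right.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    shift = exp_a - exp_b
    sig_b = (abs(sig_b) >> shift) * (-1 if sig_b < 0 else 1)
    # Step 2: add the significands.
    total, exp = sig_a + sig_b, exp_a
    # Step 3: normalize so that 1.0 <= |significand| < 2.0.
    one = 1 << frac_bits
    while abs(total) >= 2 * one:
        total = (abs(total) >> 1) * (-1 if total < 0 else 1)
        exp += 1
    while total != 0 and abs(total) < one:
        total <<= 1
        exp -= 1
    # Step 4: rounding is skipped; shifted-out bits were simply truncated.
    return total, exp

def to_value(sig, exp, frac_bits=3):
    """Convert the toy representation back to a Python float."""
    return sig / (1 << frac_bits) * 2.0 ** exp

sig, exp = fp_add(0b1000, -1, -0b1110, -2)   # 0.5 + (-0.4375)
print(format(sig, "04b"), exp, to_value(sig, exp))
```

Real hardware keeps extra guard and round bits during alignment so that step 4 can round correctly; the truncation here is only acceptable because the example loses no bits.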



Floating-Point Multiplication

• Example: (Binary Floating-Point Multiplication) Multiply the numbers 0.5 and –0.4375
in binary.
o Now consider a 4-digit binary example
▪ \((1.000_{\text{two}} \times 2^{-1}) \times (-1.110_{\text{two}} \times 2^{-2})\) (0.5 × –0.4375)
o 1. Add exponents
▪ Unbiased: –1 + –2 = –3
▪ Biased: (–1 + 127) + (–2 + 127) – 127 = 251 – 127 = 124, which is –3 + 127
o 2. Multiply significands
▪ \(1.000_{\text{two}} \times 1.110_{\text{two}} = 1.110_{\text{two}}\) ⇒ \(1.110_{\text{two}} \times 2^{-3}\)
o 3. Normalize result & check for overflow / underflow
▪ \(1.110_{\text{two}} \times 2^{-3}\) (no change) with no overflow / underflow
o 4. Round and renormalize if necessary
▪ \(1.110_{\text{two}} \times 2^{-3}\) (no change)
o 5. Determine sign: positive × negative ⇒ negative
▪ \(-1.110_{\text{two}} \times 2^{-3} = -0.21875\)

FIGURE 3.16 Floating-point multiplication. The normal path is to execute steps 3 and 4 once, but if
rounding causes the sum to be unnormalized, we must repeat step 3.
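The five multiplication steps can be sketched with the same toy format: a signed integer significand scaled by 2^3 plus an unbiased exponent. The code truncates instead of rounding and omits the overflow/underflow checks; `fp_mul` and the `bias=127` default are assumptions made for the illustration.

```python
def fp_mul(sig_a, exp_a, sig_b, exp_b, frac_bits=3, bias=127):
    """Multiply two values given as (signed significand * 2**frac_bits, exponent)."""
    # Step 1: add the biased exponents, subtracting one bias so it is not counted twice.
    biased = (exp_a + bias) + (exp_b + bias) - bias
    # Step 2: multiply the significand magnitudes (result has 2 * frac_bits fraction bits).
    product = abs(sig_a) * abs(sig_b)
    # Step 3: normalize; the product of two values in [1, 2) lies in [1, 4).
    if product >= 2 << (2 * frac_bits):
        product >>= 1
        biased += 1
    # Step 4: drop the extra fraction bits (truncation stands in for rounding).
    product >>= frac_bits
    # Step 5: the sign is negative when the operand signs differ.
    negative = (sig_a < 0) != (sig_b < 0)
    return (-product if negative else product), biased - bias

sig, exp = fp_mul(0b1000, -1, -0b1110, -2)   # 0.5 * (-0.4375)
print(sig, exp, sig / 8 * 2.0 ** exp)
```

Subtracting the bias once in step 1 is the point of the biased calculation in the example: adding two biased exponents would otherwise include the bias twice.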



Floating-Point Instructions in MIPS

• MIPS supports the IEEE 754 single precision and double precision formats with these
instructions:
o Floating-point addition: single (add.s) and double (add.d)
▪ e.g., add.s $f2, $f4, $f6 # $f2 = $f4 + $f6
o Floating-point subtraction: single (sub.s) and double (sub.d)
▪ e.g., sub.d $f2, $f4, $f6 # $f2 = $f4 - $f6
o Floating-point multiplication: single (mul.s) and double (mul.d)
▪ e.g., mul.s $f2, $f4, $f6 # $f2 = $f4 X $f6
o Floating-point division: single (div.s) and double (div.d)
▪ e.g., div.d $f2, $f4, $f6 # $f2 = $f4 / $f6
o Floating-point comparison: single (c.x.s) and double (c.x.d)
Where x may be equal (eq), not equal (neq),
less than (lt), less than or equal (le),
greater than (gt), greater than or equal (ge)
▪ e.g., c.lt.s $f2, $f4 # if ($f2 < $f4) cond = 1; else cond = 0
o Floating-point branch: true (bc1t) and false (bc1f)
▪ e.g., bc1t 25 # if (cond == 1) go to PC + 4 + 100

• Floating-point hardware is coprocessor 1


o Adjunct processor that extends the ISA
• Separate floating-point registers
o 32 single-precision: $f0, $f1, …, $f31
o Paired for double-precision: $f0/$f1, $f2/$f3, …, $f30/$f31
• Floating-point load and store instructions
o Load word coprocessor 1 (lwc1), store word coprocessor 1 (swc1)
▪ e.g., lwc1 $f1, 100($s2) # $f1 = Memory [$s2 + 100]

Summary

• IEEE 754 standard floating-point representation

\(x = (-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}\)

• A floating-point value is almost always an approximation of the real number.
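The approximation is easy to observe by rounding a value to single precision and back. The sketch uses Python's standard `struct` and `fractions` modules; `nearest_single` is a helper name chosen for the example, and Python's own float is assumed to be an IEEE 754 double.

```python
import struct
from fractions import Fraction

def nearest_single(x):
    """Round x to single precision by packing and unpacking it as a 32-bit float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

stored = nearest_single(0.1)
print(Fraction(stored))               # the exact rational value actually stored
print(stored == 0.1)                  # differs from the double precision 0.1
print(nearest_single(-0.75) == -0.75) # sums of a few powers of 2 are stored exactly
```

Values such as 0.1 have no finite binary expansion, so every format stores a nearby number; values such as –0.75 or 0.0625 are exact because they are short sums of powers of 2.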



3.10 Concluding Remarks 241

FIGURE 3.24 The MIPS instruction set. This book concentrates on the instructions in the left column.
This information is also found in columns 1 and 2 of the MIPS Reference Data Card at the front of this
book.



FIGURE 3.25 Remaining MIPS-32 and Pseudo MIPS instruction sets. f means single (s) or double
(d) precision floating-point instructions, and s means signed and unsigned (u) versions. MIPS-32 also
has FP instructions for multiply and add/sub (madd.f/ msub.f), ceiling (ceil.f), truncate (trunc.f), round
(round.f), and reciprocal (recip.f). The underscore represents the letter to include to represent that
datatype.
