Lecture 14 - Arithmetic Subsystems - Numbering Systems and Floating Point Unit (FPU)
What we learn in this lecture
• Number systems
• Integer number representations
• Fractional number representations
• Fixed-point representations
• Floating point representations
Number Representation in Digital Systems
• Processing numbers is a basic part of any computing system. In digital hardware, numbers are represented using the base-2 positional weighting system.
• Numbers can be categorized into
• Integer numbers and
• Fractional numbers
Signed Binary Numbers: Sign/Magnitude
• Sign/Magnitude Numbers
➢ 1 sign bit, N-1 magnitude bits
➢ Sign bit is the most significant (left-most) bit
o Positive number: sign bit = 0
o Negative number: sign bit = 1
Signed Binary Numbers: Sign/Magnitude
• Problems
– Addition doesn’t work, for example -6 + 6:
1110
+ 0110
10100 (wrong!)
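A quick Python sketch confirms why plain binary addition fails on sign/magnitude operands (`sm_decode` is a hypothetical helper name, not from the lecture):

```python
def sm_decode(bits, n=4):
    """Decode an n-bit sign/magnitude pattern given as an integer."""
    magnitude = bits & ((1 << (n - 1)) - 1)
    return -magnitude if bits >> (n - 1) else magnitude

a, b = 0b1110, 0b0110                       # -6 and +6 in 4-bit sign/magnitude
raw = (a + b) & 0b1111                      # plain binary addition, keep 4 bits
print(format(raw, '04b'), sm_decode(raw))   # 0100 4 -- not the expected 0
```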
Signed Binary Numbers: Two’s Complement
• Two’s complement numbers don’t have the same problems as sign/magnitude numbers:
➢ Addition works
➢ Single representation for 0
• Most positive 4-bit number: 0111
• Most negative 4-bit number: 1000
• The most significant bit still indicates the sign (1 = negative, 0 = positive)
• Range of an N-bit two’s complement number: [-2^(N-1), 2^(N-1) - 1]
• Taking the Two’s Complement
1. Invert the bits
2. Add 1
• Example: Flip the sign of 3 (0011 in binary): invert the bits → 1100, add 1 → 1101 = -3
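The two-step recipe above can be sketched in Python (`twos_complement` is a hypothetical helper name):

```python
def twos_complement(x, n=4):
    """Negate an n-bit two's-complement value: invert the bits, then add 1."""
    inverted = x ^ ((1 << n) - 1)            # step 1: invert all n bits
    return (inverted + 1) & ((1 << n) - 1)   # step 2: add 1, keep n bits

print(format(twos_complement(0b0011), '04b'))  # 1101, i.e. -3
print(format(twos_complement(0b1101), '04b'))  # 0011, back to +3
```

Negating twice returns the original pattern, which is one reason this encoding behaves well under addition.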
Sign-Extension: Increasing Number of Bits
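Sign extension preserves a two’s-complement value when widening the word by replicating the sign bit into the new upper bits; a minimal Python sketch (`sign_extend` is a hypothetical name):

```python
def sign_extend(x, from_bits, to_bits):
    """Replicate the sign bit of a from_bits-wide pattern into the new upper bits."""
    if x >> (from_bits - 1) & 1:                            # negative number
        x |= ((1 << (to_bits - from_bits)) - 1) << from_bits
    return x

print(format(sign_extend(0b1101, 4, 8), '08b'))  # 11111101 (-3 stays -3)
print(format(sign_extend(0b0101, 4, 8), '08b'))  # 00000101 (+5 stays +5)
```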
Number System Comparison
Binary  Unsigned  Two's Complement  Sign/Magnitude
0000    0         0                 +0
0001    1         1                 +1
0010    2         2                 +2
0011    3         3                 +3
0100    4         4                 +4
0101    5         5                 +5
0110    6         6                 +6
0111    7         7                 +7
1000    8         -8                -0
1001    9         -7                -1
1010    10        -6                -2
1011    11        -5                -3
1100    12        -4                -4
1101    13        -3                -5
1110    14        -2                -6
1111    15        -1                -7
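The table can be cross-checked by decoding a single 4-bit pattern under all three systems (`interpret` is a hypothetical helper):

```python
def interpret(bits, n=4):
    """Decode one n-bit pattern as unsigned, two's complement, sign/magnitude."""
    unsigned = bits
    negative = bool(bits & (1 << (n - 1)))        # MSB is the sign bit
    twos = bits - (1 << n) if negative else bits
    magnitude = bits & ((1 << (n - 1)) - 1)
    sign_mag = -magnitude if negative else magnitude
    return unsigned, twos, sign_mag

print(interpret(0b1101))  # (13, -3, -5), matching the 1101 row of the table
```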
Fractional Numbers
• Fractional numbers are more complex and more challenging to implement in hardware.
• Specialized co-processors and optimized hardware accelerators are usually
designed specifically to process fractional numbers.
• Two main types of fractional number representations:
• Fixed point representation
• Floating point representation
Fractional Numbers: Fixed-Point representation
A fixed-point fractional number can be represented in 1’s-complement or 2’s-complement format.
In both encodings, the smallest numerical difference between two decoded numbers is 2^-R. For example, for integer numbers (R = 0) the smallest increment is 2^0 = 1.
• 2^-R quantifies the “imprecision” of the representation.
[Figure: word layout with L integer bits and R fraction bits, 2’s-complement encoding]
Fractional Numbers: Fixed-Point representation
• Imprecision: the smallest difference between two consecutive numbers for a given word length and fraction-point position. Imprecision = 2^-R
➢ Integer length L
➢ Fraction length R
➢ Word length B = L + R
➢ w = b_{L-1} … b_0 . b_{-1} … b_{-R}, with b_i ∈ {0, 1}
[Figure: representable values on a number line, spaced 2^-R apart: …, -2·2^-R, -2^-R, 0, 2^-R, 2·2^-R, 3·2^-R, …]
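Under these definitions, decoding an unsigned fixed-point word is just a scaling by 2^-R; a minimal Python sketch (`fixed_to_float` is a hypothetical name, sign handling omitted):

```python
def fixed_to_float(bits, L, R):
    """Decode an unsigned word with L integer bits and R fraction bits."""
    assert 0 <= bits < (1 << (L + R))    # word length B = L + R
    return bits / (1 << R)               # value = integer pattern * 2**-R

# With L = 2, R = 2 the pattern 01.10 encodes 1.5, and consecutive
# codes differ by the imprecision 2**-R = 0.25.
print(fixed_to_float(0b0110, L=2, R=2))  # 1.5
print(fixed_to_float(0b0111, L=2, R=2) - fixed_to_float(0b0110, L=2, R=2))  # 0.25
```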
Fractional Numbers: Fixed-Point representation
• Because the computer has a limited number of digits to store numbers, some numbers, such as π, cannot be represented exactly due to this imprecision. The number of digits the computer uses to store a number is called its “significant digits” or “significant figures”.
Fractional Numbers: Fixed-Point representation
• The imprecision can be decreased by increasing R. For a fixed word size B, increasing R means decreasing L, and decreasing L reduces the largest magnitude that can be represented.
Fractional Numbers: Fixed-Point representation
• For a fixed word length B, there is a tradeoff between the range (the
interval from the largest positive to the largest negative number)
and imprecision of the represented numbers, i.e., improving the
precision entails decreasing the range.
• This tradeoff results in errors in computer numerical calculations. Since the smallest increment that can be represented in fixed point is 2^-R, if a number x has a fractional part finer than 2^-R, one of the following errors will occur:
➢ Truncation: drop the part of the fraction smaller than 2^-R
➢ Round-off: round to the nearest multiple of 2^-R
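Both error modes can be sketched in Python (`quantize` is a hypothetical helper; “round-off” here means rounding to the nearest multiple of 2^-R):

```python
def quantize(x, R, mode="truncate"):
    """Fit x onto the 2**-R grid by truncation or round-off."""
    step = 2.0 ** -R
    if mode == "truncate":
        return (x // step) * step    # drop the fraction below 2**-R
    return round(x / step) * step    # round to nearest multiple of 2**-R

# With R = 2 the grid spacing is 0.25:
print(quantize(0.7, 2, "truncate"))  # 0.5
print(quantize(0.7, 2, "round"))     # 0.75
```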
Floating-Point Representation
• Floating-point representation: a more flexible representation that accommodates both very large and very small numbers. It is implemented by allowing the fraction point to “float” rather than stay fixed, which lets the imprecision vary with the numeric magnitude.
[Figure: fixed-point vs. floating-point placement of the fraction point]
Floating-Point Representation
• A floating-point binary word format consists of
➢ Sign bit
➢ Signed exponent (usually an integer)
➢ Mantissa (can be any number, usually a fraction in fixed-point format)
➢ Base b (equals 2 in binary)
Floating-Point Representation
• Note: varying the exponent moves the fraction point position. This is why the fraction point is said to be “floating”.
Floating-Point Representation
• Mantissa normalization: consider the number 1/34 = 0.02941176…. A possible floating-point representation is 0.0294 × 10^0. However, the leading zero in 0.0294 is useless, and a significant digit is lost. The solution is to normalize the mantissa:
➢ 0.2941 × 10^-1 (an additional significant digit is retained)
• A normalized mantissa has a limited range: 1/base ≤ m < 1
➢ Minimum mantissa = 1/base
➢ Maximum mantissa is obviously less than 1 (it is a fraction)
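Python’s standard `math.frexp` performs exactly this base-2 normalization, returning m and e with x = m · 2^e and 1/2 ≤ m < 1 for positive x:

```python
import math

# frexp decomposes x into a normalized mantissa and an exponent:
# x == m * 2**e with 0.5 <= m < 1 (for x > 0), i.e. 1/base <= m < 1 for base 2.
m, e = math.frexp(1 / 34)
print(m, e)        # about 0.9412 and -5
assert 0.5 <= m < 1
```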
IEEE-754 Floating-point Format
• IEEE-754
➢ Single precision (SP): 32-bit word
o 1 sign bit, 8-bit exponent, 23-bit mantissa fraction part
➢ Double precision (DP): 64-bit word
o 1 sign bit, 11-bit exponent, 52-bit mantissa fraction part
• Mantissa =
➢ 1 + (b22·2^-1 + b21·2^-2 + … + b0·2^-23) for SP
➢ 1 + (b51·2^-1 + b50·2^-2 + … + b0·2^-52) for DP
• Exponent = the encoded unsigned integer − Bias
➢ e = eb − Bias
➢ Bias = 127 for SP, Bias = 1023 for DP
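These field definitions can be checked against Python’s standard `struct` module, which exposes the raw single-precision encoding (`sp_fields` is a hypothetical helper):

```python
import struct

def sp_fields(x):
    """Split a float's single-precision encoding into (sign, e, fraction)."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    sign = bits >> 31
    e_b = (bits >> 23) & 0xFF      # encoded (biased) exponent
    frac = bits & 0x7FFFFF         # 23 fraction bits
    return sign, e_b - 127, frac   # e = e_b - Bias, Bias = 127 for SP

print(sp_fields(1.0))    # (0, 0, 0): 1.0 = +1.0 * 2**0
print(sp_fields(-6.5))   # sign 1, e = 2: -6.5 = -1.625 * 2**2
```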
IEEE-754 Floating-point Format
• Example: convert the following IEEE-754 SP formatted number to decimal:
0 01000000 01111101110000000000010
➢ 1 sign bit
➢ 8 exponent bits (bias is +127)
➢ 23 mantissa bits
• Exponent e = 64 − 127 = −63 (the exponent field 01000000 encodes 64)
• Mantissa = 1 + 2^-2 + 2^-3 + 2^-4 + 2^-5 + 2^-6 + 2^-8 + 2^-9 + 2^-10 + 2^-22 ≈ 1.4912
• Number = 1.4912 × 2^-63 ≈ 1.6168 × 10^-19
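The worked example can be verified by reassembling the bit pattern and letting Python’s `struct` module decode it as a single-precision float:

```python
import struct

# Reassemble the slide's bit pattern: sign, exponent, fraction fields.
bits = int('0' + '01000000' + '01111101110000000000010', 2)
x = struct.unpack('>f', struct.pack('>I', bits))[0]
print(x)   # about 1.6168e-19, matching 1.4912 * 2**-63
```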
Special Numbers in IEEE-754 Floating-point Format
• IEEE-754 reserves encodings for three special numbers:
➢ Zero: exponent field all 0s and fraction all 0s (no implicit leading 1)
➢ ±Infinity: exponent field all 1s, fraction all 0s
➢ NaN (Not a Number): exponent field all 1s, fraction non-zero
Floating-Point Addition
1. Extract exponent and fraction bits
2. Prepend leading 1 to form mantissa
3. Compare exponents
4. Shift smaller mantissa if necessary
5. Add mantissas
6. Normalize mantissa and adjust exponent if necessary
7. Round result if necessary
8. Assemble exponent and fraction back into floating-point format
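As an illustrative sketch (positive normalized inputs only, rounding by truncation, helper names hypothetical), the eight steps map to Python as follows:

```python
import struct

def f32_bits(x):
    """Raw 32-bit pattern of a Python float stored as single precision."""
    return struct.unpack('>I', struct.pack('>f', x))[0]

def fp_add(a, b):
    """Toy SP adder for positive normalized inputs; rounding is truncation."""
    ba, bb = f32_bits(a), f32_bits(b)
    ea, eb = (ba >> 23) & 0xFF, (bb >> 23) & 0xFF       # 1. extract fields
    ma = (ba & 0x7FFFFF) | (1 << 23)                    # 2. prepend leading 1
    mb = (bb & 0x7FFFFF) | (1 << 23)
    if ea < eb:                                         # 3. compare exponents
        ea, eb, ma, mb = eb, ea, mb, ma
    mb >>= ea - eb                                      # 4. shift smaller mantissa
    m, e = ma + mb, ea                                  # 5. add mantissas
    while m >> 24:                                      # 6. normalize mantissa,
        m >>= 1                                         #    adjust exponent
        e += 1                                          # 7. (round by truncation)
    bits = (e << 23) | (m & 0x7FFFFF)                   # 8. reassemble the word
    return struct.unpack('>f', struct.pack('>I', bits))[0]

print(fp_add(1.5, 3.25))   # 4.75
```

A real FPU also handles signs, zeros, infinities, subnormals, and round-to-nearest, which this sketch deliberately omits.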
Floating-Point Addition Example
• Extract exponent and fraction bits:
Sign  Exponent (8 bits)  Fraction (23 bits)
0     01111111           100 0000 0000 0000 0000 0000   (value 1.5)
0     10000000           101 0000 0000 0000 0000 0000   (value 3.25)
Floating-Point Multiplication/Division
• Multiplication
➢ Multiply mantissas and add exponents!
• Division
➢ Divide mantissas and subtract exponents!
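A sketch of the multiplication rule using Python’s `math.frexp`/`math.ldexp`, which split and rebuild a float from mantissa and exponent (`fp_mul` is a hypothetical name):

```python
import math

def fp_mul(a, b):
    """Multiply by the slide's rule: multiply mantissas, add exponents."""
    ma, ea = math.frexp(a)                # a == ma * 2**ea
    mb, eb = math.frexp(b)
    return math.ldexp(ma * mb, ea + eb)   # mantissa product, exponent sum

print(fp_mul(1.5, 3.25))   # 4.875
```

Division works the same way with the mantissas divided and the exponents subtracted; `ldexp` takes care of the final renormalization.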
Floating Point Unit (FPU)
• Floating-point operations can be performed by hardware (circuitry) or by software (program code); the programmer cannot tell which is used without prior knowledge of the system’s hardware design. The software method is approximately 1000 times slower than the hardware method. The hardware unit that performs floating-point arithmetic operations is called the Floating Point Unit (FPU).
• The FPU is also known as the “floating-point co-processor”. In SoC design, this co-processor is embedded within the processor core in a separate section.
Floating Point Unit (FPU)
• The Zynq SoC platform’s dual-core ARM Cortex-A9 processor has an FPU co-processor within each processor core.
Thank You ☺