Fixed _And_Floating_Point_representation
System Sem-1
Architecture
By: Goutam Sanyal
Fixed and floating point representation
STORING NUMBERS
A number is changed to the binary system before being stored in computer
memory. However, there are still two issues that need to be handled: how to
store the sign and how to show the decimal point.
For the decimal point, computers use two different representations: fixed-point and
floating-point. The first is used to store a number as an integer, without a fractional
part; the second is used to store a number as a real, with a fractional part.
Storing integers
Example
Store 7 in an 8-bit memory location.
Solution
First change the integer to binary, (111)₂. Add five 0s to make a total of eight
bits, (00000111)₂. The integer is stored in the memory location. Note that the
subscript 2 is used to emphasize that the integer is binary, but the subscript is
not stored in the computer.
Example
Store 258 in a 16-bit memory location.
Solution
First change the integer to binary, (100000010)₂. Add seven 0s to make a total
of sixteen bits, (0000000100000010)₂. The integer is stored in the memory
location.
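The padding step in the solutions above can be sketched in Python. This is only an illustration: the function name `store_unsigned` is hypothetical, and rejecting out-of-range values with an exception is an assumption about how overflow should be handled.

```python
def store_unsigned(value: int, n_bits: int) -> str:
    """Return the n-bit unsigned bit pattern for a nonnegative integer."""
    if value < 0 or value > 2 ** n_bits - 1:
        # Assumption: values outside 0 .. 2^n - 1 are rejected (overflow).
        raise OverflowError(f"{value} does not fit in {n_bits} unsigned bits")
    # format(..., "b") converts to binary; zfill adds 0s on the left.
    return format(value, "b").zfill(n_bits)


print(store_unsigned(7, 8))     # 00000111
print(store_unsigned(258, 16))  # 0000000100000010
```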
Sign-and-magnitude representation
In this method, the available range for unsigned integers (0 to 2ⁿ − 1) is divided into
two equal sub-ranges. The first half represents positive integers, the second half,
negative integers.
Solution
The integer is changed to 7-bit binary. The leftmost bit is set to 0. The 8-bit
number is stored.
Example
Store -28 in an 8-bit memory location using sign-and-magnitude
representation.
Solution
The integer is changed to 7-bit binary. The leftmost bit is set to 1. The 8-bit
number is stored.
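A minimal Python sketch of the sign-and-magnitude steps described above. The function name `store_sign_magnitude` and the overflow check are assumptions made for illustration.

```python
def store_sign_magnitude(value: int, n_bits: int = 8) -> str:
    """Return the n-bit sign-and-magnitude pattern for a signed integer."""
    magnitude_bits = n_bits - 1
    if abs(value) > 2 ** magnitude_bits - 1:
        # Assumption: magnitudes that need more than n - 1 bits are rejected.
        raise OverflowError(f"{value} does not fit in {n_bits} bits")
    # The leftmost bit holds the sign: 0 for positive, 1 for negative.
    sign_bit = "1" if value < 0 else "0"
    # The remaining n - 1 bits hold the magnitude, padded with 0s on the left.
    return sign_bit + format(abs(value), "b").zfill(magnitude_bits)


print(store_sign_magnitude(-28))  # 10011100
```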
Two’s complement representation
Almost all computers use two’s complement representation to store a signed integer
in an n-bit memory location. In this method, the available range for an unsigned
integer (0 to 2ⁿ − 1) is divided into two equal sub-ranges. The first sub-range is
used to represent nonnegative integers, the second to represent negative
integers. The bit patterns are then assigned to negative and nonnegative (zero and
positive) integers as shown in Figure .
Example
The following shows that we always get the original integer if we apply the
two’s complement operation twice.
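One way to check this claim is a short Python sketch. It assumes the two’s complement operation is carried out as "flip every bit, then add 1" on a fixed-width pattern; the function name `twos_complement` and the sample pattern are chosen here for illustration.

```python
def twos_complement(bits: str) -> str:
    """Apply the two's complement operation to a fixed-width bit string."""
    n = len(bits)
    mask = 2 ** n - 1
    flipped = int(bits, 2) ^ mask   # flip every bit (one's complement)
    result = (flipped + 1) & mask   # add 1 and keep only n bits
    return format(result, "b").zfill(n)


original = "00011100"
once = twos_complement(original)
twice = twos_complement(once)
print(original, once, twice)  # 00011100 11100100 00011100
```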
Storing an integer in two’s complement format:
• The integer is changed to an n-bit binary.
• If it is positive or zero, it is stored as it is. If it is negative, the computer
takes the two’s complement and then stores it.
Solution
The integer is positive (no sign means positive), so after decimal to binary
transformation no more action is needed. Note that five extra 0s are added to
the left of the integer to make it eight bits.
Example
Store −28 in an 8-bit memory location using two’s complement
representation.
Solution
The integer is negative, so after changing to binary, the computer applies the
two’s complement operation on the integer.
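The storing procedure can be sketched as follows. This is an illustrative sketch, not a description of any particular hardware: the name `store_twos_complement` is hypothetical, and the range check is an assumption about overflow handling.

```python
def store_twos_complement(value: int, n_bits: int = 8) -> str:
    """Return the n-bit two's complement pattern for a signed integer."""
    if not -(2 ** (n_bits - 1)) <= value <= 2 ** (n_bits - 1) - 1:
        # Assumption: values outside the n-bit signed range are rejected.
        raise OverflowError(f"{value} does not fit in {n_bits} bits")
    mask = 2 ** n_bits - 1
    pattern = abs(value)  # binary form of the magnitude
    if value < 0:
        # Negative: flip every bit, add 1, keep n bits.
        pattern = ((pattern ^ mask) + 1) & mask
    return format(pattern, "b").zfill(n_bits)


print(store_twos_complement(28))   # 00011100
print(store_twos_complement(-28))  # 11100100
```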
There is only one zero in two’s complement notation.
A real is a number with an integral part and a fractional part. For example, 23.7 is a
real number—the integral part is 23 and the fractional part is 7/10. Although a fixed-
point representation can be used to represent a real number, the result may not be
accurate or it may not have the required precision. The next two examples explain
why.
Real numbers with very large integral parts or very small fractional parts should
not be stored in fixed-point representation.
Example
Example
7,425,000,000,000,000,000,000.00 = +7.425 × 10²¹
The three sections are the sign (+), the shifter (21) and the fixed-point part
(7.425). Note that the shifter is the exponent.
Some programming languages and calculators show the number as +7.425E21.
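As an illustration, Python's built-in `format` produces this E notation, and the three sections can be read back from the resulting string. The helper name `scientific_parts` is hypothetical; note that Python writes the exponent with an explicit sign (`E+21`).

```python
def scientific_parts(value: float, digits: int = 3):
    """Split a number into (sign, shifter, fixed-point part) via E notation."""
    text = format(value, f"+.{digits}E")       # e.g. "+7.425E+21"
    fixed_part, exponent = text.split("E")
    return fixed_part[0], int(exponent), fixed_part[1:]


print(format(7.425e21, "+.3E"))     # +7.425E+21
print(scientific_parts(7.425e21))   # ('+', 21, '7.425')
```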
Example
Solution
We use the same approach as in the previous example: we move the decimal
point after the digit 2, as shown below:
−2.32 × 10⁻¹⁴
The three sections are the sign (−), the shifter (−14) and the fixed-point part
(2.32). Note that the shifter is the exponent.
(0.1)₂ = (1 × 2⁻¹)₁₀
(10)₂ = (1 × 2¹)₁₀
(100)₂ = (1 × 2²)₁₀

Binary number   Power-of-two form   Fixed part   Exponent
(1)₂            1 × 2⁰              1            0
(10)₂           1 × 2¹              1            1
(100)₂          1 × 2²              1            2
(0.011)₂        11 × 2⁻³            11           −3
Example
Show the number
−(0.00000000000000000000000101)₂
in floating-point representation.
Solution
We use the same idea, keeping only one digit to the left of the decimal point.
Normalization
To make the fixed part of the representation uniform, both the scientific method (for the
decimal system) and the floating-point method (for the binary system) use only one non-zero
digit on the left of the decimal point. This is called normalization. In the decimal system this
digit can be 1 to 9, while in the binary system it can only be 1. In the following, d is a non-zero
digit, x is a digit, and y is either 0 or 1.
Decimal: ± d.xxxxxxxxxxxxxx
Binary: ± 1.yyyyyyyyyyyyyy
Note that the point and the bit 1 to the left of the fixed-point section are not stored—
they are implicit.
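A minimal sketch of binary normalization, assuming the number is given as an unsigned binary string such as "101.11". The function name `normalize_binary` is hypothetical. It returns the bits to the right of the point (the part that would be stored) and the exponent; the leading 1 and the point are left implicit.

```python
def normalize_binary(bits: str):
    """Normalize an unsigned binary string to 1.mantissa x 2^exponent.

    Returns (mantissa, exponent); the leading 1 is implicit.
    """
    integer_part, _, fraction_part = bits.partition(".")
    digits = integer_part + fraction_part
    first_one = digits.find("1")
    if first_one == -1:
        raise ValueError("zero cannot be normalized")
    # Number of places the point moves: positive means it moved left.
    exponent = len(integer_part) - 1 - first_one
    # Everything after the leading 1, with trailing 0s dropped.
    mantissa = digits[first_one + 1:].rstrip("0")
    return mantissa, exponent


print(normalize_binary("101.11"))  # ('0111', 2)
print(normalize_binary("0.011"))   # ('1', -2)
```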
The mantissa is a fractional part that, together with the sign, is treated like an integer
stored in sign-and-magnitude representation.
Excess_127 and Excess_1023 system
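• The exponent, the power that shows how many bits the decimal point
should be moved to the left or right, is a signed number.
• In the Excess system, a positive integer called the bias is added to the
exponent so that it can be stored as an unsigned number.
• The value of this bias is 2ᵐ⁻¹ − 1, where m is the size in bits of the memory
location used to store the exponent.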
Example
We can express sixteen integers in a number system with 4-bit allocation. By adding
seven units to each integer in this range, we can uniformly translate all integers to the
right and make all of them positive without changing the relative position of the
integers with respect to each other, as shown in the figure. The new system is referred
to as Excess-7, or biased representation with biasing value of 7.
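The biasing described above can be sketched in Python. The function names `to_excess` and `from_excess` are hypothetical; the bias is computed as 2^(m−1) − 1, which gives 7 for 4 bits, 127 for 8 bits and 1023 for 11 bits.

```python
def to_excess(value: int, m_bits: int) -> str:
    """Store a signed integer as an m-bit unsigned pattern by adding the bias."""
    bias = 2 ** (m_bits - 1) - 1
    stored = value + bias
    if not 0 <= stored <= 2 ** m_bits - 1:
        # Assumption: values that do not fit after biasing are rejected.
        raise OverflowError(f"{value} does not fit in Excess-{bias}")
    return format(stored, "b").zfill(m_bits)


def from_excess(bits: str) -> int:
    """Recover the signed integer by subtracting the bias."""
    bias = 2 ** (len(bits) - 1) - 1
    return int(bits, 2) - bias


print(to_excess(-7, 4), to_excess(0, 4), to_excess(8, 4))  # 0000 0111 1111
print(to_excess(2, 8))                                     # 10000001
```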
01000000110110000000000000000000
Example
11000011010000111100000000000000
Example
Solution
10111100110000000000000000000000
Retrieving numbers stored in IEEE standard floating point format:
1. Find the values of S, E, and M.
2. If S=0, set the sign to positive, otherwise set the sign to negative.
3. Find the shifter (E-127).
4. Denormalize the mantissa.
5. Change the denormalized number to decimal to find the absolute value.
6. Add the sign.
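The retrieval steps above can be sketched for the 32-bit (Excess_127) format. This is a simplified illustration under stated assumptions: the function name `decode_excess_127` is hypothetical, only normalized numbers and the all-zero pattern are handled, and other special patterns (exponent fields of all 0s or all 1s) are rejected rather than interpreted.

```python
def decode_excess_127(bits: str) -> float:
    """Decode a 32-bit pattern: 1 sign bit, 8 exponent bits, 23 mantissa bits."""
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of 32 binary digits")
    # Step 1: find S, E and M.
    s, e, m = int(bits[0]), int(bits[1:9], 2), bits[9:]
    if bits == "0" * 32:
        return 0.0  # special case: zero is stored as all 0s
    if e in (0, 255):
        # Assumption: other special patterns are outside this sketch.
        raise ValueError("special exponent values are not handled here")
    # Step 2: the sign.
    sign = -1.0 if s == 1 else 1.0
    # Step 3: the shifter.
    shifter = e - 127
    # Steps 4-5: restore the implicit leading 1, then scale by 2^shifter.
    significand = int("1" + m, 2) / 2 ** 23
    # Step 6: add the sign.
    return sign * significand * 2.0 ** shifter


# Sign 1, exponent 01111001 (121), mantissa 1 followed by 0s.
print(decode_excess_127("1" + "01111001" + "1" + "0" * 22))  # -0.0234375
```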
Example
Solution
a. The first bit represents S, the next eight bits, E and the remaining 23 bits, M.
Storing Zero
A real number with an integral part and the fractional part set to zero, that is,
0.0, cannot be stored using the steps discussed above. To handle this special
case, it is agreed that in this case the sign, exponent and the mantissa are set
to 0s.
Truncation errors
The value of the number stored using floating-point representation may not
be exactly as we expect it to be. For example, assume we need to store
(1111111111111111.11111111111)₂
in memory using Excess_127 representation. After normalization, we have:
(1.11111111111111111111111111)₂ × 2¹⁵
The mantissa has 26 1s. This mantissa needs to be truncated to 23 1s, so the
number retrieved from memory is:
(1111111111111111.11111111)₂
The difference between the original number and what is retrieved is called
the truncation error.
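The size of this truncation error can be worked out with exact arithmetic. The sketch below is an illustration only: the function name `truncate_mantissa` is hypothetical, it handles positive values only, and it drops the extra mantissa bits as the text describes (it does not round). The original number is written as 2¹⁶ − 2⁻¹¹, that is, sixteen 1s before the point and eleven after.

```python
from fractions import Fraction
import math


def truncate_mantissa(value: Fraction, mantissa_bits: int = 23) -> Fraction:
    """Keep the implicit leading 1 plus `mantissa_bits` bits; drop the rest."""
    if value <= 0:
        raise ValueError("this sketch handles positive values only")
    # Normalization: find the exponent e with 1 <= value / 2^e < 2.
    exponent = 0
    while value >= Fraction(2) ** (exponent + 1):
        exponent += 1
    while value < Fraction(2) ** exponent:
        exponent -= 1
    # Scale so the last kept mantissa bit is the units digit, then drop the rest.
    scale = Fraction(2) ** (mantissa_bits - exponent)
    return Fraction(math.floor(value * scale)) / scale


original = Fraction(2) ** 16 - Fraction(1, 2 ** 11)
stored = truncate_mantissa(original)
error = original - stored
print(float(stored))  # 65535.99609375
print(error)          # 7/2048
```

Here the stored value equals 2¹⁶ − 2⁻⁸, so the truncation error is 2⁻⁸ − 2⁻¹¹ = 7/2048.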
Thank You