Fixed _And_Floating_Point_representation

The document discusses fixed and floating point representation in computer systems, focusing on how numbers are stored in binary format. It explains the differences between unsigned integers, sign-and-magnitude, and two's complement representations, as well as the significance of floating-point representation for real numbers. Additionally, it covers the IEEE standard for floating-point representation, including normalization and the handling of special cases like zero and truncation errors.

Uploaded by

goutam sanyal

Course: Computer System Architecture
Class: Sem-1

Lesson: Fixed and floating point representation

By: Goutam Sanyal
Fixed and floating point representation
STORING NUMBERS

A number is changed to the binary system before being stored in the computer
memory. However, there are still two issues that need to be handled:

1. How to store the sign of the number.


2. How to show the decimal point.

For the decimal point, computers use two different representations: fixed-point and
floating-point. The first is used to store a number as an integer (without a fractional
part); the second is used to store a number as a real (with a fractional part).
Storing integers

Integers are whole numbers (numbers without a fractional part). For
example, 134 and −125 are integers, whereas 134.23 and −0.235 are not.
An integer can be thought of as a number in which the position of the
decimal point is fixed: the decimal point is to the right of the least significant
(rightmost) bit. For this reason, fixed-point representation is used to store an
integer, as shown in the figure. In this representation the decimal point is
assumed but not stored.
Unsigned representation
An unsigned integer is an integer that can never be negative and can take only 0 or
positive values. In mathematics its range is between 0 and positive infinity, but a
memory location of n bits can hold only the values in the range
0 → (2^n − 1)

An input device stores an unsigned integer using the following steps:

1. The integer is changed to binary.


2. If the number of bits is less than n, 0s are added to the left.
Example

Store 7 in an 8-bit memory location using unsigned representation.

Solution
First change the integer to binary, (111)2. Add five 0s to make a total of eight
bits, (00000111)2. The integer is stored in the memory location. Note that the
subscript 2 is used to emphasize that the integer is binary, but the subscript is
not stored in the computer.
Example

Store 258 in a 16-bit memory location.

Solution
First change the integer to binary (100000010)2. Add seven 0s to make a total
of sixteen bits, (0000000100000010)2. The integer is stored in the memory
location.
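
The two storing steps and the retrieval can be sketched in a few lines of Python. This is only an illustration of the idea, not how an input device is actually built; the function names `store_unsigned` and `retrieve_unsigned` are made up for this sketch.

```python
def store_unsigned(value, n_bits):
    """Return the n-bit unsigned pattern for value as a string of 0s and 1s."""
    if value < 0 or value > 2 ** n_bits - 1:
        raise ValueError("value does not fit in an unsigned location of this size")
    bits = bin(value)[2:]       # step 1: change the integer to binary
    return bits.zfill(n_bits)   # step 2: add 0s to the left up to n bits


def retrieve_unsigned(pattern):
    """Convert a stored bit pattern back to an unsigned decimal integer."""
    return int(pattern, 2)


print(store_unsigned(7, 8))      # 00000111
print(store_unsigned(258, 16))   # 0000000100000010
```

The two printed patterns match the two examples above.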

Retrieving unsigned integers


An output device retrieves a bit string from memory as a bit pattern and
converts it to an unsigned decimal integer.
Figure 3.5 shows what happens if we try to store an integer that is larger than
2^4 − 1 = 15 in a memory location that can only hold four bits.

Figure 3.5 Overflow in unsigned integers

Applications of unsigned integers: counting, addressing, and storing other data
types (text, images, audio and video).
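
A small Python sketch of unsigned overflow follows. It assumes that the bits that do not fit are simply dropped from the left, which is the usual behaviour of fixed-width hardware; the function name `store_with_overflow` is hypothetical.

```python
def store_with_overflow(value, n_bits):
    """Keep only the n rightmost bits of value, dropping whatever does not fit."""
    return value % (2 ** n_bits)


# 20 is (10100)2; a 4-bit location keeps only (0100)2, which is 4.
print(store_with_overflow(20, 4))   # 4
print(store_with_overflow(15, 4))   # 15, the largest value that still fits
```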
Sign-and-magnitude representation

In this method, the available range for unsigned integers (0 to 2^n − 1) is divided into
two equal sub-ranges. The first half represents positive integers, the second half,
negative integers.

Figure Sign-and-magnitude representation

Note that we have two 0s: positive zero and negative zero.
Range: −(2^(n−1) − 1) to +(2^(n−1) − 1)
Example
Store +28 in an 8-bit memory location using sign-and-magnitude
representation.

Solution
The integer is changed to 7-bit binary, (0011100)2. The leftmost bit is set to 0. The
8-bit number (00011100)2 is stored.
Example
Store -28 in an 8-bit memory location using sign-and-magnitude
representation.

Solution
The magnitude 28 is changed to 7-bit binary, (0011100)2. The leftmost bit is set to 1.
The 8-bit number (10011100)2 is stored.
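
The same procedure can be written as a short Python sketch. The function names are invented for illustration, and bit patterns are represented as strings for readability.

```python
def store_sign_magnitude(value, n_bits):
    """Return the n-bit sign-and-magnitude pattern for value."""
    magnitude = abs(value)
    if magnitude > 2 ** (n_bits - 1) - 1:
        raise ValueError("magnitude does not fit in n - 1 bits")
    sign_bit = "1" if value < 0 else "0"               # leftmost bit holds the sign
    return sign_bit + format(magnitude, "b").zfill(n_bits - 1)


def retrieve_sign_magnitude(pattern):
    """Convert a sign-and-magnitude pattern back to a decimal integer."""
    magnitude = int(pattern[1:], 2)
    return -magnitude if pattern[0] == "1" else magnitude


print(store_sign_magnitude(28, 8))    # 00011100
print(store_sign_magnitude(-28, 8))   # 10011100
```

Both (00000000)2 and (10000000)2 are retrieved as zero, which is the "two 0s" property noted above.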
Two’s complement representation

Almost all computers use two’s complement representation to store a signed integer
in an n-bit memory location. In this method, the available range for an unsigned
integer (0 to 2^n − 1) is divided into two equal sub-ranges. The first sub-range is
used to represent nonnegative integers, the second to represent negative
integers. The bit patterns are then assigned to negative and nonnegative (zero and
positive) integers as shown in the figure.
Example
The following shows that we always get the original integer if we apply the
two’s complement operation twice.
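
A minimal Python sketch of this property, assuming the usual definition of the two's complement operation (flip every bit, then add 1 and discard any carry out of the leftmost bit). The function name `twos_complement` is chosen for this sketch only.

```python
def twos_complement(pattern):
    """Apply the two's complement operation to a bit string of fixed width."""
    n = len(pattern)
    flipped = "".join("1" if b == "0" else "0" for b in pattern)   # flip all bits
    total = (int(flipped, 2) + 1) % (2 ** n)                       # add 1, drop the carry
    return format(total, "b").zfill(n)


once = twos_complement("00011100")
twice = twos_complement(once)
print(once)    # 11100100
print(twice)   # 00011100, the original pattern again
```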
Storing an integer in two’s complement format:
• The integer is changed to n-bit binary.
• If it is positive or zero, it is stored as it is. If it is negative, the computer
takes the two’s complement and then stores it.

Retrieving an integer in two’s complement format:
• If the leftmost bit is 1, the computer applies the two’s
complement operation to the integer. If the leftmost bit is 0,
no operation is applied.
• The computer changes the integer to decimal.
Example
Store the integer 28 in an 8-bit memory location using two’s complement
representation.

Solution
The integer is positive (no sign means positive), so after decimal to binary
transformation no more action is needed. Note that three extra 0s are added to
the left of the integer (11100)2 to make it eight bits: (00011100)2.
Example
Store −28 in an 8-bit memory location using two’s complement
representation.

Solution
The integer is negative, so after changing 28 to binary, (00011100)2, the computer
applies the two’s complement operation on the integer and stores (11100100)2.
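
The storing and retrieving steps can be sketched in Python as follows. This follows the steps listed earlier literally rather than using any shortcut; the function names are hypothetical, and the two's complement operation is assumed to be "flip every bit and add 1".

```python
def twos_complement(pattern):
    """Flip every bit, add 1, and discard any carry out of the leftmost bit."""
    n = len(pattern)
    flipped = "".join("1" if b == "0" else "0" for b in pattern)
    return format((int(flipped, 2) + 1) % (2 ** n), "b").zfill(n)


def store_twos_complement(value, n_bits):
    """Store a signed integer in an n-bit location."""
    if not -(2 ** (n_bits - 1)) <= value <= 2 ** (n_bits - 1) - 1:
        raise ValueError("value does not fit in this many bits")
    pattern = format(abs(value), "b").zfill(n_bits)    # change to n-bit binary
    return twos_complement(pattern) if value < 0 else pattern


def retrieve_twos_complement(pattern):
    """Retrieve a signed integer from its stored pattern."""
    if pattern[0] == "1":                               # leftmost bit 1: negative
        return -int(twos_complement(pattern), 2)
    return int(pattern, 2)


print(store_twos_complement(28, 8))           # 00011100
print(store_twos_complement(-28, 8))          # 11100100
print(retrieve_twos_complement("11100100"))   # -28
```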
There is only one zero in two’s complement notation.

Overflow in two’s complement representation


Applications: it is the standard representation for storing integers in computers
today.
Storing reals

A real is a number with an integral part and a fractional part. For example, 23.7 is a
real number—the integral part is 23 and the fractional part is 7/10. Although a fixed-
point representation can be used to represent a real number, the result may not be
accurate or it may not have the required precision. The next two examples explain
why.

Real numbers with very large integral parts or very small fractional parts should
not be stored in fixed-point representation.
Example

In the decimal system, assume that we use a fixed-point representation with
two digits at the right of the decimal point and fourteen digits at the left of
the decimal point, for a total of sixteen digits. The precision of a real number
in this system is lost if we try to represent a decimal number such as 1.00234:
the system stores the number as 1.00.

Example

In the decimal system, assume that we use a fixed-point representation with
six digits to the right of the decimal point and ten digits to the left of the
decimal point, for a total of sixteen digits. The accuracy of a real number in
this system is lost if we try to represent a decimal number such as
236154302345.00. The system stores the number as 6154302345.00: the
integral part is much smaller than it should be.
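
Both losses can be reproduced with a small Python sketch built on the standard `decimal` module. It assumes non-negative inputs and that digits which do not fit are simply dropped; the function name `store_fixed` is made up for this illustration.

```python
from decimal import Decimal, ROUND_DOWN


def store_fixed(text, int_digits, frac_digits):
    """Store a non-negative decimal number in a fixed-point layout.

    Fraction digits beyond frac_digits and integer digits beyond
    int_digits are dropped.
    """
    value = Decimal(text)
    step = Decimal(10) ** -frac_digits                  # e.g. 0.01 for two digits
    value = value.quantize(step, rounding=ROUND_DOWN)   # lose precision on the right
    return value % (Decimal(10) ** int_digits)          # lose digits on the left


# Fourteen integer digits and two fraction digits: the .00234 is lost.
print(store_fixed("1.00234", 14, 2))

# Ten integer digits and six fraction digits: the leading 23 is lost.
print(store_fixed("236154302345.00", 10, 6))
```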
Floating-point representation

The solution for maintaining accuracy or precision is to use floating-point
representation.

Figure The three parts of a real number in floating-point representation

A floating-point representation of a number is made up of three parts: a sign, a
shifter and a fixed-point number.

Floating-point representation is used in science to represent very small or very large
decimal numbers. In this representation, called scientific notation, the fixed-point
section has only one digit to the left of the point and the shifter is the power of 10.
Example

The following shows the decimal number

7,425,000,000,000,000,000,000.00

in scientific notation (floating-point representation): +7.425 × 10^21.

The three sections are the sign (+), the shifter (21) and the fixed-point part
(7.425). Note that the shifter is the exponent.
Some programming languages and calculators show the number as +7.425E21.

Show the number


−0.0000000000000232

in scientific notation (floating-point representation).

Solution
We use the same approach as in the previous example: we move the decimal
point to just after the first digit 2, which gives −2.32 × 10^−14.

The three sections are the sign (−), the shifter (−14) and the fixed-point part
(2.32). Note that the shifter is the exponent.
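
As a quick check of the two decimal examples, Python's built-in `E` format can be used to split a number into the three sections. This is only a sketch; the function name `three_parts` is hypothetical, and the number of digits kept after the point is passed in by hand.

```python
def three_parts(value, digits):
    """Return (sign, shifter, fixed-point part) of value in scientific notation."""
    text = format(value, f"+.{digits}E")    # for example "+7.425E+21"
    fixed, shifter = text[1:].split("E")
    return text[0], int(shifter), fixed


print(three_parts(7425000000000000000000.00, 3))   # ('+', 21, '7.425')
print(three_parts(-0.0000000000000232, 2))         # ('-', -14, '2.32')
```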
(.1)2 = (1 × 2^−1)10
(.01)2 = (1 × 2^−2)10
(.001)2 = (1 × 2^−3)10

(1)2 = (1 × 2^0)10
(10)2 = (1 × 2^1)10
(100)2 = (1 × 2^2)10

(.011)2 = (.01)2 + (.001)2 = 1 × 2^−2 + 1 × 2^−3
= (10)2 × 2^−3 + (1)2 × 2^−3
= (10 + 1)2 × 2^−3 = (11)2 × 2^−3
Binary      Exponent representation   Integer part   Exponent
(.1)2       1 × 2^−1                  1              −1
(.01)2      1 × 2^−2                  1              −2
(.001)2     1 × 2^−3                  1              −3
(.00001)2   1 × 2^−5                  1              −5

(1)2        1 × 2^0                   1              0
(10)2       1 × 2^1                   1              1
(100)2      1 × 2^2                   1              2

(.011)2     11 × 2^−3                 11             −3
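
The table can be reproduced with a short Python sketch that rewrites a binary string as an integer part times a power of 2. The function name `integer_and_exponent` is invented for this illustration, and the input is assumed to be non-zero.

```python
def integer_and_exponent(bits):
    """Rewrite a binary string as (integer part, exponent of 2).

    '.011' becomes ('11', -3) because (.011)2 = (11)2 x 2^-3.
    """
    whole, _, frac = bits.partition(".")
    if frac.strip("0"):                          # there is a non-zero fraction
        frac = frac.rstrip("0")
        return (whole + frac).lstrip("0"), -len(frac)
    stripped = whole.rstrip("0")                 # pure integer: pull out trailing 0s
    return stripped.lstrip("0"), len(whole) - len(stripped)


for bits in [".1", ".01", ".001", ".00001", "1", "10", "100", ".011"]:
    print(bits, integer_and_exponent(bits))
```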
Example

Show the number

−(0.00000000000000000000000101)2

in floating-point representation.

Solution
We use the same idea, keeping only one digit to the left of the decimal point. The
first 1 is in the 24th position to the right of the point, so the number is
−(1.01)2 × 2^−24.
Normalization

To make the fixed part of the representation uniform, both the scientific method (for the
decimal system) and the floating-point method (for the binary system) use only one non-zero
digit on the left of the decimal point. This is called normalization. In the decimal system this
digit can be 1 to 9, while in the binary system it can only be 1. In the following, d is a non-zero
digit, x is a digit, and y is either 0 or 1.

Decimal: ± d.xxxxxxxxxxxxxx
Binary: ± 1.yyyyyyyyyyyyyy
Note that the point and the bit 1 to the left of the fixed-point section are not stored—
they are implicit.

The mantissa is a fractional part that, together with the sign, is treated like an integer
stored in sign-and-magnitude representation.
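
Normalization of a binary number can be sketched in Python as below. The sketch works on strings of 0s and 1s, assumes a non-zero magnitude without a sign, and returns only the exponent and the mantissa because the leading 1 and the point are implicit; the function name `normalize` is hypothetical.

```python
def normalize(bits):
    """Return (exponent, mantissa) such that bits = (1.mantissa)2 x 2^exponent."""
    whole, _, frac = bits.partition(".")
    digits = whole + frac
    first_one = digits.index("1")                   # position of the leading 1
    exponent = len(whole) - 1 - first_one           # how far the point moves
    mantissa = digits[first_one + 1:].rstrip("0")   # bits to the right of the leading 1
    return exponent, mantissa


print(normalize("101.11"))      # (2, '0111')
print(normalize("0.0000011"))   # (-6, '1')
```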
Excess_127 and Excess_1023 system
• The exponent, the power that shows how many bits the decimal point
should be moved to the left or right, is a signed number.

• Although this could have been stored using two’s complement
representation, a new representation, called the Excess system, is used
instead.

• In the Excess system, both positive and negative integers are stored as
unsigned integers.

• To represent a positive or negative integer, a positive integer (called a bias)
is added to each number to shift them uniformly to the non-negative side.

• The value of this bias is 2^(m−1) − 1, where m is the size of the memory
location to store the exponent.
Example

We can express sixteen integers in a number system with 4-bit allocation. By adding
seven units to each integer in this range, we can uniformly translate all integers to the
right and make all of them positive without changing the relative position of the
integers with respect to each other, as shown in the figure. The new system is referred
to as Excess-7, or biased representation with biasing value of 7.

Figure Shifting in Excess representation
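
The Excess system can be sketched in Python as follows, using the bias 2^(m−1) − 1 given above. The function names are chosen for this illustration only.

```python
def bias(m_bits):
    """Bias of the Excess system for an m-bit location."""
    return 2 ** (m_bits - 1) - 1


def store_excess(value, m_bits):
    """Store a signed integer as an unsigned pattern by adding the bias."""
    stored = value + bias(m_bits)
    if not 0 <= stored <= 2 ** m_bits - 1:
        raise ValueError("value is outside the range of this Excess system")
    return format(stored, "b").zfill(m_bits)


def retrieve_excess(pattern):
    """Recover the signed integer by subtracting the bias."""
    return int(pattern, 2) - bias(len(pattern))


print(bias(4), bias(8), bias(11))   # 7 127 1023
print(store_excess(-7, 4))          # 0000
print(store_excess(8, 4))           # 1111
print(store_excess(2, 8))           # 10000001
```

With four bits the sixteen integers from −7 to 8 map onto the unsigned values 0 to 15, which is the Excess-7 shift described in the example.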


IEEE Standard

Figure IEEE standards for floating-point representation


IEEE Specifications

Storage of IEEE standard floating point numbers:


1. Store the sign in S (0 or 1).
2. Change the number to binary.
3. Normalize.
4. Find the values of E and M.
5. Concatenate S, E, and M.
Example

Show the Excess_127 (single precision) representation of the decimal
number 5.75.
Solution

a. The sign is positive, so S = 0.
b. Decimal to binary transformation: 5.75 = (101.11)2.
c. Normalization: (101.11)2 = (1.0111)2 × 2^2.
d. E = 2 + 127 = 129 = (10000001)2, M = 0111. We need to add nineteen
zeros at the right of M to make it 23 bits.
e. The representation is S = 0, E = 10000001, M = 01110000000000000000000.

The number is stored in the computer as

01000000101110000000000000000000
Example

Show the Excess_127 (single precision) representation of the decimal number


–161.875.
Solution

a. The sign is negative, so S = 1.
b. Decimal to binary transformation: 161.875 = (10100001.111)2.
c. Normalization: (10100001.111)2 = (1.0100001111)2 × 2^7.
d. E = 7 + 127 = 134 = (10000110)2 and M = (0100001111)2.
e. Representation: S = 1, E = 10000110, M = 01000011110000000000000.

The number is stored in the computer as

11000011001000011110000000000000
Example

Show the Excess_127 (single precision) representation of the decimal number


–0.0234375.

Solution

a. S = 1 (the number is negative).


b. Decimal to binary transformation: 0.0234375 = (0.0000011)2.
c. Normalization: (0.0000011)2 = (1.1)2 × 2^−6.
d. E = −6 + 127 = 121 = (01111001)2 and M = (1)2.
e. Representation:

The number is stored in the computer as

10111100110000000000000000000000
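
The three worked examples can be cross-checked with Python's standard `struct` module, which packs a number in IEEE 754 single-precision format. This is a verification aid rather than the hand method of the five steps; the function name `single_precision_fields` is made up for this sketch. Note that `struct` rounds values that are not exactly representable, which does not matter for these three numbers.

```python
import struct


def single_precision_fields(x):
    """Return the (S, E, M) bit strings of x in IEEE single precision."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))   # the 32 bits as an integer
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]                  # 1 + 8 + 23 bits


for number in (5.75, -161.875, -0.0234375):
    s, e, m = single_precision_fields(number)
    print(number, s, e, m)
```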
Retrieving numbers stored in IEEE standard floating point format:
1. Find the values of S, E, and M.
2. If S = 0, set the sign to positive, otherwise set the sign to negative.
3. Find the shifter (E − 127).
4. Denormalize the mantissa.
5. Change the denormalized binary number to decimal to find the absolute value.
6. Add the sign.
Example

The bit pattern (11001010000000000111000100001111)2 is stored in


Excess_127 format. Show the value in decimal.

Solution

a. The first bit represents S, the next eight bits, E and the remaining 23 bits, M.
b. The sign is negative.
c. The shifter = E − 127 = 148 − 127 = 21.
d. This gives us (1.00000000111000100001111)2 × 2^21.
e. The binary number is (1000000001110001000011.11)2.
f. The absolute value is 2,104,387.75.
g. The number is −2,104,387.75.
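
The retrieval steps can be sketched in Python as below. The sketch assumes a normalized, non-zero number (it does not handle the special all-0s pattern or other special cases), and the function name `retrieve_single` is hypothetical.

```python
def retrieve_single(pattern):
    """Decode a 32-bit Excess_127 pattern given as a string of 0s and 1s."""
    sign = -1 if pattern[0] == "1" else 1        # S
    shifter = int(pattern[1:9], 2) - 127         # E - 127
    mantissa = int(pattern[9:], 2) / 2 ** 23     # M as a fraction in [0, 1)
    return sign * (1 + mantissa) * 2.0 ** shifter


print(retrieve_single("11001010000000000111000100001111"))
```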
Overflow and Underflow

Figure Overflow and underflow in floating-point representation of reals

Storing Zero

A real number with an integral part and the fractional part set to zero, that is,
0.0, cannot be stored using the steps discussed above. To handle this special
case, it is agreed that in this case the sign, exponent and the mantissa are set
to 0s.
Truncation errors

The value of the number stored using floating-point representation may not
be exactly as we expect it to be. For example, assume we need to store the number
(1111111111111111.11111111111)2
in memory using Excess_127 representation. After normalization, we have
(1.11111111111111111111111111)2 × 2^15
The number has 27 1s, so the mantissa (the bits to the right of the point) has 26 1s.
This mantissa needs to be truncated to 23 bits, so the number that is retrieved is
(1111111111111111.11111111)2
The difference between the original number and what is retrieved is called
the truncation error.
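
The size of this truncation error can be worked out exactly with a small Python sketch based on the standard `fractions` module. It assumes the number is at least 1 and written without leading 0s, and that extra mantissa bits are dropped rather than rounded, as in the text; the function names are invented for this illustration.

```python
from fractions import Fraction


def binary_to_fraction(bits):
    """Exact value of a binary string such as '101.11'."""
    whole, _, frac = bits.partition(".")
    return Fraction(int(whole + frac, 2), 2 ** len(frac))


def truncate_mantissa(bits, mantissa_bits=23):
    """Keep the leading 1 plus mantissa_bits bits and replace the rest with 0s."""
    whole, _, frac = bits.partition(".")
    digits = whole + frac
    kept = digits[:1 + mantissa_bits].ljust(len(digits), "0")
    return Fraction(int(kept, 2), 2 ** len(frac))


original = "1" * 16 + "." + "1" * 11
stored = truncate_mantissa(original)
error = binary_to_fraction(original) - stored
print(float(stored))   # 65535.99609375
print(error)           # 7/2048
```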
Thank You
