CAO - Unit IV Notes

Cao Unit IV note based on Arithmetic (Fixed and Floating Point)

Uploaded by

Snehal Dongre
Copyright
© All Rights Reserved

Unit 4

NUMBERS, ARITHMETIC OPERATIONS AND CHARACTERS

NUMBER REPRESENTATION
• Numbers can be represented in 3 formats:

1) Sign and magnitude

2) 1's complement

3) 2's complement

• In all three formats, MSB=0 for +ve numbers & MSB=1 for -ve numbers.

In the sign-and-magnitude system,

a negative value is obtained by changing the MSB of the corresponding positive value from 0 to 1.

For ex, +5 is represented by 0101 &

-5 is represented by 1101.

In 1's complement system,

negative values are obtained by complementing each bit of the corresponding positive number.

For ex, -5 is obtained by complementing each bit in 0101 to yield 1010.

(In other words, the operation of forming the 1's complement of a given n-bit number is equivalent to
subtracting that number from 2^n - 1).

In 2's complement system,

forming the 2's complement of an n-bit number is done by subtracting that number from 2^n.

For ex, -5 is obtained by complementing each bit in 0101 & then adding 1 to yield 1011. (In other words,
the 2's complement of a number is obtained by adding 1 to the 1's complement of that number).

• The 2's complement system yields the most efficient way to carry out addition/subtraction operations.

ADDITION & SUBTRACTION OF SIGNED NUMBERS

• Following are the two rules for addition and subtraction of n-bit signed numbers using the 2's
complement representation system (Figure 1.6).

Rule 1:

o To add two numbers, add their n-bit representations and ignore the carry-out signal from the MSB position.

o The result will be algebraically correct if it lies in the range -2^(n-1) to +2^(n-1) - 1.

Rule 2:

o To subtract two numbers X and Y (that is, to perform X-Y), take the 2's complement of Y and then
add it to X as in Rule 1.

o The result will be algebraically correct if it lies in the range -2^(n-1) to +2^(n-1) - 1.

• When the result of an arithmetic operation is outside the representable-range, an arithmetic overflow is
said to occur.

• To represent a signed number in 2's complement form using a larger number of bits, repeat the sign bit as many
times as needed to the left. This operation is called sign extension.

• In 1's complement representation, the sum obtained by adding the bit patterns is not always correct as it stands,
because the carry-out (cn) cannot be ignored. If cn=0, the result obtained is correct. If cn=1, then a 1 must be
added to the result to make it correct (end-around carry).
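
The rules above can be tried out with a short Python sketch. The function names (`to_twos`, `from_twos`, `add_twos`, `sub_twos`, `sign_extend`, `add_ones`) are chosen here for illustration and are not part of the notes; bit patterns are held as non-negative Python integers.

```python
def to_twos(value, n):
    """Encode a signed integer as an n-bit 2's complement bit pattern."""
    return value & ((1 << n) - 1)

def from_twos(bits, n):
    """Decode an n-bit 2's complement bit pattern to a signed integer."""
    return bits - (1 << n) if bits & (1 << (n - 1)) else bits

def add_twos(x, y, n):
    """Rule 1: add the n-bit patterns and ignore the carry-out of the MSB."""
    return (x + y) & ((1 << n) - 1)

def sub_twos(x, y, n):
    """Rule 2: X - Y is X plus the 2's complement of Y."""
    neg_y = (~y + 1) & ((1 << n) - 1)
    return add_twos(x, neg_y, n)

def sign_extend(bits, n, m):
    """Widen an n-bit pattern to m bits by repeating the sign bit."""
    if bits & (1 << (n - 1)):
        bits |= ((1 << m) - 1) ^ ((1 << n) - 1)
    return bits

def add_ones(x, y, n):
    """1's complement addition: a carry-out of 1 is added back in."""
    mask = (1 << n) - 1
    total = x + y
    if total > mask:                 # carry-out cn = 1
        total = (total & mask) + 1   # end-around carry
    return total & mask

print(from_twos(add_twos(to_twos(5, 4), to_twos(-7, 4), 4), 4))  # 5 + (-7)
print(format(sign_extend(0b1011, 4, 8), '08b'))                  # -5 widened to 8 bits
print(format(add_ones(0b1010, 0b0111, 4), '04b'))                # (-5) + (+7) in 1's complement
```

The results are only meaningful when the true answer lies in the representable range; the sketch does not check for overflow.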

OVERFLOW IN INTEGER ARITHMETIC

• When the result of an arithmetic operation is outside the representable range, an arithmetic overflow is
said to occur.

• For example, if we add the two 4-bit numbers +7 and +4, the output sum S is 1011 (0111 + 0100), which is
the code for -5, an incorrect result.

• Two points about detecting overflow:

1) Overflow can occur only when adding two numbers that have the same sign.

2) The carry-out signal from the sign-bit position is not a sufficient indicator of overflow when
adding signed numbers. Overflow has occurred when both operands have the same sign and the sign of the
sum is different.
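
As a minimal sketch of the two points above, the following Python function adds two n-bit 2's complement patterns and reports the carry-out and the overflow flag separately. The name `add_signed` and the returned tuple layout are assumptions made for this example.

```python
def add_signed(x, y, n=4):
    """Add two n-bit 2's complement patterns.

    Returns (sum_bits, carry_out, overflow).
    """
    mask = (1 << n) - 1
    total = (x & mask) + (y & mask)
    sum_bits = total & mask
    carry_out = total >> n

    def sign(bits):
        return (bits >> (n - 1)) & 1

    # Overflow: both operands share a sign and the sum has the other sign.
    overflow = sign(x) == sign(y) and sign(sum_bits) != sign(x)
    return sum_bits, carry_out, overflow

print(add_signed(0b0111, 0b0100))  # (+7) + (+4): overflow with carry-out 0
print(add_signed(0b1111, 0b1111))  # (-1) + (-1): carry-out 1 but no overflow
```

The two calls show why the carry-out alone cannot be used: the first overflows without a carry-out, the second produces a carry-out without overflowing.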

Half Adder

A half-adder circuit has two binary inputs and two binary outputs. The input variables are the
augend and addend bits, and the output variables are the sum and the carry.

o 'x' and 'y' are the two inputs, and S (Sum) and C (Carry) are the two outputs.
o The Carry output is '0' unless both the inputs are 1.
o 'S' represents the least significant bit of the sum.

The simplified sum of products (SOP) expressions are:

S = x'y+xy', C = xy

[Figure: logic diagram of the half-adder circuit, with S formed as x'y + xy' (an XOR gate) and C as xy (an AND gate)]


Full Adder

This circuit has three binary inputs and two binary outputs.

Two of the input variables, 'x' and 'y', represent the two significant bits to be added.

The third input variable, 'z', represents the carry from the previous lower significant position.

The outputs are designated by the symbols 'S' for sum and 'C' for carry.
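
The two circuits can be modelled with single-bit Python integers. The half adder follows the SOP expressions given above. The notes do not give the full-adder equations, so the sketch below assumes the common construction from two half adders and an OR gate.

```python
def half_adder(x, y):
    """S = x'y + xy', C = xy for single-bit inputs x and y."""
    s = ((1 - x) & y) | (x & (1 - y))
    c = x & y
    return s, c

def full_adder(x, y, z):
    """Add bits x and y plus the incoming carry z.

    Assumed construction: two half adders, with their carries ORed.
    """
    s1, c1 = half_adder(x, y)
    s, c2 = half_adder(s1, z)
    return s, c1 | c2

# Print the full-adder truth table.
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            print(x, y, z, full_adder(x, y, z))
```

For every input combination the outputs satisfy S + 2C = x + y + z, which is a quick way to check the table.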
Adder Circuits

4-Bit Ripple Carry Adder / Serial Adder

Here the carry is rippled to the next stage of addition. So, it is referred to as a ripple carry adder.

Here the addition is forwarded serially. So, it is also referred to as a serial carry adder.

Each stage must wait for the carry from the stage before it, so adding n bits takes n stage delays. So, it is a slow adder.
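
A minimal Python sketch of the ripple carry idea: one full adder per bit position, with the carry of each stage fed into the next. The names `full_adder` and `ripple_carry_add` and the LSB-first list layout are assumptions made for this example.

```python
def full_adder(x, y, z):
    """One-bit full adder: returns (sum, carry)."""
    s = x ^ y ^ z
    c = (x & y) | (x & z) | (y & z)
    return s, c

def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """Add two equal-length bit lists given LSB first.

    The carry produced by each stage is passed to the next stage,
    so stage i cannot finish before stage i-1.
    """
    carry = carry_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry

# 0111 + 0100, written LSB first
print(ripple_carry_add([1, 1, 1, 0], [0, 0, 1, 0]))
```

The loop makes the serial dependency visible: the number of iterations, and therefore the delay, grows with the number of bits.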
DESIGN OF FAST ADDERS

A ripple carry adder is unsuitable for adding a large number of bits, because every extra stage
makes the carry propagation delay worse. To solve this problem, the carry look-ahead adder
was introduced.

Carry Look-ahead Adder (CLA) / Parallel Adder:

CLA adds all the bits simultaneously.

Simultaneous addition is possible only if we “predict” or “look ahead” the carry.
CLA computes the carries independently of the number of bits to be added. So it is very fast compared to
the serial adder.
CLA is among the fastest adder circuits. It reduces the propagation delay that occurs during addition.
It is designed by transforming the ripple carry adder circuit such that the carry logic of the adder
is changed into two-level logic.
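
The two-level carry logic can be sketched for 4 bits as follows. The notes do not spell out the equations, so this example assumes the usual generate and propagate signals, G(i) = x(i)y(i) and P(i) = x(i) XOR y(i), with c(i+1) = G(i) + P(i)c(i) expanded so that every carry depends only on the inputs and c0. The function name `cla_add4` is hypothetical.

```python
def cla_add4(a, b, c0=0):
    """4-bit carry look-ahead addition of integers a and b (0..15).

    Returns (sum, carry_out). Every carry is written directly in terms
    of the generate/propagate signals and c0, not of the previous carry.
    """
    x = [(a >> i) & 1 for i in range(4)]
    y = [(b >> i) & 1 for i in range(4)]
    g = [x[i] & y[i] for i in range(4)]   # generate
    p = [x[i] ^ y[i] for i in range(4)]   # propagate

    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    return total, c4

print(cla_add4(0b0111, 0b0100))
```

In hardware the four carry expressions are evaluated in parallel, which is where the speed-up over the ripple carry adder comes from; the Python code only mirrors the logic, not the timing.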
Signed and unsigned binary numbers

Decimal   Unsigned   Sign and magnitude   1's complement   2's complement
+10       1010       0|1010               0|1010           0|1010
-10       ----       1|1010               1|0101           1|0110
-25       ----       1|11001              1|00110          1|00111

To form the 1's complement, invert each digit (1 -> 0 and 0 -> 1). There is no unsigned representation of negative
numbers. Add 1 to the 1's complement to get the 2's complement. Negative numbers are represented in 2's
complement form. Positive numbers have the same code in all three signed formats; only negative numbers differ
between sign and magnitude, 1's complement and 2's complement.

Range of numbers (for a 3-bit number; unsigned range 0 to 7):

Binary number   Unsigned number   Sign and magnitude   1's complement   2's complement
000             0                 +0                   0                0
001             1                 +1                   1                1
010             2                 +2                   2                2
011             3                 +3                   3                3
100             4                 -0                   -3               -4
101             5                 -1                   -2               -3
110             6                 -2                   -1               -2
111             7                 -3                   -0               -1

Range of unsigned numbers: 0 to 2^n - 1

Range of sign and magnitude numbers: -(2^(n-1) - 1) to +(2^(n-1) - 1)

Range of 2's complement numbers: -2^(n-1) to +(2^(n-1) - 1)
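
The table and the ranges above can be reproduced with a small Python sketch. The function name `decode` and the tuple order are assumptions made for this example; note that Python integers have no negative zero, so -0 is printed as 0.

```python
def decode(bits, n):
    """Value of an n-bit pattern as
    (unsigned, sign and magnitude, 1's complement, 2's complement)."""
    sign = bits >> (n - 1)
    low = bits & ((1 << (n - 1)) - 1)      # the bits below the sign bit
    unsigned = bits
    sign_mag = -low if sign else low
    ones = bits - ((1 << n) - 1) if sign else bits
    twos = bits - (1 << n) if sign else bits
    return unsigned, sign_mag, ones, twos

for pattern in range(8):
    print(format(pattern, '03b'), decode(pattern, 3))
```

Taking the minimum and maximum of each column over all patterns gives the ranges stated above for n = 3.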


Multiplication of unsigned number

Question) Multiply the two unsigned numbers 15*15

Solution) The binary representation of 15 is 1111, so the product is 1111*1111

        1 1 1 1
      * 1 1 1 1
      ---------
        1 1 1 1
      1 1 1 1
    1 1 1 1
  1 1 1 1
---------------
1 1 1 0 0 0 0 1

Column sums including carries, from the right: 1 ; 2=10 ; 4=100 ; 6=110 ; 6=110 ; 5=101 ; 3=11 ; 1.
The lowest bit of each sum is written down and the remaining bits are carried into the next column
(carries 1, 2, 3, 3, 2, 1).

15*15=225

11100001=1*2^7+1*2^6+1*2^5+0*2^4+0*2^3+0*2^2+0*2^1+1*2^0

128+64+32+0+0+0+0+1=225

The main drawback of this manual method of multiplying unsigned numbers is that more registers
are needed to store the partial products, and more cycles are needed for the summation
operation.

The solution is an optimized algorithm, also called the accumulated addition
approach. The flow chart for this unsigned multiplication method is given below as a sequence of steps:

Start
1. M = Multiplicand ; Q = Multiplier ; C,A = 0 ; Count = n
2. Is q(0)=1 ? If yes, C,A = A+M. If no, go directly to step 3.
3. Shift Right C, A, Q
4. Count = Count-1
5. Is Count=0 ? If no, go back to step 2. If yes, stop.

Final Product is in registers A, Q

Example: 11*13 ; M (multiplicand) = 11 = 1011 ; Q (multiplier) = 13 = 1101

Count = Number of Bits = 4

Step      C   A      Q      Operation
Initial   0   0000   1101   As q(0)=1, perform C,A = A+M
1         0   1011   1101   A+M = 0000+1011 = 1011
          0   0101   1110   Shift Right C,A,Q (the old q(0) is lost) ; Count = 4-1 = 3
2         0   0010   1111   As q(0)=0, perform Shift Right only ; Count = 3-1 = 2
3         0   1101   1111   As q(0)=1, perform C,A = A+M ; A+M = 0010+1011 = 1101
          0   0110   1111   Shift Right ; Count = 2-1 = 1
4         1   0001   1111   As q(0)=1, perform C,A = A+M ; A+M = 0110+1011 = 1 0001
          0   1000   1111   Shift Right ; Count = 1-1 = 0

Final product is in registers A,Q:

A Q = 1000 1111 (in decimal its value is 1+2+4+8+128=143)

11*13=143
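
The accumulated addition steps can be sketched in Python with the registers C, A and Q kept as separate integers. The function name `multiply_unsigned` is an assumption made for this example; M and Q are taken to be unsigned values that fit in n bits.

```python
def multiply_unsigned(m, q, n):
    """Shift-and-add multiplication of two n-bit unsigned numbers.

    Mirrors the registers of the flow chart: C (1 bit), A and Q (n bits).
    Returns the 2n-bit product held in A,Q.
    """
    mask = (1 << n) - 1
    c, a = 0, 0
    for _ in range(n):                 # Count = n down to 1
        if q & 1:                      # q(0) = 1: C,A = A + M
            total = a + m
            c, a = total >> n, total & mask
        # Shift right C, A, Q as one combined register.
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
    return (a << n) | q

print(multiply_unsigned(0b1011, 0b1101, 4))   # 11 * 13
print(multiply_unsigned(0b1111, 0b1111, 4))   # 15 * 15
```

Only one n-bit adder and the three registers are needed, which is the saving over storing every partial product.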
Signed Multiplication: (Booth's Multiplication or Bit Pair Multiplication)
1. Signed multiplication is performed based on bit pair operations.
2. The bit pairs are formed from the multiplier: each pair is q(i), q(i-1).
3. To get the initial bit pair, q(-1) is assumed to be always set to 0.

Example: Multiplier with the assumed bit

q3 q2 q1 q0 q(-1)

Bit pair   Booth coding   Operation
00         0              SR (shift right)
01         +1             Addition and SR
10         -1             Subtraction and SR
11         0              SR

Question) Consider the following multiplier, identify the recoded Booth's multiplier, and
also identify how many add/subtract operations are required during the multiplication process.

Multiplier: 101101101011 ; q(-1) = 0

Bit pair (starting from q(0), q(-1))   Booth's coding
10                                     -1
11                                     0
01                                     +1
10                                     -1
01                                     +1
10                                     -1
11                                     0
01                                     +1
10                                     -1
11                                     0
01                                     +1
10                                     -1

Recoded multiplier (MSB first): -1 +1 0 -1 +1 0 -1 +1 -1 +1 0 -1

Total operations: the recoded multiplier has 9 non-zero digits, so 9 add/subtract operations are required.
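
The recoding can be checked with a short Python sketch. The function name `booth_recode` is hypothetical; the multiplier is passed as a bit string with the MSB first, and q(-1) = 0 is appended as the notes assume.

```python
def booth_recode(multiplier):
    """Booth-recode a multiplier given as a bit string, MSB first.

    Each digit is q(i-1) - q(i):
    pair 01 -> +1, pair 10 -> -1, pairs 00 and 11 -> 0.
    """
    bits = [int(b) for b in multiplier] + [0]   # append q(-1) = 0
    return [bits[i + 1] - bits[i] for i in range(len(multiplier))]

digits = booth_recode("101101101011")
print(digits)
print(sum(1 for d in digits if d != 0), "add/subtract operations")
```

Counting the non-zero digits gives the number of additions and subtractions; the zeros need only a shift.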
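Booth's Multiplication:

The sequence of operations required in Booth's multiplication is given below:

Start
1. M = Multiplicand ; Q = Multiplier ; q(-1) = 0 ; A = 0 ; Count = n
2. Examine q(0), q(-1):
   10 : A = A-M
   01 : A = A+M
   00 or 11 : no addition or subtraction
3. Shift Right A, Q, q(-1) (the sign bit of A is kept)
4. Count = Count-1
5. Is Count=0 ? If no, go back to step 2. If yes, stop.

Final Product is in registers A, Q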

Examples of Booth's Algorithm

Multiplier (Q) = 7 = 0111 ; Multiplicand (M) = -6 : +6 = 0110, its 2's complement is 1010, so M = 1010

Register size is 4 (value of Count)

Q = 0111 ; q(-1) = 0

Bit Pair   Booth's Coding
10         -1
11         0
11         0
01         +1

Step      A      Q      q(-1)   Operation
Initial   0000   0111   0
1         0110   0111   0       Pair 10: A = A-M = 0000 + (2's complement of 1010) = 0000+0110 = 0110
          0011   0011   1       Shift Right ; Count = 4-1 = 3
2         0001   1001   1       As q(0)=1 and q(-1)=1, only Shift Right is performed ; Count = 3-1 = 2
3         0000   1100   1       As q(0)=1 and q(-1)=1, only Shift Right is performed ; Count = 2-1 = 1
4         1010   1100   1       As q(0)=0 and q(-1)=1, A = A+M = 0000+1010 = 1010
          1101   0110   0       Shift Right (the sign bit of A is repeated) ; Count = 1-1 = 0

Final result in A and Q: 1101 0110

The result is a 2's complement number, so the sign bit has the weight -2^7:

1 | 101 0110 = (0*2^0+1*2^1+1*2^2+0*2^3+1*2^4+0*2^5+1*2^6-1*2^7)

( 0+2+4+0+16+0+64-128) = -42
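
A minimal Python sketch of the register-level procedure follows. The function name `booth_multiply` is an assumption made for this example. It keeps A at n bits as in the worked example, so it assumes the multiplicand is not the most negative value -2^(n-1), for which A - M does not fit in n bits.

```python
def booth_multiply(m, q, n):
    """Booth's multiplication of two signed integers that fit in n bits.

    A, Q and q(-1) follow the flow chart; the shift right is arithmetic,
    so the sign bit of A is repeated. Returns the signed 2n-bit product.
    Assumption: m is not -2**(n-1).
    """
    mask = (1 << n) - 1
    m &= mask
    a, q, q_1 = 0, q & mask, 0
    for _ in range(n):
        pair = (q & 1, q_1)
        if pair == (1, 0):
            a = (a - m) & mask          # A = A - M
        elif pair == (0, 1):
            a = (a + m) & mask          # A = A + M
        # Arithmetic shift right of A, Q, q(-1).
        q_1 = q & 1
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (a & (1 << (n - 1)))
    product = (a << n) | q
    if product & (1 << (2 * n - 1)):    # negative in 2's complement
        product -= 1 << (2 * n)
    return product

print(booth_multiply(-6, 7, 4))
```

Tracing the call with M = -6 and Q = 7 reproduces the A, Q and q(-1) values of the worked example step by step.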

Division

The bits of the dividend are examined from left to right until the set of bits examined represents a
number greater than or equal to the divisor, that is, until the divisor is able to divide the number. Until this
event occurs, 0's are placed in the quotient from left to right. When the event occurs, a 1 is placed in
the quotient and the divisor is subtracted from the partial dividend. The result is referred to as a
partial remainder.

Example: a) 9/3 = 1001/011

SOLVE YOURSELF

b) 56/7 = 111000/111

SOLVE YOURSELF

There are two techniques of division:

1. Restoring division
2. Non- Restoring division
Start
1. A = 0 ; M = Divisor ; Q = Dividend ; Count = n
2. Shift Left C, A, Q
3. C,A = A-M
4. Is C=1 ? If yes, q(0)=0 and A = A+M (restore). If no, q(0)=1.
5. Count = Count-1
6. Is Count=0 ? If no, go back to step 2. If yes, stop.
Question) Perform division of the following numbers using the Restoring Method

Dividend (Q) = 17

Divisor (M) = 03

Restoring Method uses:

1. Left shift
2. Subtraction
3. Carry control bit (q(0) is the invert of C, using a NOT gate)

Solution) 17 = (10001)2 (Q) ; 3 = (00011)2 (M) ; number of bits = 5 ; the divisor should be (n+1) bits for
handling borrow.

03 = (000011)2 ; its 2's complement is (111101)2 = -M

Therefore M = 000011 ; (-M) = 111101

Step      C   A       Q       Operation
Initial   0   00000   10001   Count = 5
1         0   00001   0001_   Left shift
          1   11110   00010   C,A = A-M = 000001+111101 = 111110 ; C=1, so q(0)=0
          0   00001   00010   Restoring operation: C,A = A+M = 111110+000011 = 1 000001 (discard the 1) ; Count = 5-1 = 4
2         0   00010   0010_   Left shift
          1   11111   00100   C,A = A-M = 000010+111101 = 111111 ; C=1, so q(0)=0
          0   00010   00100   Restoring operation: C,A = A+M = 111111+000011 = 1 000010 (discard the 1) ; Count = 4-1 = 3
3         0   00100   0100_   Left shift
          0   00001   01001   C,A = A-M = 000100+111101 = 1 000001 (discard the 1) ; C=0, so q(0)=1 ; Count = 3-1 = 2
4         0   00010   1001_   Left shift
          1   11111   10010   C,A = A-M = 000010+111101 = 111111 ; C=1, so q(0)=0
          0   00010   10010   Restoring operation: C,A = A+M = 111111+000011 = 1 000010 (discard the 1) ; Count = 2-1 = 1
5         0   00101   0010_   Left shift
          0   00010   00101   C,A = A-M = 000101+111101 = 1 000010 (discard the 1) ; C=0, so q(0)=1 ; Count = 1-1 = 0

Quotient in Q = 00101 (decimal value 5) ; Remainder in A = 00010 (decimal value 2)
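
The restoring method can be sketched in Python as below. The function name `restoring_divide` is an assumption made for this example. Instead of a separate C bit, the sketch lets A become a negative Python integer, which plays the same role as C = 1.

```python
def restoring_divide(dividend, divisor, n):
    """Restoring division of an n-bit unsigned dividend by a positive divisor.

    Returns (quotient, remainder). A negative A stands in for C = 1.
    """
    mask = (1 << n) - 1
    a, q = 0, dividend & mask
    for _ in range(n):
        # Shift left A, Q: the MSB of Q moves into the LSB of A.
        a = (a << 1) | (q >> (n - 1))
        q = (q << 1) & mask
        a -= divisor                   # A = A - M
        if a < 0:
            a += divisor               # restore A, q(0) stays 0
        else:
            q |= 1                     # q(0) = 1
    return q, a

print(restoring_divide(17, 3, 5))
```

The same function can be used to check the two exercises given earlier, 9/3 with n = 4 and 56/7 with n = 6.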

2) Non-Restoring Division:

Perform division using Non-Restoring division: (10)10 / (3)10

Start
1. A = 0 ; M = Divisor ; Q = Dividend ; Count = n ; C = 0
2. Is A positive (C=0) ? If yes, Shift Left C,A,Q and perform A = A-M. If no, Shift Left C,A,Q and perform A = A+M.
3. Is A positive ? If yes, q(0)=1. If no, q(0)=0.
4. Count = Count-1
5. Is Count=0 ? If no, go back to step 2.
6. Is A negative ? If yes, A = A+M.
End
M (divisor) = 03 = 0011 ; with (n+1) bits M = 00011 ; -M = 11101 (2's complement of M)

Step      C   A      Q      Operation
Initial   0   0000   1010   Count = 4
1         0   0001   010_   A is positive, so perform Left Shift and C,A = A-M
          1   1110   0100   C,A = 00001+11101 = 11110 ; as C is 1, its invert gives q(0)=0 ; Count = 4-1 = 3
2         1   1100   100_   C=1 means A is negative, so perform Left Shift and C,A = A+M
          1   1111   1000   C,A = 11100+00011 = 11111 ; as C is 1, its invert gives q(0)=0 ; Count = 3-1 = 2
3         1   1111   000_   C=1 means A is negative, so perform Left Shift and C,A = A+M
          0   0010   0001   C,A = 11111+00011 = 1 00010 (discard the 1) ; as C is 0, its invert gives q(0)=1 ; Count = 2-1 = 1
4         0   0100   001_   C=0 means A is positive, so perform Left Shift and C,A = A-M
          0   0001   0011   C,A = 00100+11101 = 1 00001 (discard the 1) ; as C is 0, its invert gives q(0)=1 ; Count = 1-1 = 0

Result: As A is positive, no final A = A+M is needed ; the Remainder is in A and the Quotient is in Q

Divide 10 by 3 (10/3): Remainder is 1 = (0001)2 and Quotient is 3 = (0011)2
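
A minimal Python sketch of the non-restoring steps is given below. The function name `nonrestoring_divide` is an assumption made for this example, and the sign of the Python integer A is used in place of the C bit.

```python
def nonrestoring_divide(dividend, divisor, n):
    """Non-restoring division of an n-bit unsigned dividend by a positive divisor.

    Returns (quotient, remainder). The sign of A replaces the C bit.
    """
    mask = (1 << n) - 1
    a, q = 0, dividend & mask
    for _ in range(n):
        msb = q >> (n - 1)             # bit shifted from Q into A
        q = (q << 1) & mask
        if a >= 0:
            a = 2 * a + msb - divisor  # shift left, then A = A - M
        else:
            a = 2 * a + msb + divisor  # shift left, then A = A + M
        if a >= 0:
            q |= 1                     # q(0) = 1 when A is positive
    if a < 0:
        a += divisor                   # final correction of the remainder
    return q, a

print(nonrestoring_divide(10, 3, 4))
```

Unlike the restoring method, a negative A is not corrected inside the loop; the next step adds M instead of subtracting it, and at most one correction is made at the end.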


Floating point numbers
Representing very large numbers and very small fractions takes more memory space. Example:
978000000000000 or 0.00000000978

a. To represent such numbers with less memory space there is a need for a special format
called the floating point format.
b. The general form of a floating point number is: (+-) M * B^(+-e)
+- = sign ; M : Mantissa ; B : Base/Radix ; e : exponent
To store any number in floating point format, first check whether the number is in the general form.
Example 1 : 9780000000000000. * 10^0 (the digits shift right past the point ; the exponent will increase (positive))

M = 9780000000000000 ; B = 10 ; normalized: 9.78 * 10^+15

Example 2 : 0.0000000000978 * 10^0

9.78 * 10^-11 (the digits shift left ; the exponent will be negative)

To store a floating point number in the main memory, an internal structure for the
floating point number is needed. A floating point number consists of three parts:

1. Sign
2. Biased exponent
3. Mantissa
The sign bit is either positive (0) or negative (1).
The biased exponent is equal to Actual exponent + Bias. The bias is the maximum possible positive
exponent.

Exponent range for an n-bit exponent field = -2^(n-1) to +(2^(n-1) - 1) ; if n=6 the range is -32 to +31 ; if n=9 the range is -256 to +255

The mantissa of a floating point number is always represented in normalized form. The general
form of normalization is 1.bbbb... (the 1 before the binary point is there by default, and each b is 0 or 1)

Question) Consider the following number and represent it in a 20 bit hypothetical floating point
format: +101011

Check the general form (+-) M * B^(+-e) : 101011. * 2^0 ; then check the normalization (1.bbbb...):

1.01011 * 2^5 ; Mantissa = 01011 ; Bias = +31 ; sign = 0 (positive) ; Actual Exponent = 5

Floating point format is

S (sign)   Biased Exponent   Mantissa
1 bit      6 bits            13 bits

Biased Exponent = Actual Exponent + Bias = 5+(+31) = +36 (Bias = 2^(6-1) - 1 = 31)

Binary form of 36 = 100100 ; Mantissa = 0101100000000

0 100|100 0|1011|0000|0000 = 48B00
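
The packing into the hypothetical 20 bit format (1 sign bit, 6 exponent bits with bias 31, 13 mantissa bits) can be sketched in Python. The function name `encode20` is an assumption made for this example; it handles only positive, non-zero binary integers and silently drops mantissa bits beyond the 13th.

```python
def encode20(bits):
    """Pack a positive binary integer string such as '101011' into the
    hypothetical 20 bit format: 1 sign bit, 6 exponent bits (bias 31),
    13 mantissa bits. Returns the word as 5 hexadecimal digits."""
    bits = bits.lstrip('0')
    exponent = len(bits) - 1                 # normalize to 1.bbbb... * 2^exponent
    mantissa = bits[1:].ljust(13, '0')[:13]  # drop the leading 1, pad with zeros
    biased = exponent + 31
    word = '0' + format(biased, '06b') + mantissa
    return format(int(word, 2), '05X')

print(encode20('101011'))
```

For +101011 the biased exponent is 36 = 100100 and the mantissa field is 0101100000000, which is the word worked out by hand above.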

Range of floating point numbers: IEEE standard floating point numbers. According to the IEEE standard,
2 kinds of floating point format are defined:

1. Single precision floating point number (32 bit)
2. Double precision floating point number (64 bit)

1. 32 bit floating point format:

Sign     Biased Exponent   Mantissa
1 bit    8 bits            23 bits      (32 bits in total)

General Format: (+-) M * B^(+-e) ; Actual Exponent = Biased Exponent - Bias

Bias = 2^(n-1) - 1 = 2^(8-1) - 1 = 128-1 = 127

Minimum Biased Exponent = 00000000 = 0, so Actual Exponent = 0-127 = -127 ; Maximum = 11111111 = 255, so Actual Exponent = 255-127 = +128

(In IEEE 754 the all-0s and all-1s exponent fields are reserved for special values, so normalized numbers use actual exponents -126 to +127.)

Mantissa: Min = 1.00000000000000000000000 = 1 ; Maximum = 1.11111111111111111111111 (perform shift left on the mantissa up to 23 times) =
111111111111111111111111. (24 bits)

Maximum mantissa = (2^24 - 1) * 2^-23 = 2 - 2^-23

2. 64 bit floating point format:

Sign     Biased Exponent   Mantissa
1 bit    11 bits           52 bits      (64 bits in total)

a. Bias = 2^(11-1) - 1 = 1024-1 = 1023

b. Actual Exponent = Biased Exponent - Bias ; minimum biased exponent = 00000000000 = 0, so Actual Exponent = 0-1023 = -1023

Maximum biased exponent = 11111111111 = 2^11 - 1 = 2047, so Actual Exponent = 2047-1023 = +1024

(In IEEE 754 the all-0s and all-1s exponent fields are reserved for special values, so normalized numbers use actual exponents -1022 to +1023.)

c. Mantissa: Min = 1.000... (52 zeros) = 1 ; Maximum = 1.111... (52 ones) (perform shift
left on the mantissa up to 52 times) = 111...1 (53 bits) ;
Maximum mantissa = (2^53 - 1) * 2^-52 = 2 - 2^-52

Question) The value of a float type variable is represented using the single precision 32 bit floating
point format of the IEEE 754 standard, which uses 1 bit for the sign, 8 bits for the biased exponent and 23 bits for
the mantissa. A float type variable X is assigned the decimal value -14.25. The representation of X in
hexadecimal notation is:

Sign     Biased Exponent   Mantissa
1 bit    8 bits            23 bits      (32 bits in total)

X = -14.25 ; Bias = 2^(8-1) - 1 = 128-1 = 127 ; sign = 1 (negative) ; 14.25 = (1110.01)2

General format of floating point is (+-) M * B^(+-e) : 1110.01 * 2^0 = 1.11001 * 2^+3

Biased Exponent (B.E) = Actual Exponent + Bias = 3+127 = 130 ; binary form of 130 is (10000010)2

Mantissa = 11001 followed by zeros up to 23 bits

1 100|0001|0 110|0100|0000|0000|0000|0000

C1640000 (answer)
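
The hand-worked answer can be cross-checked with Python's standard `struct` module, which packs a value as an IEEE 754 single precision number. The helper names `float_to_hex32` and `fields32` are assumptions made for this example.

```python
import struct

def float_to_hex32(x):
    """Hexadecimal form of x stored as an IEEE 754 single precision number."""
    return struct.pack('>f', x).hex().upper()

def fields32(x):
    """Split the 32 bit word into (sign, biased exponent, mantissa field)."""
    word = int.from_bytes(struct.pack('>f', x), 'big')
    sign = word >> 31
    biased = (word >> 23) & 0xFF
    mantissa = word & 0x7FFFFF
    return sign, biased, mantissa

print(float_to_hex32(-14.25))
sign, biased, mantissa = fields32(-14.25)
print(sign, biased, format(mantissa, '023b'))
```

The three fields printed by the second call correspond to the sign bit, the biased exponent 130 and the 23 bit mantissa derived step by step above.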
