Lecture 14 - Arithmetic Subsystems - Numbering Systems and Floating Point Unit (FPU)

Department of Systems and Computer Engineering,

Carleton University

SYSC 3320 Computer Systems Design


Number Systems and
Floating Point Unit (FPU)

1
Copyright/Source

• Credit for material: Dr. Mohamed Atia

2
What we learn in this lecture
• Number systems
• Integer number representations
• Fractional number representations
• Fixed-point representations
• Floating point representations

3
Number Representation in Digital Systems
• Processing numbers is a basic part of any computing system. In digital hardware, numbers are represented using the base-2 positional weighting system.
• Numbers can be categorized into
• Integer numbers and
• Fractional numbers

Example: decimal 173 in binary

Integer numbers in base 2 (binary)

4
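As a sketch of the base-2 positional weighting idea, the slide's example (decimal 173) can be converted by repeated division by 2. This short Python illustration is not part of the original slides:

```python
def to_binary(n):
    """Convert a non-negative integer to a base-2 string by repeated
    division by 2; each remainder is one positional weight (bit)."""
    bits = []
    while n > 0:
        bits.append(str(n % 2))
        n //= 2
    return "".join(reversed(bits)) or "0"

print(to_binary(173))  # 10101101 = 128 + 32 + 8 + 4 + 1
```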


Number Representation in Digital Systems
• Integer representation is straightforward, and mathematical operations on integers are relatively easy and efficient to implement in hardware.
• Adder, multiplier, and divider circuits exist for integer binary numbers.

4-bit Array Multiplier
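In software, the partial-product structure of the array multiplier shown in the figure can be mimicked by shift-and-add (an illustrative sketch, not from the slides; the function name is hypothetical):

```python
def multiply_4bit(a, b):
    """Shift-and-add multiplication mirroring a 4-bit array multiplier:
    each bit of b gates a shifted copy of a (a partial product), and the
    partial products are summed."""
    product = 0
    for i in range(4):
        if (b >> i) & 1:       # bit i of the multiplier
            product += a << i  # add the partial product a * 2^i
    return product

print(multiply_4bit(13, 11))  # 143
```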

5
Signed Binary Numbers: Sign/Magnitude
• Sign/Magnitude Numbers
➢ 1 sign bit, N-1 magnitude bits
➢ Sign bit is the most significant (left-most) bit
o Positive number: sign bit = 0
o Negative number: sign bit = 1

• Example: 4-bit sign/magnitude representations of ±6:
➢ +6 = 0110
➢ -6 = 1110
• Range of an N-bit sign/magnitude number:
➢ [-(2^(N-1) - 1), 2^(N-1) - 1]

6
Signed Binary Numbers: Sign/Magnitude
• Problems
– Addition doesn't work; for example, -6 + 6:
    1110
  + 0110
  10100 (wrong!)
– Two representations of 0 (±0):
  1000
  0000
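Both problems can be demonstrated in a few lines of Python (an illustration added here, not from the slides; `sm_decode` is a hypothetical helper name):

```python
def sm_decode(bits, n=4):
    """Decode an n-bit sign/magnitude pattern: MSB is the sign bit."""
    magnitude = bits & ((1 << (n - 1)) - 1)
    return -magnitude if (bits >> (n - 1)) & 1 else magnitude

# Plain binary addition of -6 and +6 does not give zero:
raw = (0b1110 + 0b0110) & 0b1111   # keep 4 bits -> 0b0100
assert sm_decode(raw) == 4         # wrong: the sum should be 0

# Zero has two encodings:
assert sm_decode(0b0000) == 0 and sm_decode(0b1000) == 0
```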

7
Signed Binary Numbers: Two’s Complement
• Two's complement numbers don't have the same problems as sign/magnitude numbers:
➢ Addition works
➢ Single representation for 0
• Most positive 4-bit number: 0111
• Most negative 4-bit number: 1000
• The most significant bit still indicates the sign
• (1 = negative, 0 = positive)
• Range of an N-bit two's complement number: [-2^(N-1), 2^(N-1) - 1]
• Taking the two's complement:
1. Invert the bits
2. Add 1
• Example: flip the sign of 3 (decimal) = 0011 (binary): invert → 1100, add 1 → 1101 = -3
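The two-step recipe above can be sketched directly in Python (an added illustration, not part of the slides):

```python
def twos_complement(x, n=4):
    """Flip the sign of an n-bit value: invert the bits, then add 1 (mod 2^n)."""
    mask = (1 << n) - 1
    return ((~x) + 1) & mask

assert twos_complement(0b0011) == 0b1101  # +3 -> -3
assert twos_complement(0b1101) == 0b0011  # negating again recovers +3
```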
8
Sign-Extension: Increasing Number of Bits

• The sign bit is copied into the new most significant bits

• The number's value is unchanged
• Example 1:
– 4-bit representation of 3 = 0011
– 8-bit sign-extended value: 00000011
• Example 2:
– 4-bit representation of -5 = 1011
– 8-bit sign-extended value: 11111011
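The two examples above can be reproduced with a short sketch (added illustration, not from the slides):

```python
def sign_extend(x, from_bits, to_bits):
    """Widen an unsigned bit pattern by copying its sign bit
    into the new high-order bit positions."""
    if (x >> (from_bits - 1)) & 1:  # negative: fill the new bits with 1s
        x |= ((1 << (to_bits - from_bits)) - 1) << from_bits
    return x

assert sign_extend(0b0011, 4, 8) == 0b00000011  # +3 stays +3
assert sign_extend(0b1011, 4, 8) == 0b11111011  # -5 stays -5
```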

9
Number System Comparison

Number System       Range
Unsigned            [0, 2^N - 1]
Sign/Magnitude      [-(2^(N-1) - 1), 2^(N-1) - 1]
Two's Complement    [-2^(N-1), 2^(N-1) - 1]

For example, 4-bit representations:

Unsigned (0 to 15):           0000 = 0, 0001 = 1, ..., 1111 = 15
Two's complement (-8 to 7):   1000 = -8, 1001 = -7, ..., 1111 = -1, 0000 = 0, ..., 0111 = 7
Sign/magnitude (-7 to 7):     1111 = -7, 1110 = -6, ..., 1001 = -1, 1000 = -0, 0000 = +0, ..., 0111 = 7
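The comparison can be made concrete by decoding one bit pattern under each system (an added sketch, not from the slides; the `decode` helper is hypothetical):

```python
def decode(bits, system):
    """Interpret a 4-bit pattern under each of the three number systems."""
    if system == "unsigned":
        return bits
    sign = (bits >> 3) & 1
    if system == "sign/magnitude":
        magnitude = bits & 0b0111
        return -magnitude if sign else magnitude
    return bits - 16 if sign else bits  # two's complement: subtract 2^4 if negative

assert decode(0b1101, "unsigned") == 13
assert decode(0b1101, "sign/magnitude") == -5
assert decode(0b1101, "two's complement") == -3
```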

10
Fractional Numbers
• Fractional numbers are more complex and more challenging to implement in hardware.
• Specialized co-processors and optimized hardware accelerators are usually designed specifically to process fractional numbers.
• Two main types of fractional number representations:
• Fixed point representation
• Floating point representation

11
Fractional Numbers: Fixed-Point representation

• A fixed number of bits is used for the integer part
➢ More bits means a bigger range of numbers
• A fixed number of bits is used for the fractional part
➢ More bits means higher accuracy
• An imaginary fraction point is assumed to separate the integer and fractional digits

Fractional numbers in base 2

12
Fractional Numbers: Fixed-Point representation
It can be represented in 1's complement or 2's complement format, as shown.
In both encodings, the smallest numerical difference between decoded numbers is 2^-R. For example, for integers (R = 0), the smallest increment is 2^0 = 1.
• 2^-R quantifies the "imprecision" of the representation.

2's complement, with L integer bits and R fraction bits:

  x = -b_(L-1)·2^(L-1) + Σ_(i=-R)^(L-2) b_i·2^i,   b_i ∈ {0, 1}

  w = b_(L-1) ... b_0 . b_(-1) ... b_(-R),   b_i ∈ {0, 1}

13
Fractional Numbers: Fixed-Point representation
• Imprecision: the smallest difference between two consecutive representable numbers for a given word length and fraction point position. Imprecision = 2^-R
➢ Integer length L
➢ Fraction length R
➢ Word length B = L + R

  w = b_(L-1) ... b_0 . b_(-1) ... b_(-R),   b_i ∈ {0, 1}

[Figure: representable values on the number line, spaced 2^-R apart: ..., -3·2^-R, -2·2^-R, -2^-R, 0, 2^-R, 2·2^-R, 3·2^-R, ...]

14
Fractional Numbers: Fixed-Point representation
• Because a computer has a limited number of digits to store numbers, some numbers, such as 𝜋, cannot be represented exactly due to this imprecision. The number of digits the computer uses to store a number is called its "significant digits" or "significant figures".

[Figure: the same grid of representable values, spaced 2^-R apart]

15
Fractional Numbers: Fixed-Point representation
• The imprecision can be decreased by increasing R. For fixed word size B,
increasing R means decreasing L. Decreasing L decreases the largest
numbers that can be represented.

• Example: for a 4-digit signed representation

– L = R = 2
– the imprecision is …
– the largest magnitude is …
(Sign bit | two integer digits . two fraction digits)

– If L = 1 and R = 3
– the imprecision is decreased to …
– the largest magnitude is halved to …
(Sign bit | one integer digit . three fraction digits)

16
Fractional Numbers: Fixed-Point representation
• The imprecision can be decreased by increasing R. For fixed word size B,
increasing R means decreasing L. Decreasing L decreases the largest
numbers that can be represented.

• Example: for a 4-digit signed representation

– L = R = 2
– the imprecision is 0.25
– the largest magnitude is 1.75 = 1 + 0.5 + 0.25
(Sign bit | two integer digits . two fraction digits)

– If L = 1 and R = 3
– the imprecision is decreased to 0.125
– the largest magnitude is halved to 0.875 = 0.5 + 0.25 + 0.125
(Sign bit | one integer digit . three fraction digits)
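The two cases in the example follow from the formulas imprecision = 2^-R and largest magnitude = 2^(L-1) - 2^-R (with the sign bit counted in L, as in the slide). A small sketch, added here for checking, not part of the slides:

```python
def fixed_point_stats(L, R):
    """For a signed fixed-point format with L integer digits (sign bit
    included) and R fraction digits: imprecision is 2^-R and the
    largest magnitude is 2^(L-1) - 2^-R."""
    return 2.0 ** -R, 2.0 ** (L - 1) - 2.0 ** -R

assert fixed_point_stats(2, 2) == (0.25, 1.75)    # L = R = 2
assert fixed_point_stats(1, 3) == (0.125, 0.875)  # L = 1, R = 3
```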

17
Fractional Numbers: Fixed-Point representation
• For a fixed word length B, there is a tradeoff between the range (the interval from the largest positive to the largest negative representable number) and the imprecision of the represented numbers, i.e., improving the precision entails decreasing the range.
• This tradeoff causes errors in computer numerical calculations. Since the smallest number that can be represented in fixed point is 2^-R, if a number x has a fractional part smaller than 2^-R, one of the following errors will occur:
➢ Truncation: drop the fraction that is less than 2^-R
➢ Round-off: approximate the fraction that is less than 2^-R to exactly 2^-R

• In either case, the error is bounded by 2^-R. If two numbers x and y that are less than 2^-R apart are subtracted, the result will be truncated to zero (an error).
• To control this error, it is better to have a "floating" fraction point rather than a fixed one. This leads to the floating-point format.
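The truncation and round-off behaviors, and the subtraction error, can be demonstrated with a small quantization sketch (added illustration, not from the slides; `quantize` is a hypothetical helper):

```python
import math

def quantize(x, R, mode="truncate"):
    """Place x on a fixed-point grid with step 2^-R: truncation drops the
    sub-2^-R fraction; round-off pushes it up to the next grid point."""
    scale = 2 ** R
    if mode == "truncate":
        return math.floor(x * scale) / scale
    return math.ceil(x * scale) / scale

# Either way, the error is bounded by 2^-R:
assert abs(math.pi - quantize(math.pi, 3)) < 2 ** -3            # 3.125
assert abs(quantize(math.pi, 3, "round") - math.pi) < 2 ** -3   # 3.25

# Two numbers less than 2^-R apart can land on the same grid point,
# so their difference truncates to zero:
assert quantize(0.30, 3) - quantize(0.27, 3) == 0.0
```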

18
Floating-Point Representation
• Floating-point representation: a more flexible representation that can accommodate both large and small numbers, implemented by allowing the fraction point to float rather than stay fixed. This allows the imprecision to vary with the numeric magnitude.

[Figure: fixed-point vs. floating-point representable values on the number line]

• Floating-point representation makes the imprecision proportional to the magnitude of the number (a good idea!)

19
Floating-Point Representation
• A floating-point binary word format consists of
➢ Sign bit
➢ Signed exponent (usually an integer)
➢ Mantissa (can be any number, usually a fraction in fixed-point format)
➢ And a base b (equal to 2 in binary)

• The value of the number in this form = (-1)^(sign bit) × (mantissa) × (base)^(exponent)

20
Floating-Point Representation
• Note: by varying the exponent, we vary the fraction point
position. This means the fraction point is “floating”

Exponent Mantissa

21
Floating-Point Representation
• Mantissa normalization: consider the number 1/34 = 0.02941176... A possible floating-point representation is 0.0294 × 10^0. However, the leading zero in 0.0294 is useless, and we lose a significant digit. The solution to this limitation is to normalize the mantissa as follows:
➢ 0.2941 × 10^-1 (an additional significant digit is retained)
• A normalized mantissa has a limited range: 1/base ≤ m < 1
➢ Minimum mantissa = 1/base
➢ Maximum mantissa is obviously less than 1 (it is a fraction)

22
IEEE-754 Floating-point Format
• IEEE-754
➢ Single precision (SP): 32-bit word
o 1 mantissa sign bit, 8 exponent bits, 23 mantissa fraction bits

➢ Double precision (DP): 64-bit word
o 1 mantissa sign bit, 11 exponent bits, 52 mantissa bits

• Mantissa =
➢ 1 + (b_22·2^-1 + b_21·2^-2 + ... + b_0·2^-23) for SP
➢ 1 + (b_51·2^-1 + b_50·2^-2 + ... + b_0·2^-52) for DP
• Exponent = the encoded unsigned integer minus a bias
➢ e = e_b - Bias
➢ Bias = 127 for SP, Bias = 1023 for DP
23
IEEE-754 Floating-point Format
• Example: convert the following IEEE-754 SP formatted number to decimal:

0 01000000 01111101110000000000010

➢ 1 bit sign
➢ 8 bits for Exponent (bias is +127)
➢ 23 bits for mantissa
• Exponent e = 2^6 - 127 = 64 - 127 = -63
• Mantissa = 1 + 2^-2 + 2^-3 + 2^-4 + 2^-5 + 2^-6 + 2^-8 + 2^-9 + 2^-10 + 2^-22 = 1.4912
• Number = 1.4912 × 2^-63 ≈ 1.6168 × 10^-19
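The worked example can be checked with Python's standard `struct` module, which reinterprets the 32-bit pattern as an SP float (a verification sketch added here, not part of the slides):

```python
import struct

# Reassemble the slide's bit pattern: sign | exponent | fraction.
bits = "0" + "01000000" + "01111101110000000000010"
value = struct.unpack(">f", struct.pack(">I", int(bits, 2)))[0]

# Decode by hand, following the slide's steps.
exponent = int(bits[1:9], 2) - 127  # 64 - 127 = -63
mantissa = 1 + sum(int(b) * 2.0 ** -(i + 1) for i, b in enumerate(bits[9:]))

assert exponent == -63
assert value == mantissa * 2.0 ** exponent
print(value)  # approximately 1.6168e-19
```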

24
Special Numbers in IEEE-754 Floating-point Format
• IEEE-754 has three special numbers:

Number Sign Exponent Fraction


0 X 00000000 00000000000000000000000
∞ 0 11111111 00000000000000000000000
-∞ 1 11111111 00000000000000000000000
NaN X 11111111 non-zero

• Encoded exponents reserved for the special numbers:

➢ 0...0 (all zeroes) for zero
➢ 1...1 (all ones) for ∞ and NaN
• The permissible exponent values for non-special numbers:
➢ -126 to 127 for SP
➢ -1022 to 1023 for DP

25
Numeric Range of IEEE-754 Floating-point Format

• The IEEE-754 DP numeric range is shown below:

• Multiplying 1.0 × 10^-200 by 5.0 × 10^-200 results in underflow

• Performing the division (6.1 × 10^150) / (4.0 × 10^-200) results in overflow

• Possible causes of NaN: 0/0, or taking the square root of a negative number such as √-10
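The slide's underflow and overflow cases can be reproduced with ordinary Python floats, which are IEEE-754 DP (an added sketch, not from the slides; note that Python raises an exception for 0.0/0.0, so ∞ - ∞ stands in here as the indeterminate operation producing NaN):

```python
import math

tiny = 1.0e-200 * 5.0e-200   # true result 5e-400 is below the DP range: underflows to 0.0
huge = 6.1e150 / 4.0e-200    # true result ~1.5e350 exceeds the DP range: overflows to inf
nan = float("inf") - float("inf")  # an indeterminate operation yields NaN

assert tiny == 0.0
assert math.isinf(huge)
assert math.isnan(nan)
```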

26
Floating-Point Addition
1. Extract exponent and fraction bits
2. Prepend leading 1 to form mantissa
3. Compare exponents
4. Shift smaller mantissa if necessary
5. Add mantissas
6. Normalize mantissa and adjust exponent if necessary
7. Round result if necessary
8. Assemble exponent and fraction back into floating-point format

27
Floating-Point Addition Example
• Extract exponent and fraction bits
1 bit 8 bits 23 bits
0 01111111 100 0000 0000 0000 0000 0000
Sign Exponent Fraction
1 bit 8 bits 23 bits
0 10000000 101 0000 0000 0000 0000 0000
Sign Exponent Fraction

For N1: S1 = 0, BE1 = 127, E1 = 127 - 127 = 0, F1 = .1

For N2: S2 = 0, BE2 = 128, E2 = 128 - 127 = 1, F2 = .101

• Prepend the leading 1 to form the mantissas, then align to the larger exponent:

N1 = 1.1 × 2^0 = 0.110 × 2^1
N2 = 1.101 × 2^1 = 1.101 × 2^1
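Carrying the example through: N1 = 1.1₂ × 2^0 = 1.5 and N2 = 1.101₂ × 2^1 = 3.25; after alignment the mantissas add to 10.011₂ × 2^1, which normalizes to 1.0011₂ × 2^2 = 4.75. A short Python check, added here (the `f32` helper is hypothetical):

```python
import struct

def f32(pattern):
    """Build a float from an IEEE-754 SP bit pattern written as a string."""
    return struct.unpack(">f", struct.pack(">I", int(pattern.replace(" ", ""), 2)))[0]

n1 = f32("0 01111111 10000000000000000000000")  # 1.1   (binary) x 2^0 = 1.5
n2 = f32("0 10000000 10100000000000000000000")  # 1.101 (binary) x 2^1 = 3.25
assert n1 + n2 == 4.75  # 100.11 (binary) = 1.0011 x 2^2
```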

28
Floating-Point Multiplication/Division
• Multiplication
➢ Multiply mantissas and add exponents!
• Division
➢ Divide mantissas and subtract exponents!
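The multiplication rule can be checked with `math.frexp`, which splits a float into a mantissa in [0.5, 1) and an integer exponent (an added sketch, not part of the slides):

```python
import math

# "Multiply mantissas and add exponents":
m1, e1 = math.frexp(1.5)    # 0.75,   1
m2, e2 = math.frexp(3.25)   # 0.8125, 2
product = (m1 * m2) * 2.0 ** (e1 + e2)
assert product == 1.5 * 3.25  # 4.875
```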

29
Floating Point Unit (FPU)
• Floating-point operations can be performed by hardware (circuitry) or by software (program code). The programmer cannot tell which design is used without prior knowledge of the system's hardware design. The software method is approximately 1000 times slower than the hardware method. The hardware unit that performs floating-point arithmetic operations is called the Floating Point Unit (FPU).
• The FPU is also known as a "floating-point co-processor". In SoC design, this co-processor is embedded within the processor core in a separate section.

30
Floating Point Unit (FPU)
• Floating Point Unit (FPU):
• The Zynq SoC platform's dual-core ARM Cortex-A9 processor has an FPU co-processor within each processor core

31
Thank You ☺

32
