Computer Architecture
Objectives of the course
At the end of this course, the student should be able to design a basic computer. Specifically, they should:
✓ Understand the role of each component in the data path of a computer.
✓ Understand the basic mechanisms that allow a computer to communicate (input/output and interrupt systems).
✓ Master the flow of information in basic circuits and understand the operation of the control unit (sequencer).
Success rate of this course in 2023/2024
(Chart: percentage of students who passed vs. failed.)
Course Outline
1. General presentation of the computer
2. Different types of memories
3. The components related to an input/output operation
4. The input/output modes
5. The interrupt system
6. Control Unit (sequencer)
Chapter 1:
General presentation of the computer
Objectives of this chapter
✓ Understand the fundamentals of the computer
✓ Learn some history about the generations of computers
✓ Know the types of computers
Question
• Why study computer architecture?
Answers
• Companies like Google, IBM, Microsoft, Apple, and Cisco seek candidates with knowledge of computer architecture, meaning that understanding architecture can enhance your job prospects.
• Top software engineers understand the underlying hardware. In other words, knowing about architecture can help you earn promotions.
• On a practical level, understanding computer architecture is essential for future courses, including systems programming, compilers, operating systems, and embedded systems.
• Computer architecture is vital for learning embedded systems, enabling students to understand hardware interactions and design innovative solutions.
Good News About Architecture
• You will learn to think differently.
• You can grasp the basics without needing to know all the
technical details.
• Programmers only need to learn the essentials:
▪ Key features of major components
▪ Their role in the system
▪ Effects on software
Introduction
• Information can take on various forms, such as digital data, text, audio, drawings, graphics, images, as well as the instructions that make up programs.
• These data are expressed and encoded as sequences of binary digits, represented by 0s and 1s.
▪ Conversion
▪ ASCII code (see the sketch below)
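A minimal sketch in Python (with an arbitrary example string) of how characters map to ASCII codes and binary digits:

text = "Hi"
for ch in text:
    code = ord(ch)              # ASCII code of the character
    bits = format(code, "08b")  # the same value as an 8-bit binary string
    print(ch, code, bits)       # e.g., H 72 01001000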
Computer Architecture
Definition
Evolution of Computer Systems
• 19th century: Charles Babbage is the pioneer who described the principles of a general-purpose mechanical calculator.
▪ Hypothetically, Babbage's Analytical Engine could repeat sequences of operations and choose a specific series of operations based on the state of the calculation. However, this machine was never actually completed.
▪ In 1840: Building on Babbage's work, Ada Augusta (Ada Lovelace) wrote the first programs, including successive iterations (the first computer programs, which were never executed).
♦ Al-Khawarizmi (780–850) introduced the concept of the algorithm, while Ada Lovelace created the first computer algorithm.
• The Von Neumann model (1946) lays the foundation for universal machines.
• Generally, five (or six) generations (decisive stages) are distinguished in the (mainly technological) evolution of these machines:
First Generation: 1945-1958
• Dedicated computers, unique models (specific tasks → single function)
• Large and unreliable machines (ENIAC weighed 30 tons and covered 167 m²)
• Technology based on vacuum tubes (tubes à vide in Fr.): very large, less durable, frequent failures, tended to burn out
• Use of machine language (or assembly language)
• Drum memory (tambour magnétique in Fr.)
• No modern operating system
• Programming done with punched cards (carte perforée): one card per program, where errors were critical
(Images: ENIAC, the first computer; a drum memory)
Second Generation: 1958-1964
• General-purpose, relatively reliable machines
• Transistor technology
• Magnetic tape (bande magnétique in Fr.)
• Basic operating system
• 10⁵ logic elements
• Use of a few high-level languages such as Fortran
(Image: RCA 501, the first computer based on transistors)
Third Generation: 1964-1971
• Technology of integrated circuits (miniaturization)
• More reliable, faster (micro to nanoseconds), efficient, cheaper, and smaller
• Magnetic disks (disque magnétique in Fr.)
• More advanced operating systems: Unix
• Punched cards were replaced by keyboards and mice
• Use of high-level programming languages such as BASIC, PASCAL, etc.
• Use of magnetic memory (more space)
(Image: IBM 370)
Fourth Generation: 1971 to 1980
• Main technology: microprocessors
• LSI technology (Large Scale Integration, thousands of transistors)
• Emergence of more advanced operating systems, e.g., MS-DOS, Apple DOS
• RAM, ROM, and CD-ROM
• Democratization of personal computing
• Use of high-level programming languages (C, DBASE)
• Distributed processing
(Image: a computer of the fourth generation)
Fifth Generation: 1980 to 1990
• Advanced technologies: More advanced
microprocessor, Ultra Large-Scale
Integration (ULSI)
• Interactive distributed systems
• Higher performance, faster, and smaller
• Operating systems: Windows 3.0/3.1,
MacOS, Unix
• Programming languages: C++, Visual
Basic
• Multicore processors
Sixth Generation: 1990 and beyond
• More powerful microprocessors,
multiprocessors
• Faster storage, more RAM
• Advanced operating systems: Windows 95/98 (GUI), Linux, etc.
• The rise of the internet via WAN (Wide Area
Networking)
• Miniaturization of hardware components
(Nanotechnology)
• Programming languages: Java, Python, C#
Moore's Law
• Moore's Law is the observation (not a physical law) that the number of transistors on an integrated circuit doubles roughly every two years with a minimal rise in cost (see the computation below).
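To make the doubling concrete, here is a small illustrative Python computation; the starting count is roughly that of the Intel 4004 (1971), and the projection is idealized, not real chip data:

transistors = 2_300              # roughly the Intel 4004 (1971)
for year in range(1971, 1991, 2):
    print(year, transistors)
    transistors *= 2             # one doubling per two-year period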
Personal Computers (Microcomputers)
• A small-sized computer designed to be
used by a single user at a time
• The most popular category due to their
affordable prices (prices vary depending
on the configuration)
• Designed for general tasks: Office work,
Internet browsing, and video games
• Usage: Individual and small businesses
Workstations
• Workstations are high-performance
computers designed for demanding
tasks.
• They are typically more powerful than
standard PCs, equipped with high-end
components (multi-core processors,
large RAM capacity, advanced graphics
cards).
• Use cases: 3D design (e.g., Unity), software development, AI, etc.
(Image: Lenovo ThinkPad P50 laptop workstation)
Server
• A server is a computer or system that
provides resources, data, services, or
programs to other computers, known as
clients, over a network.
• Used to host websites and web applications
• Often built from the same kind of components as personal computers
• Designed for a specific task or set of tasks
• Most of them run on Linux
• Various tasks, including file and print services, web hosting, and database management
Mainframe Computers
• Large and high-performance computers
• Designed to handle critical and resource-intensive tasks simultaneously at a high speed
• Used in professional and institutional environments, especially in large corporations. Examples: managing transaction-processing applications (banks), database management
• Multi-user environments
• The goal is to ensure the stability and security of business operations
• Expensive due to their hardware configuration and features (a lot of memory and many processors)
Supercomputers
• The most powerful and high-performing computers
• Capable of performing complex and resource-intensive calculations (generally focused on one complex task), e.g., training large AI models such as ChatGPT
• Used in scientific research, climate modeling, drug design, etc.
• Include thousands of processors and are designed for extreme parallelism
• Examples: Fugaku (415 PFLOPS), Frontier
(Image: Fugaku)
Quantum computing
• Exploiting the quantum properties of matter (superposition and entanglement)
• Based on qubits (quantum bits), allowing a linear combination of two states.
▪ Superposition: a qubit can possess multiple values for a given observable quantity.
▪ Entanglement (intrication in Fr.): the existence of an invisible link between two particles.
• Use cases: simulation of quantum systems, cryptography (using algorithms like Grover's or Shor's algorithm)
• Examples: D-Wave One ($10 million), Google Quantum AI
(Image: Schrödinger's cat thought experiment — before opening the box, the cat is simultaneously alive and dead)
Different computer architectures
Definition 2
It is the study and description of how the internal components of
a computer work. It covers:
• The type of information processed and how it is encoded,
• The communication between components,
• The logical (not electronic) functioning of the internal
components.
Von Neumann Architecture
• A conceptual model of a processor.
• Originated by the mathematician of the same name in 1945.
• The first description of a computer where instructions and data are stored in the same memory (same buses).
• The foundation of all modern computers.
The Von Neumann machine consists of five parts:
1. Memory
2. Arithmetic and Logic Unit (ALU)
3. Control Unit (CU)
4. Input Unit (IU)
5. Output Unit (OU)
Basic Components of a Computer: Hardware
Central Unit (CU)
• Also called the CPU (Central Processing
Unit), it is responsible for executing
programs and coordinating between the
different units.
▪ ALU (Arithmetic Logic Unit): performs
arithmetic and logical operations.
▪ CU (Control Unit): manages the execution of operations (sequencing) and handles the transfer of information between the different units.
Main Memory (MM)
• Stores information (instructions and
data).
• MM holds the programs currently
being executed.
• After execution, outputs
(intermediate or final) are transferred
to MM.
• It interacts with input/output units
and auxiliary memory (hard drive,
etc.).
Auxiliary Memory
• Permanent storage of large
quantities of information.
Examples: magnetic disks (HDD),
solid-state drives (SSD).
Input/Output Units
• Communication between the central unit and peripherals.
• The input unit carries the data entered into the computer, i.e., the data the computer receives.
• The output unit carries the data sent out by the computer.
• Each peripheral (external device) communicates with the CPU via
the input/output unit.
• Buses: These are the carriers of information exchanged between the
central unit and peripherals.
• I/O controllers : Electronic circuits that control the peripherals and
manage access to the buses.
Instruction Format
• An instruction represents an action (command) that a
processor can execute.
• There are different types of instructions (arithmetic, logical,
input/output, transfer, etc.).
• An instruction can occupy one or more memory words.
• It consists of several fields:
▪ Operation code (op code): allows the CPU to identify the instruction.
▪ Addressing mode: specifies how the operand is to be found.
▪ Operands: one or more fields containing an operand, the addresses of operands, and/or register numbers.
A sketch of such a field layout follows.
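The following Python sketch packs and unpacks an instruction word with bit operations. The field widths (4-bit op code, 2-bit addressing mode, two 8-bit operand addresses) are hypothetical, chosen only to make the idea concrete:

OP_ADD, MODE_DIRECT = 0b0001, 0b00

def encode(op, mode, op1, op2):
    # op in bits 18-21, mode in bits 16-17, operands in bits 8-15 and 0-7
    return (op << 18) | (mode << 16) | (op1 << 8) | op2

def decode(word):
    return (word >> 18) & 0xF, (word >> 16) & 0x3, (word >> 8) & 0xFF, word & 0xFF

word = encode(OP_ADD, MODE_DIRECT, 19, 13)  # "ADD, direct mode, operands 19 and 13"
print(decode(word))                          # (1, 0, 19, 13)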
The 4-address format
• The instruction holds the op code, the addressing mode, the result address, two operand addresses, and the address of the next instruction.
Example: ADD M.A 50 19 13 51
(Diagram: memory — the instruction at address 10; address 13 holds 10; address 19 holds 15; address 50 will receive the result (?); address 51 holds the next instruction.)
The 3-address format
• In this type of instruction, the next-instruction field has been omitted, as a special register called the Program Counter (PC) (compteur ordinal in Fr.) is responsible for calculating this address.
• Omitting this field frees up space that can be used to increase the size of other fields within the instruction.
The 3-address format
Example: ADD M.A 50 19 13, with PC = 10
(Diagram: memory — address 10 holds ADD M.A 50 19 13; address 11 holds the next instruction; address 13 holds 10; address 19 holds 15; address 50 receives the result 25.)
The 2-address format
• The address field for the result has been removed.
• The result overwrites the first operand (it is stored at the first operand's address).
• One field thus serves two purposes.
The 2-address format
Example: ADD M.A 13 19, with PC = 10
(Diagram: memory before the execution of the instruction — address 10 holds ADD M.A 13 19; address 11 holds the next instruction; address 13 holds 10; address 19 holds 15.)
The 1-address format
• This type of machine is rarely used.
• The instruction contains only the address of the first operand.
• The second operand is located in the accumulator (i.e., a
special register).
• At the end of the operation, the result is stored in the
accumulator.
Format: | Op code | Source |
The 1-address format
Example: ADD M.A 19
(Diagram: memory — address 10 holds ADD M.A 19; address 19 holds 15; the accumulator holds 10 before execution.)
The 0-address format
• A 0-address instruction uses a stack (pile) to hold both operands and the result.
▪ It requires two operations: PUSH and POP.
• Operations are performed between:
▪ The value on the Top Of the Stack (TOS) (sommet de la pile).
▪ The second value on the stack (SOS).
• The result of the operation is stored on the top of the stack (TOS).
Format: | Op code |
Examples
• Example: evaluation of the expression a = (b+c)*d - e
(Table: the corresponding instruction sequences in the 3-address, 2-address, 1-address, and 0-address formats; a 0-address sketch follows.)
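A sketch of the 0-address (stack machine) evaluation of a = (b + c) * d - e in Python; the variable values are arbitrary, and PUSH/ADD/MUL/SUB mirror the stack model above:

b, c, d, e = 2, 3, 4, 5
stack = []

stack.append(b)                          # PUSH b
stack.append(c)                          # PUSH c
stack.append(stack.pop() + stack.pop())  # ADD: consumes TOS and SOS, pushes the sum
stack.append(d)                          # PUSH d
stack.append(stack.pop() * stack.pop())  # MUL
stack.append(e)                          # PUSH e
tos, sos = stack.pop(), stack.pop()
stack.append(sos - tos)                  # SUB: SOS - TOS (operand order matters)
a = stack.pop()                          # POP the result into a
print(a)                                 # (2 + 3) * 4 - 5 = 15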
Addressing modes
• The addressing mode specifies the path the central processing unit (CPU) must take to access an operand.
• The mode is indicated in the addressing-mode field within the instruction.
• There are six main addressing modes:
▪ Immediate Addressing
▪ Direct Addressing
▪ Indirect Addressing
▪ Relative Addressing
▪ Indexed Addressing
▪ Indirect Indexed Addressing
Immediate addressing
• In this case, the operand itself is contained in the address field of the instruction, i.e., it is directly embedded within the instruction.
• Used to specify a constant value within an instruction.
Examples:
LOAD R1, #200
ADD R1, R2, #10
Direct Addressing
• The address field contains the effective address of the
operand in main memory, i.e., the address field specifies the
location in main memory.
Example :
LOAD R1,50 (R1 = value at address 50)
ADD R2,100,101 (R2 = value at address 100 + value at address 101)
(Diagram: address 50 holds 99; after LOAD R1,50, R1 = 99.)
Indirect Addressing
• The address field in the instruction contains the memory location where the effective address of the operand is present.
• In other words, the address field contains the address of a pointer.
• It requires two memory accesses.
Example: LOAD R1,@50
(Diagram: address 50 holds 100; address 100 holds 10; after execution, the content of R1 is 10.)
Relative addressing
• This addressing mode is generally used with branch (branchement) instructions.
• The address field specifies an offset relative to a base, which is often the content of the program counter.
• Effective address = (base) + offset
Example: JMP 10 with PC (base) = 50 → effective address = 50 + 10 = 60.
(Diagram: main memory (MC) — the branch instruction BR 10; execution continues at address 60.)
Indexed Addressing
• The address of the operand is obtained by adding the contents of the Index Register (IR) to the value in the address field of the instruction.
• Effective address = address field + IR
• This type is often used to handle arrays, where the index identifies an element in the array.
Example: LOAD R1, 5 with IR = 10 (the address of the first element of the table) → effective address = 10 + 5 = 15; address 15 holds 100, so R1 = 100.
Indirect Indexed Addressing
• The operand field contains the address of a pointer that holds the offset to be added to the index register.
• Question: what is the content of R1 after the execution of LOAD R1, (5)?
(Diagram: index register = 5; main memory (MC) — address 5 holds 20, address 10 holds 2, address 20 holds 10, address 25 holds 20.)
A resolver sketch covering these modes follows.
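The following Python sketch resolves an operand under each of the modes above; the memory contents are those of the indirect-indexed question, so it also lets you check the value of R1:

memory = {5: 20, 10: 2, 20: 10, 25: 20}  # address: value, as in the diagram
index_register = 5

def operand(mode, field):
    if mode == "immediate":         # the field is the operand itself
        return field
    if mode == "direct":            # the field is the operand's address
        return memory[field]
    if mode == "indirect":          # the field points to the operand's address
        return memory[memory[field]]
    if mode == "indexed":           # effective address = field + index register
        return memory[field + index_register]
    if mode == "indirect_indexed":  # the pointed-to value is the offset
        return memory[memory[field] + index_register]

print(operand("indirect_indexed", 5))  # LOAD R1,(5): memory[20 + 5] = 20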
Instruction set
• An instruction set, or instruction set architecture (ISA), is a list of all
the commands (instructions), with all their variations, that a
processor can execute.
• Each machine has its own instruction set that is specific to it.
• Two instruction set architectures have marked the history of
machines:
❑ The RISC architecture (Reduced Instruction Set Computer), which has a reduced instruction set and a simple hardware architecture.
❑ Use cases: mobile, embedded systems
❑ The CISC architecture (Complex Instruction Set Computer), which has a complex instruction set, with an additional microprogramming layer where instructions act as small programs.
❑ Use cases: desktops, laptops, servers, etc.
(Diagram: a CISC instruction encoded as a long, variable-length bit string, versus a shorter fixed-length RISC instruction.)
Different stages of the instruction cycle
• To execute an instruction, a classic CPU goes through a
sequential cycle of 3 phases: Fetch, Decode, Execute
1. Fetch Stage:
1. Load the contents of the Program Counter (PC) into the Memory
Address Register (MAR).
2. Initiate a memory read and store the fetched word in the Memory
Instruction Register (MIR)
3. Transfer the contents of MIR (Memory Instruction Register) to the
Instruction Register (IR).
4. Increment the Program Counter: PC = PC + 1
Different stages of the instruction cycle
• To execute an instruction, a classic CPU goes through a
sequential cycle of 3 phases: Fetch, Decode, Execute
2. Decode Stage:
1. The Control Unit (CU) analyzes the instruction code to determine the
elementary operations to be performed
2. CU identifies the registers or memory words involved.
Different stages of the instruction cycle
• To execute an instruction, a classic CPU goes through a sequential
cycle of 3 phases: Fetch, Decode, Execute
3. Execute Stage:
➢ Order the actual execution of the instruction, launched by the sequencer
and sent to the ALU.
➢ Fetch the required operands from registers or memory.
❑ Transfer of the operand address (contained in the instruction) to the MAR
(Memory Address Register).
❑ Initiate a memory read (of all operands if necessary).
❑ Transfer the contents of the MIR (Memory Instruction Register) to one of
the ALU registers.
➢ Store the result in the appropriate destination (memory or register)
Example
Show the steps of the execution of the addition instruction ADD R1,10 in direct mode.
(Diagram: the CU and ALU; main memory (MC) with ADD R1,10 at address 1 and the value 50 at address 10; R1 holds 5; PC = 1; the MAR (Memory Address Register), MIR (Memory Instruction Register), and IR (Instruction Register) are initially unknown. A simulation sketch follows.)
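A sketch, in Python, of the cycle for this instruction, using the register names from the slides (PC, MAR, MIR, IR) and the values of the diagram:

memory = {1: ("ADD", "R1", 10), 10: 50}  # address 1: instruction; address 10: 50
registers = {"R1": 5}
PC = 1

# Fetch: PC -> MAR, memory read into MIR, MIR -> IR, increment PC
MAR = PC
MIR = memory[MAR]
IR = MIR
PC += 1

# Decode: the control unit identifies the op code and the operands
op, reg, addr = IR

# Execute: fetch the operand (direct mode) and let the ALU add it to R1
if op == "ADD":
    MAR = addr
    MIR = memory[MAR]       # memory read of the operand: 50
    registers[reg] += MIR   # ALU: R1 = 5 + 50
print(registers["R1"], PC)  # 55 2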
Improvement of the basic architecture: Pipelining and
Superscalar.
• Since the proposal of the Von Neumann architecture,
researchers have continuously sought to improve processor
performance.
• The main points of this evolution:
▪ Clock frequency.
▪ Instruction set.
▪ Cache memory.
▪ Parallelization and optimization of instruction execution (Pipelining and
Superscalar).
Clock Frequency
• The processor's clock speed measures how fast the processor can execute instructions.
• Each clock pulse triggers the execution of tasks by the system's electronic circuits.
• It is typically expressed in gigahertz (1 GHz = 10⁹ Hz).
• The higher the clock speed, the faster the processor can perform calculations and operations.
• Some CPUs use overclocking to improve performance if necessary.
Example
• What is the frequency of this CPU?
(Image: CPU specifications; a worked sketch of the frequency/cycle-time relation follows.)
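Since the original slide's screenshot is not reproduced here, this Python sketch (with an assumed 3.2 GHz CPU) shows the relation used to answer it:

f = 3.2e9        # assumed frequency in Hz (1 GHz = 10^9 Hz)
t_cycle = 1 / f  # duration of one clock cycle, in seconds
print(t_cycle)   # 3.125e-10 s, i.e., about 0.31 ns per cycle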
Vanilla Machine (without Pipeline)
• In a computer architecture without a pipeline, instructions are
executed one after the other.
• A new instruction only begins execution when the previous
instruction is completely finished.
Vanilla Machine (without Pipeline) II
• An instruction is executed in 3 phases (Fetch, Decode, Execute).
• To execute n instructions, this model requires n × k cycles (where k is the number of phases of an instruction).
(Diagram: F1 D1 E1, then F2 D2 E2, then F3 D3 E3 run strictly one after another, so three instructions take 9 cycles.)
Pipelining
• This model is inspired by industrial assembly lines.
• Pipelining is a technique where multiple instructions are overlapped during execution, which reduces the program execution time.
• The principle is to break down instruction execution into steps, with each step using a specific function of the processor.
• The number of cycles to execute n instructions: n + k − 1, where n is the number of instructions and k is the number of cycles (stages) per instruction.
(Diagram: the stages F, D, E of successive instructions overlap, so three 3-stage instructions finish in 3 + 3 − 1 = 5 cycles instead of 9.)
Example
• Indicate the execution duration (in seconds) on a 4-cycle processor with a clock frequency of 500 Hz for the program shown in the slide, both without and with pipelining.
• Calculate the performance gain obtained using the pipeline. (A worked sketch follows.)
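A worked sketch in Python with k = 4 cycles per instruction and f = 500 Hz; the instruction count n is an assumption, since the program listing is in the original slide:

n, k, f = 10, 4, 500            # n is assumed; k and f come from the statement

cycles_without = n * k          # sequential execution: n x k cycles
cycles_with = k + n - 1         # pipelined execution: k + n - 1 cycles

t_without = cycles_without / f  # duration in seconds = cycles / frequency
t_with = cycles_with / f
gain = t_without / t_with       # speedup factor

print(t_without, t_with, gain)  # 0.08 s, 0.026 s, gain ≈ 3.08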
Superscalar architecture
• The superscalar approach consists of equipping the processor with
several processing units working in parallel.
• Instructions are distributed among the different execution units.
• This mechanism requires a high-performance cache.
• For this, the compiler must be able to identify parallelism in the
source code.
• It must also be capable of reorganizing the source code so that
instructions can be executed in parallel.
• The superscalar architecture is used in many modern processors,
such as Intel Core i7 processors and AMD Ryzen processors.
A single processing unit vs. several processing units
(Diagram: with a single unit, instructions I1 … I8 execute in N cycles; with two units working in parallel, in N/2 cycles.)
Cache Memory
• The processor cache is a small, fast memory that stores frequently used
data and instructions.
• The processor can access it quickly without having to retrieve them from
the main memory, which is slower and further away.
• The cache can improve the performance and efficiency of the processor by
reducing latency and bandwidth usage for memory accesses.
• The cache is typically divided into several levels, such as L1, L2, and L3,
which have different sizes and speeds.
Multi-core
• A multi-core microprocessor is a microprocessor with multiple
physical cores operating simultaneously.
• A core is a set of circuits capable of executing programs
independently.
• The more cores a processor has, the more tasks it can
manage simultaneously, which can improve the performance
of multitasking and multithreaded applications.
• However, not all applications can take advantage of multiple
cores, and some cores may be inactive or underutilized.
GPU (graphics processing unit)
• It is designed for parallel processing and handling multiple tasks
simultaneously, such as 3D rendering and visual effects.
• GPUs are used in a variety of applications, including video games,
3D modeling software, virtual reality applications, and artificial
intelligence.
• There are two types: integrated GPUs and dedicated GPUs (a
separate component from the CPU).
• Fast Memory Access: GPUs typically have high-speed memory (such
as GDDR6) that allows for rapid data transfer between the GPU and
the memory.
• AMD and Nvidia are the two main manufacturers of GPUs.
The end.