Notes II
Bus architecture in computer systems refers to the design and organization of the communication
pathways that connect different components of a computer. These pathways, known as buses, are
responsible for transferring data, addresses, and control signals between the CPU, memory, and
peripheral devices. The design of the bus architecture significantly impacts the performance,
scalability, and overall functionality of the computer system. Below are the main components and
types of bus architectures:
1. Data Bus: Transfers actual data between the CPU, memory, and peripherals. The width of the
data bus (number of bits it can carry simultaneously) directly impacts data transfer rates.
2. Address Bus: Carries the addresses of the memory locations or I/O devices where data needs
to be read or written. The width of the address bus determines the maximum addressable
memory.
3. Control Bus: Carries control signals to coordinate and manage the operations of the
computer system. These signals include read/write commands, interrupt requests, and clock
signals.
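The link between address-bus width and addressable memory can be made concrete with a quick sketch (the function name is illustrative, not from the notes):

```python
def max_addressable_bytes(address_bus_width_bits: int) -> int:
    """Number of byte-addressable locations reachable with the given
    address-bus width: 2 raised to the number of address lines."""
    return 2 ** address_bus_width_bits

# A 16-bit address bus reaches 64 KiB; a 32-bit bus reaches 4 GiB.
print(max_addressable_bytes(16))  # 65536
print(max_addressable_bytes(32))  # 4294967296
```

This is why, for example, 32-bit machines without address extensions top out at 4 GiB of byte-addressable memory.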
Types of Bus Architectures
1. Single Bus Architecture
Description: All components (CPU, memory, peripherals) share a single bus for data, address,
and control signals.
Pros:
o Simple and inexpensive to implement.
Cons:
o The shared bus becomes a bottleneck, since only one transfer can occur at a time.
2. Multiple Bus Architecture
Description: Uses multiple buses to connect different components, often separating the
system into different subsystems (e.g., a separate memory bus and I/O bus).
Pros:
o Reduces contention by allowing subsystems to transfer data in parallel.
Cons:
o More complex and costly than a single shared bus.
3. Hierarchical Bus Architecture
Description: Organizes buses into a hierarchy, typically with a high-speed system bus at the
top level connecting the CPU and main memory, and lower-speed peripheral buses (e.g., PCI,
USB) connecting peripheral devices.
Pros:
o Matches bus speed to device needs, so fast components are not slowed by slow peripherals.
Cons:
o Potential for latency when data must traverse multiple bus levels.
Common Bus Standards
1. PCI (Peripheral Component Interconnect): A standard for attaching expansion cards and
peripherals to the system, now largely succeeded by the serial PCI Express (PCIe).
2. USB (Universal Serial Bus): A widely used standard for connecting peripheral devices such as
keyboards, mice, storage devices, and more.
3. I2C (Inter-Integrated Circuit): A low-speed bus used for connecting low-speed peripherals like
sensors and microcontrollers in embedded systems.
4. SPI (Serial Peripheral Interface): A high-speed bus used for short-distance communication,
typically in embedded systems.
5. CAN (Controller Area Network): A robust bus standard used in automotive and industrial
applications for communication between microcontrollers and devices.
Factors in Choosing a Bus Architecture
1. Performance Requirements: The need for high-speed data transfer and low latency.
2. Scalability: The ability to add more devices or increase memory capacity without significant
redesign.
3. Compatibility: Ensuring the bus design works with existing standards and devices.
4. Cost: Balancing performance with the cost of additional bus lines and arbitration
mechanisms.
In summary, bus architecture is a critical aspect of computer system design that affects how
efficiently and effectively different components communicate with each other. Choosing the right bus
architecture involves considering the specific needs and constraints of the system, including
performance, scalability, compatibility, cost, and power consumption.
Bus Arbitration in Computer Architecture
Bus arbitration is a mechanism used in computer architecture to manage access to a shared bus
among multiple devices. Since multiple devices, such as CPUs, memory, and peripherals, may need to
use the bus to transfer data, bus arbitration ensures that only one device can use the bus at any
given time, preventing conflicts and ensuring orderly communication.
Why Bus Arbitration Is Needed
1. Avoid Data Collisions: When multiple devices try to communicate simultaneously over the
same bus, data collisions can occur, leading to data corruption.
2. Resource Management: Proper bus arbitration allows fair and efficient sharing of the bus
among devices, ensuring that high-priority devices get timely access while low-priority
devices are not starved.
There are several methods of bus arbitration, each with its own way of managing access to the bus.
Two common methods are:
1. Daisy Chaining
In daisy chaining, devices are connected in series along the bus, forming a chain. The bus grant signal
passes from one device to the next in the chain.
Operation:
o A device needing the bus sends a bus request signal to the bus arbiter.
o The arbiter asserts a bus grant signal, which enters the chain at the first device.
o If the first device in the chain does not need the bus, it passes the grant signal to the
next device in the chain.
o This process continues until the bus grant signal reaches a device that needs the bus,
which then takes control of the bus.
Pros:
o Simple to implement.
Cons:
o Priority is fixed, meaning devices at the beginning of the chain have higher priority.
o Devices at the end of the chain may suffer from longer wait times (starvation).
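As an illustrative sketch (a software model, not real hardware), the grant propagation and fixed priority of daisy chaining can be simulated like this:

```python
def daisy_chain_grant(requests):
    """Model one arbitration round in a daisy chain.

    `requests` lists each device's bus-request flag, index 0 being the
    device closest to the arbiter (highest priority). Returns the index
    of the device that absorbs the grant, or None if no one requested.
    """
    for position, wants_bus in enumerate(requests):
        if wants_bus:
            return position  # this device keeps the grant signal
    return None              # grant propagates off the end of the chain

# Device 0 is idle, so the grant passes along and device 1 takes the bus.
print(daisy_chain_grant([False, True, True]))  # 1
```

Note how device 2 can be starved indefinitely whenever device 0 or 1 keeps requesting, which is exactly the fixed-priority drawback listed above.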
2. Centralized Arbitration
In centralized arbitration, a single bus arbiter (a dedicated controller) is responsible for managing
access to the bus.
Operation:
o Devices send bus request signals to the central arbiter.
o The arbiter decides which device gets access to the bus based on a pre-defined
priority scheme or a round-robin algorithm.
o The arbiter sends a bus grant signal to the selected device, allowing it to use the bus.
Pros:
o Flexible and can implement various priority schemes (fixed, rotating, etc.).
Cons:
o Requires additional hardware (the arbiter), which increases system complexity and
cost.
o Single point of failure: If the central arbiter fails, bus arbitration is disrupted.
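A minimal sketch of a centralized arbiter using the round-robin scheme mentioned above (class and method names are mine, chosen for illustration):

```python
class RoundRobinArbiter:
    """Centralized arbiter that grants the bus in rotating order,
    so no requesting device is permanently starved."""

    def __init__(self, num_devices):
        self.num_devices = num_devices
        self.last_granted = -1  # rotation starts just before device 0

    def grant(self, requests):
        """Grant the bus to the next requesting device after the last
        grantee; return its index, or None if nothing is requested."""
        for offset in range(1, self.num_devices + 1):
            candidate = (self.last_granted + offset) % self.num_devices
            if requests[candidate]:
                self.last_granted = candidate
                return candidate
        return None

arbiter = RoundRobinArbiter(3)
print(arbiter.grant([True, True, False]))  # 0
print(arbiter.grant([True, True, False]))  # 1 (the rotation moves on)
```

Swapping the loop for a fixed priority order would turn this into the fixed-priority scheme, which shows why centralized arbitration is described as flexible.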
Pipelining
Concept of Pipelining in Computer Architecture
Pipelining is a technique in which the execution of multiple instructions is overlapped, much like an
assembly line: while one instruction is being decoded, the next can already be fetched. A typical
instruction pipeline is divided into several stages, each responsible for a specific part of the
instruction execution process. The main stages are:
1. Fetch (IF): The next instruction is fetched from memory.
2. Decode (ID): The fetched instruction is decoded to understand what actions are needed.
3. Execute (EX): The decoded instruction is executed, e.g., an ALU operation is performed.
4. Memory Access (MEM): Memory is read or written if the instruction requires it.
5. Write Back (WB): The results of the instruction execution are written back to the register file.
Consider a non-pipelined CPU where each instruction takes five cycles to complete, one for each
stage (IF, ID, EX, MEM, WB). If we execute three instructions (A, B, and C) sequentially:
Cycle:          1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
Instruction A:  IF  ID  EX  MEM WB
Instruction B:                      IF  ID  EX  MEM WB
Instruction C:                                          IF  ID  EX  MEM WB
Now, consider a pipelined CPU with the same stages. The instructions are overlapped:
Cycle:          1   2   3   4   5   6   7
Instruction A:  IF  ID  EX  MEM WB
Instruction B:      IF  ID  EX  MEM WB
Instruction C:          IF  ID  EX  MEM WB
In this pipelined scenario, it takes only 7 cycles (5 cycles to fill the pipeline, plus 1 cycle for each
additional instruction) to complete three instructions. The overlapping of stages means that after the
pipeline is filled, one instruction completes every cycle.
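The cycle counts in these diagrams follow a simple formula; as a sketch (assuming an ideal, hazard-free pipeline):

```python
def pipeline_cycles(num_instructions, num_stages=5):
    """Total cycles to run a batch of instructions with and without an
    ideal pipeline: without, each instruction takes all stages in turn;
    with, the pipeline fills once and then retires one per cycle."""
    non_pipelined = num_instructions * num_stages
    pipelined = num_stages + (num_instructions - 1)
    return non_pipelined, pipelined

print(pipeline_cycles(3))    # (15, 7) -- matches the diagrams above
print(pipeline_cycles(100))  # (500, 104): speedup approaches 5x
```

For long instruction streams the speedup approaches the number of stages, which is the whole point of pipelining.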
Benefits of Pipelining
1. Increased Throughput: Once the pipeline is full, one instruction completes every cycle, so
more instructions finish per unit of time.
2. Efficient Utilization of CPU Resources: Each stage of the pipeline is working on a different
instruction, leading to better utilization of CPU resources.
Challenges of Pipelining
1. Pipeline Hazards: Situations that prevent the next instruction in the pipeline from executing
in the next cycle. Types of hazards include:
o Structural Hazards: Caused by two instructions needing the same hardware resource
in the same cycle.
o Data Hazards: Caused by an instruction depending on the result of a previous
instruction that is still in the pipeline.
o Control Hazards: Caused by branch instructions that change the flow of control.
2. Pipeline Stalls: When the pipeline must be halted to resolve hazards, reducing efficiency.
Despite these challenges, pipelining is a fundamental technique that significantly boosts the
performance of modern CPUs by enabling parallelism at the instruction level.
Micro-operations:
Shift Operations in Computer Architecture
Shift operations are used to move bits within a binary number to the left or right. They are fundamental operations in computer arithmetic
and logic. The main types of shift operations are logical and arithmetic shifts, both of which can be performed either to the left or to the
right.
1. Logical Shift Left (LSL)
Operation: Moves all bits in the binary number to the left by a specified number of positions.
Effect: The vacated bit positions on the right are filled with zeros.
2. Logical Shift Right (LSR)
Operation: Moves all bits in the binary number to the right by a specified number of positions.
Effect: The vacated bit positions on the left are filled with zeros.
3. Arithmetic Shift Left (ASL)
Operation: Moves all bits in the binary number to the left by a specified number of positions.
Effect: Similar to logical shift left; the vacated bit positions on the right are filled with zeros.
Note: For unsigned numbers, arithmetic shift left is the same as logical shift left.
4. Arithmetic Shift Right (ASR)
Operation: Moves all bits in the binary number to the right by a specified number of positions.
Effect: The vacated bit positions on the left are filled with the original most significant bit (sign bit),
preserving the sign of the number.
Equivalent: Dividing the number by 2 for each position shifted, while keeping the sign.
Examples (1-bit shifts of the 4-bit value 1010)
1. Logical Shift Left (LSL):
Original: 1010
Shifted: 0100
Explanation: Every bit moves one position left; the leftmost 1 is shifted out and a 0 fills the
rightmost position.
2. Logical Shift Right (LSR):
Original: 1010
Shifted: 0101
Explanation: Every bit moves one position right; the rightmost 0 is shifted out and a 0 fills the
leftmost position.
3. Arithmetic Shift Left (ASL):
Original: 1010
Shifted: 0100
Explanation: For unsigned numbers, arithmetic shift left is the same as logical shift left.
4. Arithmetic Shift Right (ASR):
Original: 1010
Shifted: 1101
Explanation: Every bit moves one position right; the leftmost bit is filled with the original
most significant bit (1), preserving the sign.
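The shift examples above can be checked with a small Python sketch (the 4-bit width and the helper names are illustrative; Python integers are unbounded, so a mask emulates a fixed-width register):

```python
BITS = 4
MASK = (1 << BITS) - 1  # 0b1111 keeps results within 4 bits

def lsl(x, n=1):
    """Logical shift left: vacated right bits are filled with zeros."""
    return (x << n) & MASK

def lsr(x, n=1):
    """Logical shift right: vacated left bits are filled with zeros."""
    return (x & MASK) >> n

def asr(x, n=1):
    """Arithmetic shift right: vacated left bits are filled with a copy
    of the original sign bit, preserving the sign."""
    sign = x & (1 << (BITS - 1))
    result = x
    for _ in range(n):
        result = (result >> 1) | sign
    return result & MASK

x = 0b1010
print(format(lsl(x), "04b"))  # 0100
print(format(lsr(x), "04b"))  # 0101
print(format(asr(x), "04b"))  # 1101
```

The outputs reproduce the three distinct results in the worked examples; ASL on an unsigned value is identical to `lsl`.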
Register Transfer Level (RTL)
Register Transfer Level (RTL) is a level of abstraction used in the design and description of digital circuits. It focuses on the flow of data
between hardware registers and the logical operations performed on that data. RTL is used to describe the functional behavior of a digital
circuit in terms of the data transfers between registers and the combinational logic that processes this data.
Key Elements:
1. Registers:
o Registers are storage elements that hold data from one clock cycle to the next.
o At the RTL level, the focus is on the transfer of data between these registers.
2. Data Transfers:
o RTL descriptions specify how data moves from one register to another.
o This involves defining the source and destination of the data, as well as the conditions under which the transfer
occurs.
3. Combinational Logic:
o Combinational logic blocks sit between registers and compute their outputs purely from their current
inputs.
o These blocks perform operations such as arithmetic, logical, and comparison operations based on the inputs they
receive.
4. Control Logic:
o Control signals determine the timing and sequence of data transfers and operations.
o These signals are often generated by a control unit or finite state machine (FSM) and are critical for the proper
functioning of the digital circuit.
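RTL itself is written in an HDL such as VHDL or Verilog; purely to illustrate the idea of a conditional register transfer (destination register loads a combinational result only when a control signal is asserted), here is a toy Python simulation. The register names and the `enable` signal are invented for the example:

```python
# Toy register file; values persist between "clock cycles".
registers = {"R1": 5, "R2": 3}

def transfer(dest, value, enable):
    """Model one register-transfer step: the destination register is
    updated only when the control signal is asserted."""
    if enable:
        registers[dest] = value

# R1 <- R1 + R2 when add_enable is asserted: the combinational adder
# produces the sum, and the control logic allows it to be latched.
transfer("R1", registers["R1"] + registers["R2"], enable=True)
print(registers["R1"])  # 8

# With the control signal deasserted, the register keeps its old value.
transfer("R2", 99, enable=False)
print(registers["R2"])  # 3
```

In a real HDL the same step would be written as something like `R1 <= R1 + R2;` inside a clocked process guarded by the control signal.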
Importance of RTL
1. Abstraction Level:
RTL provides an abstraction level that is higher than the gate level but lower than the algorithmic or behavioral level.
This level of abstraction allows designers to focus on the data flow and operations without worrying about the low-level
implementation details.
2. Design Specification:
RTL descriptions are used to specify the functionality of digital circuits in hardware description languages (HDLs) like VHDL or
Verilog.
These descriptions serve as the basis for synthesizing the design into gate-level representations.
3. Simulation:
RTL designs can be simulated to verify their functional correctness before moving to physical implementation.
Simulation allows designers to test the behavior of the design under various conditions and identify any functional errors early
in the design process.
4. Synthesis:
RTL descriptions are input to synthesis tools that convert them into gate-level netlists.
The synthesis process maps the high-level RTL description to the specific logic gates and interconnections required for the final
hardware.
5. Optimization:
RTL allows for the optimization of data paths and control logic.
Designers can optimize for performance, area, and power consumption by refining the RTL design.
6. Documentation and Communication:
RTL descriptions facilitate communication between different teams, such as design, verification, and implementation teams.
Modes of Data Transfer
1. Programmed I/O
In programmed I/O, the CPU actively participates in the data transfer process. The CPU continuously checks (polls) the status of an I/O
device to see if it is ready for data transfer. When the device is ready, the CPU executes the necessary instructions to transfer data.
Advantages:
o Simple to implement.
Disadvantages:
o Inefficient as the CPU spends a lot of time checking device status, which could be used for other tasks.
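The polling loop at the heart of programmed I/O can be sketched as follows (the device class and its methods are hypothetical stand-ins for real status and data registers):

```python
class MockDevice:
    """Stand-in for an I/O device with a status flag and a data register."""

    def __init__(self, ready_after):
        self.ready_after = ready_after  # becomes ready on this poll count
        self.polls = 0

    def status_ready(self):
        self.polls += 1
        return self.polls >= self.ready_after

    def read_data(self):
        return 0x42

def programmed_io_read(device):
    """Busy-wait (poll) until the device reports ready, then transfer the
    data. The CPU does no useful work during the wait -- the key
    inefficiency of programmed I/O."""
    while not device.status_ready():
        pass  # CPU cycles burned checking the status register
    return device.read_data()

print(hex(programmed_io_read(MockDevice(ready_after=3))))  # 0x42
```

Interrupt-driven I/O removes exactly this loop: the device signals readiness itself, and the CPU only runs the ISR at that point.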
2. Interrupt-Driven I/O
In interrupt-driven I/O, the I/O device interrupts the CPU when it is ready for data transfer. The CPU responds to the interrupt, suspends its
current tasks, and executes an interrupt service routine (ISR) to handle the data transfer.
Advantages:
o More efficient than programmed I/O as the CPU can perform other tasks while waiting for the interrupt.
Disadvantages:
o More complex to implement due to the need for interrupt handling mechanisms.
3. Direct Memory Access (DMA)
In DMA, a dedicated controller (DMA controller) handles the data transfer between memory and I/O devices without the direct
involvement of the CPU. The CPU initiates the DMA transfer by providing the necessary information to the DMA controller and then
proceeds with other tasks. The DMA controller manages the transfer and notifies the CPU upon completion.
Advantages:
o Very efficient for large data transfers as it frees up the CPU to perform other tasks.
o Reduces CPU overhead and improves overall system performance.
Disadvantages:
o Adds complexity and cost to the system due to the need for a DMA controller.
4. Memory-Mapped I/O
In memory-mapped I/O, I/O devices are assigned specific memory addresses, and the CPU uses regular memory instructions to read from
or write to these addresses. This method allows the CPU to treat I/O devices as if they were memory locations.
Advantages:
o Simplifies the CPU design by using the same instructions for memory and I/O operations.
Disadvantages:
o Can lead to address space conflicts between memory and I/O devices.
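A toy sketch of the memory-mapped idea, in which one store routine serves both ordinary memory and a device register (the address layout and register name are invented for illustration):

```python
# Toy address space: addresses below 0x1000 are RAM; address 0x1000 is
# a hypothetical device data register.
RAM = bytearray(0x1000)
DEVICE_DATA_REG = 0x1000
device_output = []

def store(addr, value):
    """One 'store' operation serves both memory and the device: the
    address alone decides where the write lands."""
    if addr == DEVICE_DATA_REG:
        device_output.append(value)   # write drives the I/O device
    else:
        RAM[addr] = value             # ordinary memory write

store(0x0010, 7)            # plain memory write
store(DEVICE_DATA_REG, 65)  # the same operation talks to the device
print(RAM[0x0010], device_output)  # 7 [65]
```

The address-space-conflict drawback is also visible here: every address handed to the device is an address that RAM can no longer use.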
5. Isolated I/O
In isolated I/O, a separate address space is used for I/O devices, distinct from the memory address space. Special instructions (like IN and
OUT in x86 assembly) are used to access these I/O addresses.
Advantages:
o Clearly separates memory and I/O address spaces, reducing the risk of conflicts.
Disadvantages:
o Can be less efficient in terms of instruction set utilization compared to memory-mapped I/O.
6. Channel I/O
Channel I/O is used in mainframe computers and involves the use of dedicated I/O processors called channels. These channels manage
data transfers independently of the main CPU, allowing for very efficient I/O operations.
Advantages:
o Offloads I/O management from the CPU, allowing very high I/O throughput.
Disadvantages:
o Requires dedicated I/O processors, adding significant hardware cost and complexity.
Conclusion
Different modes of data transfer are employed in computer systems to balance efficiency, complexity, and cost. Programmed I/O and
interrupt-driven I/O are simpler and suitable for less demanding applications, while DMA and channel I/O offer high efficiency for large
data transfers and complex systems. Memory-mapped and isolated I/O provide different approaches to integrating I/O devices within the
system’s address space, each with its own trade-offs. Understanding these modes is crucial for designing and optimizing computer systems
for various applications.
Serial Communication: Synchronous & Asynchronous Communication
Serial communication refers to the process of sending data one bit at a time, sequentially, over a communication channel or computer bus.
This contrasts with parallel communication, where multiple bits are transmitted simultaneously. Serial communication is widely used for
long-distance communication and situations where the number of communication lines needs to be minimized.
Synchronous Communication
In synchronous communication, data is sent in a continuous stream accompanied by a clock signal. Both the transmitter and receiver share
a common clock signal, which ensures that they are synchronized and can accurately interpret the data being sent.
Key Features:
1. Clock Signal:
o The clock can be sent as a separate signal or embedded in the data stream.
2. Data Transmission:
o Data is sent as a continuous stream, grouped into frames or blocks rather than individual characters.
o Each frame or block may contain control information such as start and stop bits, address bits, and error-checking
bits.
3. Efficiency:
o More efficient than asynchronous communication as there is no need for start and stop bits for each byte.
4. Complexity:
o Requires more complex hardware to generate, distribute, or recover the shared clock.
Examples:
SPI (Serial Peripheral Interface): Commonly used for short-distance communication in embedded systems.
USART (Universal Synchronous/Asynchronous Receiver/Transmitter): Can operate in both synchronous and asynchronous
modes.
Asynchronous Communication
In asynchronous communication, data is sent without a shared clock signal. Instead, data is sent in discrete packets or characters, each of
which is preceded by a start bit and followed by a stop bit. These bits help the receiver to identify the beginning and end of each character.
Key Features:
1. No Clock Signal:
o There is no need for a shared clock signal between the transmitter and receiver.
2. Data Transmission:
o Data is sent character by character, each framed by a start bit and one or more stop bits.
3. Efficiency:
o Less efficient than synchronous communication due to the overhead of start and stop bits.
o Suitable for lower data rates and less complex communication requirements.
4. Simplicity:
o Simpler and cheaper to implement, since no clock line or clock-recovery circuitry is needed.
Example:
UART (Universal Asynchronous Receiver/Transmitter): Commonly used for serial communication in computers and embedded
systems.
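The start/stop-bit framing described above can be sketched in a few lines (8N1 framing is assumed here; real UARTs may add a parity bit or extra stop bits):

```python
def uart_frame(byte):
    """Frame one data byte for asynchronous transmission: a start bit
    (0), eight data bits sent LSB first, and a stop bit (1)."""
    data_bits = [(byte >> i) & 1 for i in range(8)]  # LSB transmitted first
    return [0] + data_bits + [1]

frame = uart_frame(0x41)  # ASCII 'A' = 0b01000001
print(frame)  # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
```

Ten bits on the wire carry eight bits of data, which is the 20% framing overhead that makes asynchronous transfer less efficient than synchronous transfer.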
Feature             Synchronous                                Asynchronous
Synchronization     Synchronized by the clock signal           Synchronized by start and stop bits
Data Transmission   Continuous stream, sent in frames/blocks   Discrete packets, each with start/stop bits
Efficiency          Higher, no start/stop bit overhead         Lower, due to start/stop bit overhead