Chapter 2
Performance
e) Speculative Execution
• The processor executes instructions before it is certain they are needed, storing results in
temporary locations.
• This prevents idle time and ensures the processor is utilized efficiently.
• If a predicted instruction path turns out to be incorrect, the speculative results are simply discarded, leaving the correct program state unaffected.
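To make the commit-or-discard behavior concrete, here is a minimal Python sketch; the instruction format, predictor flag, and function names are all invented for illustration and do not correspond to any real microarchitecture.

def execute(instruction, state):
    # Apply a simple (op, dest, value) instruction to a copy of the register state.
    op, dest, value = instruction
    new_state = dict(state)
    if op == "add":
        new_state[dest] = new_state.get(dest, 0) + value
    return new_state

def run_branch(predict_taken, actually_taken, taken_path, state):
    # Speculatively execute the predicted path into a temporary state.
    speculative_state = state
    if predict_taken:
        for instr in taken_path:
            speculative_state = execute(instr, speculative_state)
    if predict_taken == actually_taken:
        return speculative_state   # prediction correct: commit speculative results
    return state                   # misprediction: discard, original state intact

state = {"r1": 0}
print(run_branch(True, True, [("add", "r1", 5)], state))   # {'r1': 5} (committed)
print(run_branch(True, False, [("add", "r1", 5)], state))  # {'r1': 0} (squashed)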
5. Performance Balancing Challenges
• While processor speeds have increased significantly, other computer components (e.g.,
memory and I/O devices) have not kept pace.
• The most significant performance bottleneck is the interface between the processor and
main memory.
• If memory access is too slow, the processor must wait for data, resulting in wasted clock
cycles and reduced performance.
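To put a number on those wasted cycles: with a base CPI of 1, if (hypothetically) 30% of instructions reference main memory and each reference stalls the processor for 20 cycles, the effective CPI becomes

$$\text{CPI}_{\text{eff}} = \text{CPI}_{\text{base}} + 0.3 \times 20 = 1 + 6 = 7$$

so the processor spends roughly six of every seven cycles waiting on memory. The figures here are illustrative, not measured values.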
6. Solutions to Performance Bottlenecks
• Increasing Memory Access Efficiency
o Wider DRAM chips retrieve more bits at once, reducing memory access delays.
o Faster memory interfaces and improved bus architectures reduce transfer latency.
• Implementing Advanced Caching Techniques
o Caches store frequently accessed data closer to the processor, reducing the need for slow main memory accesses (a worked access-time example follows this list).
o Multiple cache levels (L1, L2, L3) improve data retrieval efficiency.
• Enhancing Interconnect Bandwidth
o High-speed buses and hierarchical bus structures improve communication between
memory and the processor.
o Advanced interconnection techniques prevent data transfer bottlenecks.
• Improving I/O Device Management
o Peripheral devices (e.g., graphics cards, SSDs, and network interfaces) require
efficient data transfer mechanisms.
o Caching, buffering, and high-speed interconnects optimize I/O operations.
o Multiple-processor systems help distribute processing workloads and manage I/O-
intensive tasks more effectively.
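As a rough illustration of how caching addresses the bottleneck, the sketch below computes average memory access time (AMAT) for a two-level cache hierarchy; all hit rates and latencies are invented example numbers.

# AMAT = hit_time + miss_rate * miss_penalty, applied level by level.
l1_hit_time, l1_hit_rate = 1.0, 0.90    # ns, fraction (example values)
l2_hit_time, l2_hit_rate = 5.0, 0.80    # ns, fraction (example values)
memory_time = 100.0                     # ns to reach main memory

# An L1 miss pays the L2 access time plus, on an L2 miss, the memory time.
l2_amat = l2_hit_time + (1 - l2_hit_rate) * memory_time
amat = l1_hit_time + (1 - l1_hit_rate) * l2_amat
print(f"AMAT = {amat:.1f} ns")  # 1.0 + 0.10 * (5.0 + 0.20 * 100.0) = 3.5 ns

Even with main memory 100x slower than L1, the hierarchy keeps the average access time close to L1 speed.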
8. Improvements in Chip Organization & Architecture
To further improve performance, three key strategies are used:
a) Increasing Hardware Speed
• Shrinking logic gates on processor chips reduces signal propagation time, allowing faster
operation.
• Higher clock speeds enable faster execution of individual instructions.
b) Enhancing Cache Size & Speed
• Placing caches directly on the processor chip reduces access time.
• Modern processors dedicate over half of their chip area to cache memory.
• Improved cache efficiency significantly reduces reliance on slower main memory.
c) Optimizing Processor Architecture
• Parallelism is used to enhance instruction execution speed.
• Pipelining, superscalar execution, and out-of-order execution improve processing efficiency.
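To quantify the pipelining gain mentioned above, under the idealized textbook assumptions (equal stage delays, no stalls or hazards), executing n instructions on a k-stage pipeline with cycle time τ gives a speedup over unpipelined execution of

$$S_k = \frac{n k \tau}{(k + (n - 1))\,\tau} = \frac{n k}{k + (n - 1)}$$

which approaches k as n grows large. Real pipelines fall short of this bound because branches, data dependencies, and memory stalls insert bubbles.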
9. Challenges in Increasing Processor Speed
As clock speed and logic density increase, new challenges arise:
• Power Consumption
o Higher transistor density leads to increased heat dissipation.
o Efficient cooling solutions and power management strategies are necessary to prevent overheating (a first-order power model follows this list).
• RC Delay (Resistance-Capacitance Delay)
o The speed at which signals travel on a chip is limited by wire resistance and
capacitance.
o As wires are made thinner, their resistance increases; as they are packed closer together, their capacitance increases. Both effects slow signal transmission.
• Memory Latency & Throughput
o Memory speed improvements lag behind processor advancements, creating a
performance bottleneck.
o Efficient memory management techniques, such as prefetching and caching, help
mitigate this issue.
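For the power consumption challenge above, the standard first-order model for the dynamic (switching) power of CMOS logic is

$$P_{\text{dynamic}} \approx \alpha\, C\, V^{2}\, f$$

where α is the switching activity factor, C the switched capacitance, V the supply voltage, and f the clock frequency. Rising logic density and clock rate both increase the C·f product, so power density climbs quickly unless the supply voltage is scaled down in step.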
Summary
• Multicore processors improve performance by adding multiple cores on a single chip,
enabling parallel execution.
• MICs extend this approach by massively increasing the number of cores for high-
performance computing.
• GPGPUs leverage GPU parallelism for general-purpose applications beyond graphics, making
them useful for AI, simulations, and data-intensive computations.
This evolution reflects the shift towards parallel processing to achieve higher efficiency, lower
power consumption, and improved computational capabilities in modern processors.
Best Use Cases
• Multicore: OS, applications, databases
• MIC: scientific computing, AI
• GPGPU: AI, deep learning, simulations
Amdahl’s Law
Amdahl's Law states that the maximum speedup of a program using multiple processors is limited by the fraction of the program that must be executed sequentially. For a program in which a fraction f of the execution time can be parallelized across N processors, it is given by the formula:

$$\text{Speedup} = \frac{1}{(1 - f) + \dfrac{f}{N}}$$

Even as N grows without bound, the speedup can never exceed 1/(1 − f).
Amdahl’s law can be generalized to evaluate any design or technical improvement in a computer system. Consider an enhancement to a feature of a system that is in use a fraction f of the time and that speeds up that feature by a factor SU_f. The overall speedup can be expressed as:

$$\text{Speedup} = \frac{1}{(1 - f) + \dfrac{f}{SU_f}}$$
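A small sketch applying the first formula above, showing how quickly the speedup saturates as processors are added:

# Amdahl's Law: speedup with parallelizable fraction f on n processors.
def amdahl_speedup(f, n):
    return 1.0 / ((1.0 - f) + f / n)

# With 90% of the program parallelizable, speedup is capped at 1/(1 - 0.9) = 10x
# no matter how many processors are added.
for n in (2, 8, 64, 1024):
    print(f"N = {n:>4}: speedup = {amdahl_speedup(0.9, n):.2f}")
# N =    2: speedup = 1.82
# N =    8: speedup = 4.71
# N =   64: speedup = 8.77
# N = 1024: speedup = 9.91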
Little’s Law
Little’s Law states that in a steady-state system with no leakage, the average number of items (L) in a system is equal to the arrival rate of items (λ) multiplied by the average time (W) each item spends in the system. Mathematically, it is represented as:

$$L = \lambda W$$
This law applies to any queuing system where items arrive, wait for service, get processed,
and then leave. It is widely used in computer systems, networking, and performance
analysis.
1. Queuing System – A system where items (e.g., processes, packets, or I/O requests)
arrive, wait, get serviced, and then depart.
2. L (Average Number in System) – The average number of items in the system at any given time (e.g., processes in a queue or instructions in a pipeline).
3. λ (Arrival Rate) – The rate at which items enter the system (e.g., the number of
requests per second).
4. W (Time in System) – The average time an item spends from arrival to departure.
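A quick sketch applying L = λW to a server queue; the arrival rate and latency below are hypothetical example values:

# Little's Law: L = lambda * W (steady state, no items lost).
arrival_rate = 200.0    # requests per second (example value)
time_in_system = 0.05   # average seconds per request (example value)

avg_in_system = arrival_rate * time_in_system
print(f"L = {avg_in_system:.0f} requests in the system")   # 200 * 0.05 = 10

# Rearranged, the law recovers latency from occupancy and throughput:
print(f"W = L / lambda = {avg_in_system / arrival_rate:.3f} s")  # 0.050 s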
Example: Consider a program with 2 million instructions running on a 400 MHz processor, with the following instruction mix:

1. Instruction Mix (from a program trace)
Instruction Type                     CPI    Instruction Mix
Arithmetic/Logic                      1        60%
Load/Store with cache hit             2        18%
Branch                                4        12%
Memory reference with cache miss      8        10%

2. Average CPI Calculation

$$\text{CPI} = \sum_i \text{CPI}_i \times \frac{I_i}{I_c} = (1)(0.60) + (2)(0.18) + (4)(0.12) + (8)(0.10) = 2.24$$

3. Execution Time Calculation

$$T = \frac{I_c \times \text{CPI}}{f} = \frac{(2 \times 10^6) \times 2.24}{400 \times 10^6} = 11.2\ \text{ms}$$
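A minimal sketch that reproduces the example's arithmetic in code, including the MIPS rate implied by the same numbers:

# CPI, execution time, and MIPS rate for the example above.
clock_hz = 400e6       # 400 MHz processor
instructions = 2e6     # 2 million instructions

# (CPI, fraction of instruction mix) per instruction type.
mix = [(1, 0.60), (2, 0.18), (4, 0.12), (8, 0.10)]

cpi = sum(c * frac for c, frac in mix)
exec_time = instructions * cpi / clock_hz
mips = clock_hz / (cpi * 1e6)

print(f"CPI  = {cpi:.2f}")                  # 2.24
print(f"T    = {exec_time * 1e3:.1f} ms")   # 11.2 ms
print(f"MIPS = {mips:.1f}")                 # 178.6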
NOTE: The three common formulas used for calculating a mean are arithmetic, geometric, and harmonic. Given a set of n real numbers (x1, x2, …, xn), the three means are defined as follows:

$$\text{AM} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \text{GM} = \left(\prod_{i=1}^{n} x_i\right)^{1/n} \qquad \text{HM} = \frac{n}{\sum_{i=1}^{n} 1/x_i}$$
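A short sketch computing all three means for an arbitrary example data set:

import math

# Arithmetic, geometric, and harmonic means of a small data set.
x = [2.0, 4.0, 8.0]
n = len(x)

am = sum(x) / n
gm = math.prod(x) ** (1 / n)
hm = n / sum(1 / v for v in x)

print(f"AM = {am:.3f}")  # (2 + 4 + 8) / 3        = 4.667
print(f"GM = {gm:.3f}")  # (2 * 4 * 8) ** (1/3)   = 4.000
print(f"HM = {hm:.3f}")  # 3 / (1/2 + 1/4 + 1/8)  = 3.429

For positive data, AM ≥ GM ≥ HM always holds, which is why the choice of mean matters when summarizing benchmark results.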