Chapter 2

Chapter 2 discusses the evolution and optimization of computing performance, highlighting advancements in processor design, multicore architectures, and the use of GPUs for general-purpose computing. Key techniques for enhancing performance include pipelining, branch prediction, and speculative execution, while challenges such as memory bottlenecks and power consumption are also addressed. The chapter concludes with a comparative analysis of multicore CPUs, MICs, and GPGPUs, emphasizing the shift towards parallel processing for improved efficiency.

Chapter 2

Performance

DESIGNING FOR PERFORMANCE:


1. Continuous Improvement in Computing Power
• The cost of computer systems continues to decline while performance and capacity increase
significantly each year.
• Technological advancements enable the development of highly complex and powerful
applications that were once unimaginable.
2. Applications Requiring High Computing Power
• Several modern applications demand significant processing power, including:
o Image processing – Editing, filtering, and enhancing digital images.
o 3D rendering – Creating high-quality 3D models for gaming, animation, and
simulations.
o Speech recognition – Converting spoken language into text or executing voice
commands.
o Videoconferencing – Enabling real-time video and audio communication over the
internet.
o Multimedia authoring – Creating and editing interactive digital media, including
animations, videos, and presentations.
o Voice and video annotation – Adding voice or video comments to digital files for
enhanced communication and collaboration.
o Simulation modeling – Running complex simulations for research, engineering, and
artificial intelligence applications.
3. Key Factors Behind Performance Design
• Despite advances in computing, modern processors still rely on fundamental building blocks
similar to those in early computers.
• The key to maximizing performance lies in optimizing how resources are utilized and how
efficiently tasks are executed.
• Moore’s Law states that the number of transistors on a chip doubles approximately every
two years, allowing chipmakers to release new, more powerful processors frequently.
• Memory capacity has also expanded significantly, with DRAM capacities quadrupling every
three years.
• Processor performance has improved 4-5 times every three years due to enhanced
microarchitecture and shrinking component sizes.
4. Techniques for Performance Enhancement
To fully utilize processor power, designers implement several optimization techniques:
a) Pipelining
• The execution of an instruction is divided into multiple stages (e.g., fetch, decode, execute,
write-back).
• Instead of waiting for one instruction to complete before starting another, multiple
instructions are processed simultaneously in different stages.
• This approach is similar to an assembly line, where each stage of processing is handled
independently but concurrently.
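The assembly-line analogy can be made concrete with a back-of-the-envelope timing model (an illustrative sketch, not from the chapter): with k stages and cycle time t, an unpipelined processor needs n × k × t seconds to run n instructions, while an ideal pipeline needs only (k + n − 1) × t, since after the first instruction fills the pipeline, one instruction completes every cycle.

```python
def unpipelined_time(n, k, tau):
    # Each instruction passes through all k stages before the next one starts.
    return n * k * tau

def pipelined_time(n, k, tau):
    # The first instruction takes k cycles to fill the pipeline; each of the
    # remaining n - 1 instructions then completes one per cycle.
    return (k + n - 1) * tau

# Illustrative values: 1000 instructions, 4 stages, 2.5 ns cycle time.
n, k, tau = 1000, 4, 2.5e-9
print(unpipelined_time(n, k, tau))  # ≈ 1e-05 s
print(pipelined_time(n, k, tau))    # ≈ 2.5e-06 s, roughly a 4x speedup
```

For large n the speedup approaches the number of stages k, which is why deeper pipelines were long seen as an easy win (until hazards and branch penalties intervene).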
b) Branch Prediction
• The processor analyzes the instruction flow and predicts the most likely execution path.
• Prefetching the predicted instructions minimizes stalls and keeps the processor busy.
• Advanced processors predict multiple branches ahead, further improving efficiency.
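As an illustration (the chapter does not specify a predictor design), a classic 2-bit saturating-counter predictor mispredicts a well-behaved loop branch only once per loop exit, because a single not-taken outcome only moves it from "strongly taken" to "weakly taken":

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not-taken, 2-3 predict taken."""

    def __init__(self):
        self.state = 2  # start weakly taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Saturate at 0 and 3 so one anomalous outcome cannot flip a strong state.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

p = TwoBitPredictor()
# A loop branch: taken 8 times, not taken once at loop exit, then taken 8 more.
outcomes = [True] * 8 + [False] + [True] * 8
correct = 0
for taken in outcomes:
    correct += (p.predict() == taken)
    p.update(taken)
print(correct, "/", len(outcomes))  # 16 / 17 correct
```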
c) Superscalar Execution
• Modern processors can issue multiple instructions per clock cycle, increasing instruction
throughput.
• Multiple execution units allow simultaneous processing of independent instructions,
improving parallel execution.
d) Data Flow Analysis
• The processor determines which instructions depend on each other’s outputs and optimizes
their execution order.
• Instead of following the program order strictly, instructions are executed as soon as their
dependencies are resolved.

e) Speculative Execution
• The processor executes instructions before it is certain they are needed, storing results in
temporary locations.
• This prevents idle time and ensures the processor is utilized efficiently.
• If a predicted instruction path is incorrect, the results are discarded without affecting
execution.
5. Performance Balancing Challenges
• While processor speeds have increased significantly, other computer components (e.g.,
memory and I/O devices) have not kept pace.
• The most significant performance bottleneck is the interface between the processor and
main memory.
• If memory access is too slow, the processor must wait for data, resulting in wasted clock
cycles and reduced performance.
6. Solutions to Performance Bottlenecks
• Increasing Memory Access Efficiency
o Wider DRAM chips retrieve more bits at once, reducing memory access delays.
o Faster memory interfaces and improved bus architectures reduce transfer latency.
• Implementing Advanced Caching Techniques
o Caches store frequently accessed data closer to the processor, reducing the need for
slow main memory access.
o Multiple cache levels (L1, L2, L3) improve data retrieval efficiency.
• Enhancing Interconnect Bandwidth
o High-speed buses and hierarchical bus structures improve communication between
memory and the processor.
o Advanced interconnection techniques prevent data transfer bottlenecks.
• Improving I/O Device Management
o Peripheral devices (e.g., graphics cards, SSDs, and network interfaces) require
efficient data transfer mechanisms.
o Caching, buffering, and high-speed interconnects optimize I/O operations.
o Multiple-processor systems help distribute processing workloads and manage I/O-
intensive tasks more effectively.
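One standard way to quantify the benefit of caching (a textbook formula; the numbers below are illustrative, not from the chapter) is the average memory access time: AMAT = hit time + miss rate × miss penalty. Every access pays the hit time, and only misses pay the additional penalty of going to main memory.

```python
def amat(hit_time, miss_rate, miss_penalty):
    # Average memory access time: all accesses pay the hit time;
    # the fraction that miss additionally pays the miss penalty.
    return hit_time + miss_rate * miss_penalty

# Illustrative values: 1 ns L1 hit time, 5% miss rate, 60 ns DRAM penalty.
print(amat(1.0, 0.05, 60.0))  # ≈ 4.0 ns
```

Even a 95% hit rate cuts the effective access time from 60 ns to about 4 ns, which is why processors dedicate so much chip area to cache.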
7. Improvements in Chip Organization & Architecture
To further improve performance, three key strategies are used:
a) Increasing Hardware Speed
• Shrinking logic gates on processor chips reduces signal propagation time, allowing faster
operation.
• Higher clock speeds enable faster execution of individual instructions.
b) Enhancing Cache Size & Speed
• Placing caches directly on the processor chip reduces access time.
• Modern processors dedicate over half of their chip area to cache memory.
• Improved cache efficiency significantly reduces reliance on slower main memory.
c) Optimizing Processor Architecture
• Parallelism is used to enhance instruction execution speed.
• Pipelining, superscalar execution, and out-of-order execution improve processing efficiency.
8. Challenges in Increasing Processor Speed
As clock speed and logic density increase, new challenges arise:
• Power Consumption
o Higher transistor density leads to increased heat dissipation.
o Efficient cooling solutions and power management strategies are necessary to
prevent overheating.
• RC Delay (Resistance-Capacitance Delay)
o The speed at which signals travel on a chip is limited by wire resistance and
capacitance.
o Miniaturization increases resistance and capacitance, reducing signal transmission
speed.
• Memory Latency & Throughput
o Memory speed improvements lag behind processor advancements, creating a
performance bottleneck.
o Efficient memory management techniques, such as prefetching and caching, help
mitigate this issue.

MULTICORE, MICS, AND GPGPUS


1. Multicore Processors
• Concept: Instead of using a single complex processor, manufacturers place multiple
processing units (cores) on the same chip.
• Advantages:
o Increases performance without increasing the clock rate.
o Consumes less power compared to a single complex processor.
o Allows efficient parallel processing if the software supports it.
• Evolution:
o Began with dual-core processors, followed by 4-core, 8-core, 16-core, and beyond.
o Led to multi-level cache systems (L1, L2, and L3) to enhance performance.

2. Many Integrated Cores (MICs)


• Concept: When the number of cores increases significantly (e.g., 50+ cores per chip), the
system is referred to as MIC architecture.
• Purpose:
o Designed to handle large-scale parallel computations.
o Aims to maximize performance for high-performance computing (HPC) applications.
• Challenge: Developing software that can effectively utilize such a large number of cores is
complex.

3. General-Purpose Computing on GPUs (GPGPUs)


• Concept: Initially, GPUs (Graphics Processing Units) were specialized for rendering 2D/3D
graphics and video processing. However, their parallel processing capabilities have made
them useful for general-purpose computing. This approach is called GPGPU computing.
• Advantages:
o Efficient for parallel operations on large datasets.
o Used in applications requiring high-speed computation, such as AI, deep learning,
and simulations.
• Blurring CPU-GPU Line:
o GPUs can now handle tasks traditionally performed by CPUs.
o Some modern processors integrate both CPU and GPU components on the same
chip.

Summary
• Multicore processors improve performance by adding multiple cores on a single chip,
enabling parallel execution.
• MICs extend this approach by massively increasing the number of cores for high-
performance computing.
• GPGPUs leverage GPU parallelism for general-purpose applications beyond graphics, making
them useful for AI, simulations, and data-intensive computations.
This evolution reflects the shift towards parallel processing to achieve higher efficiency, lower
power consumption, and improved computational capabilities in modern processors.

Comparative Analysis: Multicore CPU vs. MIC vs. GPGPU

Feature            | Multicore CPU                    | MIC (Many Integrated Cores) | GPGPU (General-Purpose GPU)
-------------------|----------------------------------|-----------------------------|----------------------------------------------
Cores              | 2-64                             | 50-100+                     | 1000s
Task Suitability   | General-purpose, mixed workloads | Highly parallel workloads   | Data-parallel workloads
Memory Access      | Shared cache & RAM               | High-bandwidth memory       | High-bandwidth, optimized for large datasets
Programming Models | OpenMP, pthreads                 | OpenMP, MPI, Intel TBB      | CUDA, OpenCL
Best Use Cases     | OS, applications, databases      | Scientific computing, AI    | AI, deep learning, simulations

Amdahl’s Law
Amdahl's Law states that the maximum speedup of a program using multiple processors is
limited by the fraction of the program that must be executed sequentially. If f is the
fraction of the program that can be parallelized and N is the number of processors, the
speedup is given by the formula:

Speedup = 1 / [(1 - f) + f/N]

Amdahl's law can be generalized to evaluate any design or technical improvement in a
computer system. Consider an enhancement to a feature of a system that is used a fraction
f of the time and yields a speedup SUf when used. The overall speedup can be expressed as:

Speedup = 1 / [(1 - f) + f/SUf]
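A minimal sketch of the multiprocessor form of the law, Speedup = 1 / ((1 − f) + f/N), where f is the parallelizable fraction and N the number of processors (illustrative values):

```python
def amdahl_speedup(parallel_fraction, n):
    # Speedup = 1 / ((1 - f) + f / N): the serial fraction (1 - f)
    # is untouched; only the parallel fraction f is divided across N units.
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)

# A program that is 90% parallelizable gains far less than 8x on 8 cores:
print(amdahl_speedup(0.9, 8))   # ≈ 4.71
# Even with unlimited cores, the 10% serial part caps the speedup near 10x:
print(amdahl_speedup(0.9, 10**6))  # ≈ 10.0
```

The second call shows why the serial fraction, not the core count, sets the ceiling.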
Little’s Law
Little’s Law states that in a steady-state system with no leakage, the average number of
items (L) in a system is equal to the arrival rate of items (λ) multiplied by the average
time (W) each item spends in the system. Mathematically, it is represented as:

L = λ × W

This law applies to any queuing system where items arrive, wait for service, get processed,
and then leave. It is widely used in computer systems, networking, and performance
analysis.
1. Queuing System – A system where items (e.g., processes, packets, or I/O requests)
arrive, wait, get serviced, and then depart.
2. L (Average Number in System) – The number of items, processes in a queue, or
instructions in a pipeline at any given time.
3. λ (Arrival Rate) – The rate at which items enter the system (e.g., the number of
requests per second).
4. W (Time in System) – The average time an item spends from arrival to departure.
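A quick numeric check of L = λ × W, with illustrative values (not from the chapter):

```python
def items_in_system(arrival_rate, time_in_system):
    # Little's Law: L = lambda * W
    return arrival_rate * time_in_system

# A server receiving 200 requests/s, each spending 50 ms in the system,
# holds about 10 requests in flight at any moment.
print(items_in_system(200, 0.05))  # ≈ 10.0
```

The law is useful precisely because it holds regardless of arrival distribution or service discipline, as long as the system is in steady state.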

Basic Measures of Computer Performance


1. Clock Speed (Frequency)
• Measured in Hertz (Hz), typically GHz (gigahertz) for modern processors.
• Represents the number of cycles a CPU can execute per second.
• Higher clock speed generally indicates faster performance but is not the sole determinant.
2. Instructions Per Cycle (IPC)
• Defines how many instructions a CPU can execute per clock cycle.
• A higher IPC means better efficiency at the same clock speed.
3. CPU Performance (CPI and MIPS)
• Cycles Per Instruction (CPI): Average number of clock cycles per instruction.
• Million Instructions Per Second (MIPS): Measures the execution speed of instructions.
• Lower CPI and higher MIPS indicate better performance.
4. Floating Point Operations Per Second (FLOPS)
• Measures performance in scientific and engineering applications.
• Used in high-performance computing (HPC) and AI workloads.
5. Throughput
• The number of tasks or processes a system can complete per unit of time.
• Important for multi-core processors and parallel computing.
6. Latency (Response Time)
• The time taken to complete a single task.
• Lower latency indicates faster system responsiveness.
7. Memory Performance (Bandwidth and Latency)
• Memory Bandwidth: The rate at which data is transferred between memory and CPU,
measured in GB/s.
• Memory Latency: The delay before data is available after a request.
8. Cache Performance
• Cache hit rate: The percentage of memory accesses served from the cache.
• Cache miss penalty: Time delay when data is not found in the cache.
9. Disk Performance (IOPS and Throughput)
• Input/Output Operations Per Second (IOPS): The number of read/write operations per
second.
• Disk Throughput: The rate at which data is read/written from storage.
10. Power Efficiency
• Measured in performance per watt.
• Important for mobile and energy-efficient computing.
11. Benchmarking Scores
• Standardized tests (SPEC, Geekbench, Cinebench) used to compare performance across
systems.

Instruction Execution Rate: Understanding Processor Performance


1. Clock Speed and Cycle Time
A processor operates based on a clock signal with a fixed frequency f (measured in Hertz (Hz)). The
cycle time t (measured in seconds per cycle) is the time required for one clock cycle:
t = 1/f
For example, a processor with a 400 MHz clock speed has a cycle time:
t = 1 / (400 × 10^6) ≈ 2.5 ns (nanoseconds)
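The chapter's 400 MHz example can be verified directly:

```python
def cycle_time(freq_hz):
    # Cycle time is the reciprocal of clock frequency: t = 1 / f.
    return 1.0 / freq_hz

print(cycle_time(400e6))  # ≈ 2.5e-09 s, i.e. 2.5 ns per cycle
```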
2. Instruction Count (Ic) and CPI (Cycles Per Instruction)
• Instruction Count (Ic): The total number of instructions a program executes before completion.
• Cycles Per Instruction (CPI): The average number of clock cycles required to execute one
instruction.
Different instruction types (e.g., arithmetic, load/store, branch) take varying numbers of
cycles to execute. The overall CPI for a program is calculated as a weighted average:

CPI = Σ (CPIi × Ii) / Ic

where CPIi is the cycle count and Ii the number of executed instructions for instruction
type i.

Example: Consider a program with 2 million instructions running on a 400 MHz processor, with the
following instruction mix:

Instruction Type              | CPI | Instruction Mix (%)
------------------------------|-----|--------------------
Arithmetic/Logic              | 1   | 60%
Load/Store (Cache Hit)        | 2   | 18%
Branch                        | 4   | 12%
Memory Reference (Cache Miss) | 8   | 10%

The average CPI is calculated as:

CPI = (1×0.6) + (2×0.18) + (4×0.12) + (8×0.10) = 2.24

CPI = 2.24
3. Execution Time Calculation
The execution time T of a program is the product of instruction count, average CPI, and
cycle time:

T = Ic × CPI × t

For the given program:

T = (2 × 10^6) × (2.24) × (2.5 × 10^-9) ≈ 11.2 milliseconds
4. MIPS (Million Instructions Per Second)
MIPS measures the execution rate of instructions per second:

MIPS rate = Ic / (T × 10^6) = f / (CPI × 10^6)

For our example:

MIPS = (400 × 10^6) / (2.24 × 10^6) ≈ 178 MIPS
This means the processor executes 178 million instructions per second.
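The whole worked example can be reproduced in a few lines:

```python
# The chapter's worked example: 2 million instructions on a 400 MHz processor.
mix = [          # (CPI, fraction of instructions)
    (1, 0.60),   # arithmetic/logic
    (2, 0.18),   # load/store, cache hit
    (4, 0.12),   # branch
    (8, 0.10),   # memory reference, cache miss
]
ic = 2e6         # instruction count
f = 400e6        # clock frequency in Hz

cpi = sum(c * frac for c, frac in mix)  # weighted-average CPI ≈ 2.24
t_exec = ic * cpi * (1.0 / f)           # T = Ic * CPI * t ≈ 0.0112 s = 11.2 ms
mips = f / (cpi * 1e6)                  # MIPS = f / (CPI * 10^6) ≈ 178.6
print(cpi, t_exec, mips)
```

Note how MIPS depends only on clock rate and CPI, not on the program's length, while execution time depends on all three factors.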
5. Factors Affecting Performance
Expanding execution time as T = Ic × [p + (m × k)] × t exposes the five performance factors:
1. Instruction Count (Ic) → Affected by the instruction set architecture and compiler optimization.
2. Processor Cycles per Instruction (p) → Depends on how the processor executes instructions.
3. Memory References per Instruction (m) → More references increase execution time.
4. Memory Access Speed Ratio (k) → The ratio of memory cycle time to processor cycle time.
5. Cycle Time (t) → Determined by the processor clock speed.
6. Floating-Point Performance (MFLOPS)
For scientific and engineering workloads, the execution rate is expressed in floating-point
operations rather than instructions:

MFLOPS rate = (number of executed floating-point operations in a program) / (execution time × 10^6)
Calculating the Mean
In Computer Organization and Architecture (COA), calculating the mean (average) is essential for
performance analysis. Different types of means—Arithmetic Mean (AM), Geometric Mean (GM),
and Harmonic Mean (HM)—are used based on the nature of performance metrics.
Why Calculate the Mean in COA?
1. Comparing Performance Across Benchmarks
o Different programs have different execution times on various processors. Using the
right mean helps in fair comparisons.
2. MIPS and Execution Time Analysis
o When comparing multiple processors, AM, GM, and HM are used to average MIPS
(Millions of Instructions Per Second) or execution times.
3. Selecting the Right Mean:
o Arithmetic Mean (AM): Used when adding independent values, such as execution
times of different programs.
o Geometric Mean (GM): Used when normalizing performance ratios, e.g., when
comparing relative speedups.
o Harmonic Mean (HM): Used when averaging rates, like MIPS or IPC (Instructions Per
Cycle), since it properly handles reciprocal values.
4. Performance Ranking of Processors
o GM is commonly used in benchmarking suites (e.g., SPEC benchmarks) to provide a
fair ranking of CPU performance.

NOTE - The three common formulas used for calculating a mean are arithmetic, geometric,
and harmonic. Given a set of n real numbers (x1, x2, …, xn), the three means are defined as
follows:

AM = (x1 + x2 + … + xn) / n
GM = (x1 × x2 × … × xn)^(1/n)
HM = n / (1/x1 + 1/x2 + … + 1/xn)
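The three means can be computed directly; the MIPS ratings below are illustrative values, not from the chapter. Note that for rates, HM ≤ GM ≤ AM always holds, which is why the arithmetic mean overstates average throughput:

```python
import math

def arithmetic_mean(xs):
    # AM: appropriate for summing independent quantities such as execution times.
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # GM: appropriate for normalized performance ratios (e.g., SPEC-style speedups).
    return math.prod(xs) ** (1.0 / len(xs))

def harmonic_mean(xs):
    # HM: appropriate for averaging rates such as MIPS, since it averages reciprocals.
    return len(xs) / sum(1.0 / x for x in xs)

rates = [100.0, 200.0, 400.0]   # hypothetical MIPS ratings of three programs
print(arithmetic_mean(rates))   # ≈ 233.3
print(geometric_mean(rates))    # ≈ 200.0
print(harmonic_mean(rates))     # ≈ 171.4
```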
