Unit 3 - Computer Performance

The document discusses computer organisation and architecture. It covers topics like computer structure, evolution and performance, function and interconnection, cache and internal memory, input/output and operating system support. It also discusses number systems, arithmetic logic unit, control unit and central processing unit.

CSC 2111

Computer Organisation and Architecture


What the Course Will Cover
1. Computer Structure and Function
2. Computer Evolution
3. Computer Performance
4. Computer Function and Interconnection
5. Cache Memory
6. Internal Memory
7. External Memory
8. Input/output
9. Operating System Support
10. Number Systems
11. The Arithmetic and Logic Unit
12. The Control Unit
13. The Central Processing Unit
Unit 3: Intended Learning Outcomes
 By the end of this unit, you should be able to:
 Understand the key performance issues that relate to computer design.
 Distinguish among multicore, MIC, and GPGPU organisations.
 Perform basic measures of computer performance.
Designing for Performance
Unit Introduction
 Chipmakers have been busy learning how to fabricate chips of greater and
greater density.
 But the raw speed of the microprocessor will not achieve its potential unless it
is fed a constant stream of work to perform in the form of computer
instructions.
 As such, processor designers must come up with ever more elaborate
techniques for feeding the processor with instructions.
 Among the techniques built into contemporary processors are the following:
 Pipelining
 Branch prediction
 Data flow analysis
 Speculative execution
Pipelining
 A processor can simultaneously work on multiple
instructions.
 The processor overlaps operations by moving data or
instructions into a conceptual pipe with all stages of the
pipe processing simultaneously.
Real Pipeline Example
 Consider a water bottle packaging plant.
 There are 3 stages that a bottle passes through: Inserting the bottle (I), Filling water in the bottle (F), and Sealing the bottle (S).
 Consider these as stage 1, stage 2 and stage 3 respectively.
 Let each stage take 1 minute to complete its operation.
 Without pipelining, each bottle finishes all 3 stages before the next one starts, so 3 bottles take 9 minutes; with pipelining, the stages overlap and the same 3 bottles take only 5 minutes.
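The bottle-plant timing can be sketched in Python (a minimal model; the function name and structure are illustrative, not from the slides):

```python
def total_time(n_items, n_stages, stage_time, pipelined):
    """Total time to push n_items through an n_stages pipeline."""
    if pipelined:
        # The first item fills the pipe in n_stages steps; after that,
        # one item completes every stage_time.
        return (n_stages + n_items - 1) * stage_time
    # Without pipelining, each item occupies all stages exclusively.
    return n_stages * stage_time * n_items

# 3 bottles through the 3-stage plant, 1 minute per stage
print(total_time(3, 3, 1, pipelined=False))  # 9 minutes
print(total_time(3, 3, 1, pipelined=True))   # 5 minutes
```

The same formula explains processor pipelines: the speedup approaches the number of stages as the instruction stream grows long.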
Execution in a Pipelined Processor
 Consider a processor having 4 stages and let there be 2 instructions
to be executed.
Branch Prediction
A branch is an instruction in a computer program that can
cause a computer to begin executing a
different instruction sequence and thus deviate from its default
behavior of executing instructions in order.
The processor looks ahead in the instruction code fetched from
memory and predicts which branches, or groups of instructions,
are likely to be processed next.
If the processor guesses right most of the time, it can pre-fetch
the correct instructions and buffer them so that the processor is
kept busy.
The more sophisticated examples of this strategy predict not
just the next branch but multiple branches ahead.
Thus, branch prediction increases the amount of work available
for the processor to execute.
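As a concrete illustration, here is a minimal 2-bit saturating-counter predictor, a classic scheme the slides do not name explicitly; the class and the branch-outcome sequence are illustrative:

```python
# Sketch of a 2-bit saturating-counter branch predictor.
# Counter states: 0,1 -> predict "not taken"; 2,3 -> predict "taken".

class TwoBitPredictor:
    def __init__(self):
        self.counter = 2  # start in the weakly-taken state

    def predict(self):
        return self.counter >= 2  # True means "predict taken"

    def update(self, taken):
        # Saturate at 0 and 3 so one wrong outcome cannot flip
        # a strongly-held prediction.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

p = TwoBitPredictor()
hits = 0
# A loop branch: taken 8 times, not taken once (loop exit), taken 8 more.
outcomes = [True] * 8 + [False] + [True] * 8
for actual in outcomes:
    if p.predict() == actual:
        hits += 1
    p.update(actual)
print(hits, "of", len(outcomes), "predictions correct")  # 16 of 17
```

The 2-bit counter mispredicts only the single loop exit, which is why schemes like this guess right "most of the time" for loop-heavy code.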
Data flow analysis
 The processor analyzes which instructions are dependent
on each other’s results, or data, to create an optimized
schedule of instructions.
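A toy sketch of the idea: assign each instruction a "level" so that its level is one greater than that of any instruction it depends on; instructions sharing a level are independent and could be scheduled together (the four-instruction program is illustrative):

```python
# Hypothetical dependency analysis: instruction -> instructions it reads from.
deps = {
    "i1": [],            # r1 = load a
    "i2": [],            # r2 = load b
    "i3": ["i1", "i2"],  # r3 = r1 + r2
    "i4": ["i3"],        # store r3
}

level = {}
for instr in deps:  # dict order follows program order, so deps come first
    level[instr] = 1 + max((level[d] for d in deps[instr]), default=0)

print(level)  # {'i1': 1, 'i2': 1, 'i3': 2, 'i4': 3}
```

Because i1 and i2 share level 1, a processor doing data flow analysis knows it may execute them simultaneously.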
Speculative execution
 Using branch prediction and data flow analysis, some
processors speculatively execute instructions ahead of their
actual appearance in the program execution, holding the
results in temporary locations.
 This enables the processor to keep its execution engines as
busy as possible by executing instructions that are likely to
be needed.
Performance Balance
 Why is performance balance needed?
 While processor power has raced ahead at breakneck speed, other critical components of the computer have not kept up.
 This results in a need to adjust the organisation and architecture to compensate for the mismatch among the capabilities of the various components, especially at the interface between the processor and main memory or I/O devices.
Processor – Memory Interface
 Increase the number of bits retrieved at one time by making DRAMs "wider" rather than "deeper", and by using wide bus data paths.
 Reduce the frequency of memory access by incorporating
increasingly complex and efficient cache structures between the
processor and main memory.
 Increase the interconnect bandwidth between processors and memory
by using higher-speed buses and a hierarchy of buses to buffer and
structure data flow.
Processor – I/O Interface
 As computers become faster and more capable, more sophisticated applications are developed that support the use of peripherals with intensive I/O demands.
Processor – I/O Interface
 Strategies of getting I/O data moved between processor and
peripherals.
 Caching and buffering schemes
 Use of higher-speed interconnection buses and
interconnection structures.
 Use of multiple-processor configurations to satisfy
I/O demands.
Multicore, MICs, and GPGPUs
 Multicore
 An approach to improving performance by placing multiple processors on
the same chip, with a large shared cache.
 Many Integrated Core (MIC)
 A leap in the number of cores per chip, to more than 50 cores per chip.
 GPGPUs
 A chip with multiple general-purpose processors plus graphics processing units
(GPUs) and specialised cores for video processing and other tasks.
 When a broad range of applications are supported by such a processor, the term
general-purpose computing on GPUs (GPGPU) is used.
Basic Measures of Computer Performance
Clock Speed
 Operations performed by a processor, such as fetching an
instruction, decoding the instruction, performing an
arithmetic operation, and so on, are governed by a system
clock.
 Typically, all operations begin with the pulse of the clock.
 Thus, at the most fundamental level, the speed of a processor
is dictated by the pulse frequency produced by the clock,
measured in cycles per second, or Hertz (Hz).
 These rates are usually measured in MHz (megahertz, or millions of pulses per second) or GHz (gigahertz, or billions of pulses per second).
System Information
 Example: a 3.2-GHz processor performs 3.2 × 10^9 cycles/sec.
System Clock

 Clock signals are generated by a quartz crystal, which generates a constant signal
wave while power is applied.
 This wave is converted into a digital voltage pulse stream that is provided in a
constant flow to the processor circuitry.
 For example, a 1-GHz processor receives 1 billion pulses per second.
 The rate of pulses is known as the clock rate, or clock speed.
 One increment, or pulse, of the clock is referred to as a clock cycle, or a clock tick.
 The time between pulses is the cycle time.
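The clock rate and the cycle time are simply reciprocals; a one-line check using the 1-GHz example from the text:

```python
# Cycle time is the reciprocal of the clock rate.
clock_rate = 1e9             # 1-GHz processor, as in the example above
cycle_time = 1 / clock_rate  # seconds between clock pulses
print(cycle_time)            # 1e-09, i.e. one nanosecond per clock tick
```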
Instruction Cycle, Machine Cycle and T-State

 The execution of an instruction involves a number of discrete steps:
 fetching the instruction from memory,
 decoding the various portions of the instruction,
 loading and storing data, and
 performing arithmetic and logical operations.
 Most instructions on most processors require multiple clock
cycles to complete.
Instruction Cycle, Machine Cycle and T-State
 Instruction Cycle
 Is the fetching, decoding and execution of a single instruction.
 Typically consists of one to five read or write operations between processor and
memory or input/output devices.
 Machine Cycle
 Is a particular time period required by each memory or I/O operation
 In other words, to move a byte of data in or out of the microprocessor, a
machine cycle is required.
 T-State
 Each machine cycle consists of 3 to 6 clock periods/cycles, referred to as T-states.
 Typically, one instruction cycle consists of one to five machine cycles and one
machine cycle consists of three to six T-states i.e. three to six clock periods.
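The ranges above imply rough bounds on how many clock periods a single instruction can take; a quick arithmetic check using the slide's numbers:

```python
# Bounds implied by the text: 1-5 machine cycles per instruction,
# 3-6 T-states (clock periods) per machine cycle.
min_cycles = 1 * 3   # fewest machine cycles x fewest T-states
max_cycles = 5 * 6   # most machine cycles x most T-states
print(min_cycles, max_cycles)  # 3 30
```

So on such a processor, one instruction spans anywhere from 3 to 30 clock periods.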
Instruction Execution Rate

 Parameters
 Instruction Count (Ic), for a program is the number of
machine instructions executed for that program until it runs
to completion or for some defined time interval.
 Average Cycles Per Instruction (CPI) for a program is the average number of clock cycles required per instruction for that program.
 On any given processor, the number of clock cycles required
varies for different types of instructions, such as load, store,
branch, and so on.
The Performance Equation

 Let CPIi be the number of cycles required for instruction type i, and Ii be the number of executed instructions of type i for a given program.
 Then we can calculate an overall CPI as follows:

   CPI = [ Σ (CPIi × Ii) ] / Ic
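The overall CPI is an instruction-count-weighted average, which can be checked numerically; the instruction mix below is illustrative, not from the slides:

```python
# Overall CPI as a weighted average over the instruction mix.
mix = {                    # type: (CPI_i, I_i)
    "alu":    (1, 50_000),
    "load":   (2, 30_000),
    "branch": (3, 20_000),
}
Ic  = sum(I for _, I in mix.values())                 # total instructions
CPI = sum(cpi * I for cpi, I in mix.values()) / Ic    # weighted average
print(CPI)  # (1*50000 + 2*30000 + 3*20000) / 100000 = 1.7
```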
The Performance Equation

 A processor is driven by a clock with a constant frequency f or, equivalently, a constant cycle time τ, where τ = 1/f.
 The processor time T needed to execute a given program can then be expressed as:

   T = Ic × CPI × τ
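The execution-time relationship T = Ic × CPI × τ = Ic × CPI / f can be evaluated directly; the numbers are illustrative:

```python
# Processor time for a program: T = Ic * CPI / f.
Ic  = 100_000   # instructions executed
CPI = 1.5       # average cycles per instruction (illustrative)
f   = 1e9       # 1-GHz clock
T   = Ic * CPI / f
print(T)        # 0.00015 seconds
```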
Instruction Execution Rate
 A common measure of performance for a processor is the rate at
which instructions are executed
 How many instructions per given time?

 Expressed as millions of instructions per second (MIPS): the MIPS rate.
 Expressed in terms of the clock rate and CPI as follows:

   MIPS rate = Ic / (T × 10^6) = f / (CPI × 10^6)
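The MIPS-rate formula is a one-liner to evaluate; the clock rate and CPI below are illustrative:

```python
# MIPS rate = f / (CPI * 10**6)
f    = 400e6   # 400-MHz clock (illustrative)
CPI  = 2.0     # average cycles per instruction (illustrative)
mips = f / (CPI * 1e6)
print(mips)    # 200.0 million instructions per second
```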