Lecture 1

Here are the steps to solve this problem: 1) Original CPI = 2.2 2) Load time reduced from 5 cycles to 2 cycles 3) New load CPI contribution is 20% * 2 = 0.4 4) New CPI = (50% * 1) + (20% * 2) + (10% * 3) + (20% * 2) = 1.6 5) Speedup from reducing load time = Original CPI / New CPI = 2.2 / 1.6 = 1.375 6) Branch time reduced from 2 cycles to 1 cycle 7) New branch CPI contribution is 20% * (1
Copyright
© Attribution Non-Commercial (BY-NC)

ECE 5367 4436

Introduction to Computer Architecture and Design


Section: T TH 1:00PM-2:30PM
Prerequisites: ECE 4436

Instructor: Ji Chen
Email: [email protected]
Tel: (713)-743-4423
Office: W328
Office Hour: T TH 2:30-3:30 or by appointment

TA: None

Course Contents

1. Introduction, basic computer organization
2. Instruction formats, instruction sets and their design
3. ALU design: Adders, subtractors, logic operations
4. Multiplication, division, floating point arithmetic
5. Datapath design
6. Control design: Hardwired control, microprogrammed control
7. Pipelining
8. Memory systems
9. I/O

Web: http://www.egr.uh.edu/courses/ece/ECE5367/

Grading

HW/Quiz/Lab   10 %
Project       15 %
Exam 1        25 %
Exam 2        25 %
Exam 3        25 %

Academic Honesty Statement


Computer Organization and Design: The Hardware/Software Interface by David A. Patterson, John L. Hennessy, 3rd edition

Required

NOT REQUIRED

Homework/quiz: There will be several graded homework/lab assignments. Homework turned in late will be accepted only under extraordinary circumstances.

Labs: Laboratory assignments may be worked in teams of two (2); however, there should be no collaboration between teams. Lab assignments turned in late will be penalized 25 points for each calendar day. Both students in a team will receive the same grade for the project.

Projects: Teams of four (4): describe the computer architecture of a modern technology.

Exams: Two mid-term exams and one final exam. A missed exam will result in a grade of zero. Let me know immediately if a situation arises that affects an exam. Final Exam - TBD

Grading: Your final grade will be computed as follows:

HW/Quiz/Lab   10 %
Project       15 %
Exam 1        25 %
Exam 2        25 %
Exam 3        25 %

Since 1946 all computers have had 5 components: Control and Datapath (together, the Processor), Memory, Input, and Output.

TI SuperSPARC(tm) TMS390Z50 in Sun SPARCstation20

[Figure: block diagram. MBus Module: SuperSPARC (Floating-point Unit, Integer Unit, Inst Cache, Ref MMU, Data Cache, Store Buffer, Bus Interface), L2 $, CC. Message Bus (MBus): L64852 MBus control, DRAM Controller, M-S Adapter. SBus: DMA, SCSI, Ethernet, STDIO (serial, kbd, mouse, audio, RTC, Floppy), SBus Cards.]

Computer Architecture

[Figure: levels of abstraction, top to bottom: Application, Operating System, Compiler, Firmware, Instruction Set Architecture (Instr. Set Proc., I/O system), Datapath & Control, Digital Design, Circuit Design, Layout]

Coordination of many levels of abstraction
Under a rapidly changing set of forces
Design, Measurement, and Evaluation

Forces on Computer Architecture

[Figure: Computer Architecture at the center, surrounded by the forces acting on it: Technology, Programming Languages, Applications, Operating Systems, History, Cleverness]

Mixed-Signal

Where are We Going??

ECE 5367 Spring 08

Topics: Performance, Single/multicycle Datapaths, Arithmetic, Pipelining, Memory Systems, I/O

[Figure: collage of course topics. Arithmetic: a Booth-encoded multiplier datapath (32-bit multiplicand and multiplier inputs, 32=>34 sign extension, Multiplicand Register, 34x2 MUX, 34-bit ALU with Sub/Add, HI and LO registers, Booth Encoder, Control Logic). Pipelining: a timing diagram of overlapped IFetch, Dcd, Exec, Mem, WB stages.]

[Figure: Processor-Memory Performance Gap. Performance (log scale, 1 to 1000) versus Time (1980 to 2000). CPU ("Moore's Law"): 60%/yr. (2X/1.5yr). DRAM: 9%/yr. (2X/10 yrs). The gap grows 50% / year.]
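The growth figures on the performance-gap plot can be checked with compound-growth arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the rates are the ones quoted on the slide (60% per year for the CPU, 9% per year for DRAM).

```python
# Annual improvement rates quoted on the slide.
CPU_RATE = 0.60   # processor performance: 60% per year
DRAM_RATE = 0.09  # DRAM performance: 9% per year


def growth(rate, years):
    """Compound improvement factor after `years` at a fixed annual rate."""
    return (1.0 + rate) ** years


def gap_growth_per_year(fast_rate, slow_rate):
    """Annual growth of the ratio between two compounding quantities."""
    return (1.0 + fast_rate) / (1.0 + slow_rate) - 1.0


if __name__ == "__main__":
    print(f"CPU after 1.5 years: {growth(CPU_RATE, 1.5):.2f}x")
    print(f"DRAM after 10 years: {growth(DRAM_RATE, 10):.2f}x")
    print(f"Gap growth per year: {gap_growth_per_year(CPU_RATE, DRAM_RATE):.1%}")
```

Under these rates the CPU roughly doubles in 1.5 years, DRAM improves by about 2.4x in 10 years (a little more than the rounded "2X" on the slide), and the gap widens by about 47% per year, which the slide rounds to 50%.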

Purchasing perspective
  Given a collection of machines, which has the
    Best performance?
    Least cost?
    Best performance / cost?

Design perspective
  Faced with design options, which has the
    Best performance improvement?
    Least cost?
    Best performance / cost?

Both require
  basis for comparison
  metric for evaluation

Our goal: understand cost & performance implications of architectural choices

Two Notions of Performance

Plane        DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747   6.5 hours     610 mph    470          286,700
Concorde     3 hours       1350 mph   132          178,200

Which has higher performance?
  Time to do the task (Execution Time): execution time, response time, latency
  Tasks per day, hour, week, sec, ns, ... (Performance): throughput, bandwidth
Response time and throughput often are in opposition
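The throughput column is passengers multiplied by speed (passenger-miles per hour). A minimal sketch of the two notions, using the table's numbers; the `Plane` class and its field names are hypothetical and exist only for this illustration:

```python
from dataclasses import dataclass


@dataclass
class Plane:
    name: str
    trip_hours: float   # DC to Paris flying time (latency)
    speed_mph: float
    passengers: int

    def throughput_pmph(self) -> float:
        """Passenger-miles per hour: passengers x speed."""
        return self.passengers * self.speed_mph


PLANES = [
    Plane("Boeing 747", 6.5, 610, 470),
    Plane("Concorde", 3.0, 1350, 132),
]

# Best latency: the shortest time to do one task (one trip).
fastest = min(PLANES, key=lambda p: p.trip_hours)
# Best throughput: the most work done per unit time.
highest_throughput = max(PLANES, key=lambda p: p.throughput_pmph())

if __name__ == "__main__":
    for p in PLANES:
        print(f"{p.name}: {p.trip_hours} h, {p.throughput_pmph():,.0f} pmph")
    print("Lowest execution time:", fastest.name)
    print("Highest throughput:", highest_throughput.name)
```

The two metrics pick different winners, which is the point of the slide: the Concorde wins on execution time and the Boeing 747 wins on throughput.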

Definitions

Performance is in units of things-per-second; bigger is better.

If we are primarily concerned with response time:

$$\text{performance}(x) = \frac{1}{\text{execution\_time}(x)}$$

"X is n times faster than Y" means

$$n = \frac{\text{Performance}(X)}{\text{Performance}(Y)}$$

Example

Time of Concorde vs. Boeing 747?
  Concorde is 1350 mph / 610 mph = 2.2 times faster (= 6.5 hours / 3 hours)

Throughput of Concorde vs. Boeing 747?
  Concorde is 178,200 pmph / 286,700 pmph = 0.62 "times faster"
  Boeing is 286,700 pmph / 178,200 pmph = 1.60 times faster

Boeing is 1.6 times (60%) faster in terms of throughput
Concorde is 2.2 times (120%) faster in terms of flying time

We will focus primarily on execution time for a single job
Lots of instructions in a program => Instruction throughput important!
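The ratios in this example follow from the definition of "n times faster" as a ratio of performances, with performance taken as 1 / execution time when response time is the concern. A small sketch with hypothetical helper names:

```python
def performance_from_time(execution_time):
    """Performance when response time is what matters: 1 / execution time."""
    return 1.0 / execution_time


def times_faster(perf_x, perf_y):
    """n such that 'X is n times faster than Y': Performance(X) / Performance(Y)."""
    return perf_x / perf_y


# Flying time: Concorde (3 hours) vs. Boeing 747 (6.5 hours).
time_ratio = times_faster(performance_from_time(3.0), performance_from_time(6.5))

# Throughput: Boeing 747 (286,700 pmph) vs. Concorde (178,200 pmph).
throughput_ratio = times_faster(286_700, 178_200)

if __name__ == "__main__":
    print(f"Concorde vs. Boeing, flying time: {time_ratio:.2f}x")
    print(f"Boeing vs. Concorde, throughput:  {throughput_ratio:.2f}x")
```

Both values round to the slide's figures of about 2.2 and 1.6.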

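Performance

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$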
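As a quick illustration of the three-factor product, the sketch below evaluates it for one made-up program. The instruction count, CPI, and clock rate are hypothetical values chosen only to make the arithmetic easy to follow; they do not come from the lecture.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """Seconds/Program = Instructions/Program x Cycles/Instruction x Seconds/Cycle."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle


# Hypothetical workload: 1 billion instructions, CPI of 2.2, 500 MHz clock.
example_time = cpu_time_seconds(1_000_000_000, 2.2, 500_000_000)

if __name__ == "__main__":
    print(f"CPU time: {example_time:.2f} s")
```

Halving any one factor (fewer instructions, lower CPI, or a shorter cycle) halves the CPU time, provided the other two factors stay the same.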

Amdahl's Law

Speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then:

$$\text{ExTime}(\text{with } E) = \left((1 - F) + \frac{F}{S}\right) \times \text{ExTime}(\text{without } E)$$

$$\text{Speedup}(\text{with } E) = \frac{1}{(1 - F) + F/S}$$
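Amdahl's Law translates directly into a one-line function. This is a minimal sketch; the function name and the sample values of F and S are hypothetical, not taken from the lecture.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` (F) of the task is sped up by `factor` (S)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction F must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor S must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


if __name__ == "__main__":
    # Hypothetical case: half of the task runs twice as fast.
    print(f"F=0.5, S=2:   {amdahl_speedup(0.5, 2):.3f}x")
    # Even a huge S cannot beat 1 / (1 - F), the limit set by the unaffected part.
    print(f"F=0.5, S=1e9: {amdahl_speedup(0.5, 1e9):.3f}x")
```

The second call shows the law's main consequence: the part of the task that is not enhanced bounds the achievable speedup at 1 / (1 - F).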

Base Machine (Typical Mix)

Op       Freq   Cycles   CPI(i)   % Time
ALU      50%    1        0.5      23%
Load     20%    5        1.0      45%
Store    10%    3        0.3      14%
Branch   20%    2        0.4      18%
Total                    2.2

How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
How does this compare with using branch prediction to save a cycle off the branch time?
What if two ALU instructions could be executed at once?
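One way to work through the three questions is to recompute the CPI for each change and divide the base CPI by the new one. The sketch below assumes the instruction count and clock cycle time stay fixed, so speedup equals the CPI ratio. It also assumes, for the last question, that every ALU instruction can be paired, which halves the effective ALU cost to 0.5 cycles; that is an idealized reading of the question, not something the slide states. All names are hypothetical.

```python
# Base machine: op -> (frequency, cycles), taken from the table.
BASE_MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}


def cpi(mix):
    """Average cycles per instruction: sum of frequency x cycles over all ops."""
    return sum(freq * cycles for freq, cycles in mix.values())


def with_cycles(mix, op, cycles):
    """Return a copy of the mix with one op's cycle count changed."""
    changed = dict(mix)
    changed[op] = (mix[op][0], cycles)
    return changed


base_cpi = cpi(BASE_MIX)
load_cpi = cpi(with_cycles(BASE_MIX, "Load", 2))      # better data cache
branch_cpi = cpi(with_cycles(BASE_MIX, "Branch", 1))  # branch prediction
alu_cpi = cpi(with_cycles(BASE_MIX, "ALU", 0.5))      # assumed: perfect ALU pairing

if __name__ == "__main__":
    for label, new_cpi in [("Load = 2 cycles", load_cpi),
                           ("Branch = 1 cycle", branch_cpi),
                           ("Two ALU ops at once", alu_cpi)]:
        print(f"{label}: CPI {new_cpi:.2f}, speedup {base_cpi / new_cpi:.3f}x")
```

Under these assumptions the new CPIs are 1.6, 2.0, and 1.95, giving speedups of about 1.375, 1.10, and 1.13. The load improvement wins because loads account for the largest share of execution time (45%) in the base machine.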
