1-Intro
© 2003 by Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti
ECE 252 / CPS 220 Lecture Notes: Introduction
What is This Course All About?

State-of-the-art computer hardware design

Topics
• Uniprocessor architecture (i.e., microprocessors)
• Memory architecture
• I/O architecture
• Brief look at multithreading and multiprocessors

Fundamentals, current systems, and future systems
• Will read from the textbook, classic papers, and brand-new papers

Course Goals and Expectations

Course goals
+ Understand how current processors work
+ Understand how to evaluate and compare processors
+ Learn how to use a simulator to perform experiments
+ Learn research skills by performing a term project

Course expectations
• Will loosely follow the text
• Major emphasis on cutting-edge issues
• Students will read a list of research papers
• Term project
Grading

Grading breakdown
• Homework: 30%
• Midterm: 15%
• Project: 25%
• Final: 30%

Late policy
• Late homework:
  • less than 1 day late: 50% off
  • more than 1 day late: zero
• No late term project will be accepted. Period.

Conduct

Lectures
• Consult the course website for the tentative schedule

Academic misconduct
• University policy will be followed strictly
• Zero tolerance for cheating and/or plagiarism
Now, moving on to computer architecture ...
Architecture and Other Disciplines

(layered view, top to bottom:)
Application Software
Operating Systems, Compilers, Networking Software
Computer Architecture

Levels of Computer Architecture

architecture
• functional appearance to the immediate user
• opcodes, addressing modes, architected registers

implementation (microarchitecture)
• logical structure that performs the architecture
• pipelining, functional units, caches, physical registers
Moore’s Law

“performance doubles every 18 months”
• common interpretation of Moore’s Law, not its original intent
• wrong! “performance” doubles every ~2 years
• self-fulfilling prophecy (Moore’s Curve)

• 2X every 2 years = ~3% increase per month
• 3% per month is used to judge performance features
• if a feature adds 9 months to the schedule...
• ...it should add at least 30% to performance (1.03^9 ≈ 1.30 → 30%)
• Itanium: under Moore’s Curve in a big way

Q: what do (big bang–2001) and (2000–2003) have in common?
A: the same absolute increase in computing power

Evolution of Single-Chip Microprocessors

                   1971–1980    1981–1990    1991–2000    2010
Transistor Count   10K–100K     100K–1M      1M–100M      1B
Clock Frequency    0.2–2MHz     2–20MHz      20MHz–1GHz   10GHz
IPC                <0.1         0.1–0.9      0.9–2.0      10 (?)
MIPS/MFLOPS        <0.2         0.2–20       20–2,000     100,000

some perspective: from 1971 to 2001, performance improved 35,000X!!!
• what if cars had improved at this rate?
• 1971: 60 MPH / 10 MPG
• 2001: 2,100,000 MPH / 350,000 MPG
• but... what if cars crashed as often as computers do?
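The schedule arithmetic behind the 3%-per-month rule is easy to verify. A minimal sketch (the variable names are mine, not from the slides):

```python
# "2X every 2 years" implies a monthly growth factor of 2**(1/24),
# which is where the ~3%-per-month rule of thumb comes from.
monthly_growth = 2 ** (1 / 24)
print(f"monthly growth factor: {monthly_growth:.4f}")  # ~1.0293

# A feature that delays the schedule by 9 months must therefore add
# at least 1.03**9, i.e. ~30%, to performance just to break even.
break_even = 1.03 ** 9
print(f"9-month break-even speedup: {break_even:.2f}")  # ~1.30
```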
Performance Metrics

latency: response time, execution time
• good metric for a fixed amount of work (minimize time)

throughput: bandwidth, work per unit time, “performance”
• = (1 / latency) when there is NO overlap
• > (1 / latency) when there is overlap
• in real processors there is always overlap (e.g., pipelining)
• good metric for a fixed amount of time (maximize work)

comparing performance
• A is N times faster than B iff
  perf(A) / perf(B) = time(B) / time(A) = N
• A is X% faster than B iff
  perf(A) / perf(B) = time(B) / time(A) = 1 + X/100

Performance Metric I: MIPS

MIPS (millions of instructions per second)
• (instruction count / execution time in seconds) × 10^-6
– instruction count is not a reliable indicator of work
  • some optimizations add instructions
  • work per instruction varies (FP multiply >> register move)
  • instruction sets are not equal (3 Pentium instrs != 3 Alpha instrs)
– may vary inversely with actual performance

relative MIPS
• (time_reference / time_new) × MIPS_reference
+ a little better than native MIPS
– but very sensitive to the reference machine
• upshot: may be useful if same ISA, compiler, OS, and workload
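The definitions above fit in a few lines of code. A sketch with helper names of my own choosing and made-up example numbers:

```python
def mips(instr_count, time_s):
    """Native MIPS = (instruction count / execution time in seconds) x 10^-6."""
    return instr_count / time_s * 1e-6

def relative_mips(time_ref, time_new, mips_ref):
    """Relative MIPS: scale the reference machine's rating by the speedup."""
    return (time_ref / time_new) * mips_ref

def times_faster(time_a, time_b):
    """A is N times faster than B iff time(B) / time(A) = N."""
    return time_b / time_a

# Made-up numbers: a 1e9-instruction program takes 2 s on A, 4 s on B.
print(mips(1e9, 2.0))                  # 500.0 native MIPS on A
print(times_faster(2.0, 4.0))          # 2.0: A is 2 times faster than B
print(relative_mips(4.0, 2.0, 100.0))  # 200.0 relative MIPS
```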
CPU Performance Equation

instructions / program: dynamic instruction count
• mostly determined by program, compiler, and ISA

cycles / instruction: CPI
• mostly determined by ISA and CPU/memory organization

seconds / cycle: cycle time, clock time, 1 / clock frequency
• mostly determined by technology and CPU organization

uses of the CPU performance equation
• high-level performance comparisons
• back-of-the-envelope calculations
• helping architects think about compilers and technology

CPU Performance Comparison

famous example: the “RISC Wars” (RISC vs. CISC)
• assume
  • instructions / program: CISC = P, RISC = 2P
  • CPI: CISC = 8, RISC = 2
• CISC time = P × 8 × T = 8PT
• RISC time = 2P × 2 × T = 4PT
• RISC time = CISC time / 2

the truth is much, much, much more complex
• actual data from the IBM AS/400 (CISC -> RISC in 1995):
  • CISC (IMPI) time = P × 7 × T = 7PT
  • RISC (PPC) time = 3.1P × 3 × T/3.1 = 3PT (helped by +1 technology generation)
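The comparison above is just the performance equation applied twice. A sketch with P and T normalized to 1 (my normalization, not the slides'):

```python
def cpu_time(instructions, cpi, cycle_time):
    """CPU time = (instructions/program) x (cycles/instruction) x (seconds/cycle)."""
    return instructions * cpi * cycle_time

P, T = 1.0, 1.0  # normalize program size and cycle time

# "RISC Wars" back-of-the-envelope from the slide:
cisc = cpu_time(P, 8, T)       # 8PT
risc = cpu_time(2 * P, 2, T)   # 4PT
print(cisc / risc)             # 2.0: RISC twice as fast under these assumptions

# IBM AS/400 data (IMPI -> PowerPC, 1995); the RISC side also gained a
# technology generation, modeled here as cycle time T/3.1:
impi = cpu_time(P, 7, T)
ppc = cpu_time(3.1 * P, 3, T / 3.1)
print(impi / ppc)              # ~2.33
```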
• old CPI = 0.43 + 0.21 + (0.12 × 2) + (0.24 × 2) = 1.36
• new CPI = 0.43 + 0.21 + (0.12 × 1) + (0.24 × 2) = 1.24
• speedup = (P × 1.36 × T) / (P × 1.24 × 1.15T) = 0.95 (a net slowdown)

measuring CPI breakdown
• hardware event counters (Pentium Pro, Alpha DCPI)
• calculate CPI using instruction frequencies and event costs
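The arithmetic above can be reproduced directly:

```python
# CPI as a weighted sum of per-class frequency x cycle count,
# using the numbers from the example above.
old_cpi = 0.43 + 0.21 + (0.12 * 2) + (0.24 * 2)   # = 1.36
new_cpi = 0.43 + 0.21 + (0.12 * 1) + (0.24 * 2)   # = 1.24

# The change also stretched cycle time by 15%, so the "speedup" is:
speedup = (1.36 * 1.0) / (1.24 * 1.15)
print(f"{old_cpi:.2f} {new_cpi:.2f} {speedup:.2f}")  # 1.36 1.24 0.95
```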
Benchmarks and Benchmarking

“program” as the unit of work
• millions of them, many different kinds, which to use?

benchmarks
• standard programs for measuring and comparing performance
+ represent programs people care about
+ repeatable!!

benchmarking process
• define a workload
• extract benchmarks from the workload
• execute the benchmarks on candidate machines
• project performance on the new machine
• run the workload on the new machine and compare
• not close enough? -> repeat

Benchmarks: Instruction Mixes

instruction mix: instruction type frequencies
– ignores dependences
+ OK for a non-pipelined, scalar processor without caches
  • the way all processors used to be

example: the Gibson Mix, developed in the 1950s at IBM
• load/store: 31%, branches: 17%
• compare: 4%, shift: 4%, logical: 2%
• fixed add/sub: 6%, float add/sub: 7%
• float mult: 4%, float div: 2%, fixed mul: 1%, fixed div: <1%
• qualitatively, these numbers are still useful today!
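A mix alone yields a CPI estimate once a cycle cost is attached to each instruction class. A sketch in which only the frequencies come from the slide; the per-class cycle counts are invented purely for illustration:

```python
# Gibson Mix frequencies from the slide (the remaining ~21.5% is
# treated as "other" instructions).
mix = {
    "load/store": 0.31, "branch": 0.17, "compare": 0.04, "shift": 0.04,
    "logical": 0.02, "fixed add/sub": 0.06, "float add/sub": 0.07,
    "float mult": 0.04, "float div": 0.02, "fixed mul": 0.01,
    "fixed div": 0.005,
}

# HYPOTHETICAL per-class cycle counts (not from the slides), roughly in
# the spirit of a simple non-pipelined machine.
cycles = {
    "load/store": 2, "branch": 2, "compare": 1, "shift": 1, "logical": 1,
    "fixed add/sub": 1, "float add/sub": 4, "float mult": 6,
    "float div": 20, "fixed mul": 8, "fixed div": 20,
}

# CPI estimate = sum of frequency x cost; assume 1 cycle for "other".
other = 1.0 - sum(mix.values())
cpi = sum(mix[k] * cycles[k] for k in mix) + other * 1
print(f"estimated CPI: {cpi:.2f}")
```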
SPEC CPU2000

12 integer programs (C, C++)
• gcc (compiler), perl (interpreter), vortex (database)
• bzip2, gzip (replace compress), crafty (chess, replaces go)
• eon (rendering), gap (group-theoretic enumerations)
• twolf, vpr (FPGA place and route)
• parser (grammar checker), mcf (network optimization)

14 floating-point programs (C, FORTRAN)
• swim (shallow water model), mgrid (multigrid field solver)
• applu (partial diffeq’s), apsi (air pollution simulation)
• wupwise (quantum chromodynamics), mesa (OpenGL library)
• art (neural network image recognition), equake (wave propagation)
• fma3d (crash simulation), sixtrack (accelerator design)
• lucas (primality testing), galgel (fluid dynamics), ammp (chemistry)

Benchmarking Pitfalls

• benchmark properties mismatched with the features studied
  • e.g., using SPEC for large cache studies
• careless scaling
  • using only the first few million instructions (initialization phase)
  • reducing program data size
• choosing benchmarks from the wrong application space
  • e.g., in a realtime environment, choosing troff
  • others: SPECweb, TPC-W (amazon.com)
• using old benchmarks
• “benchmark specials”: benchmark-specific optimizations

benchmarks must be continuously maintained and updated!
GM Weirdness

what about averaging ratios (speedups)?
• AM and HM flip their verdict depending on which machine is the base

            machine A   machine B   speedup B/A   speedup A/B
Program1    1           10          10            0.1
Program2    1000        100         0.1           10

AM:  (10 + 0.1)/2 = 5.05            (0.1 + 10)/2 = 5.05
     “B is 5.05 times faster!”      “A is 5.05 times faster!”
HM:  2/(1/10 + 1/0.1) ≈ 0.198       2/(1/0.1 + 1/10) ≈ 0.198
     “A is ~5 times faster!”        “B is ~5 times faster!”
GM:  √(10 × 0.1) = 1                √(0.1 × 10) = 1
     “they are equal”               “they are equal”

– the geometric mean of ratios is not proportional to total time!
• by total execution time, B is 9.1 times faster (1001 vs. 110)
• GM says they are equal

Amdahl’s Law

“Validity of the Single-Processor Approach to Achieving Large-Scale
Computing Capabilities” – G. Amdahl, AFIPS, 1967

• let an optimization speed up fraction f of the program by factor s
• speedup = old / ([(1-f) × old] + [f/s × old]) = 1 / (1 - f + f/s)

• f = 95%, s = 1.1 → 1/[(1-0.95) + (0.95/1.1)] = 1.094
• f = 5%, s = 10 → 1/[(1-0.05) + (0.05/10)] = 1.047
• f = 5%, s = ∞ → 1/[(1-0.05) + (0.05/∞)] = 1.052
• f = 95%, s = ∞ → 1/[(1-0.95) + (0.95/∞)] = 20

make the common case fast, but...
...the uncommon case eventually limits performance
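Both slides are easy to check numerically. A sketch using Python's statistics module; the helper `amdahl` is my name for the formula above:

```python
from statistics import geometric_mean, harmonic_mean, mean

# Speedup ratios from the GM Weirdness table, in both directions.
b_over_a = [10, 0.1]   # Program1, Program2
a_over_b = [0.1, 10]

# AM and HM each flip their verdict depending on the base machine:
print(mean(b_over_a), mean(a_over_b))                    # 5.05 5.05
print(harmonic_mean(b_over_a), harmonic_mean(a_over_b))  # both ~0.198

# GM is base-independent, but it calls the two machines equal:
print(geometric_mean(b_over_a))                          # ~1.0

# Total execution time tells the real story: B is ~9.1x faster.
print((1 + 1000) / (10 + 100))                           # 9.1

def amdahl(f, s):
    """Amdahl's Law: speedup when fraction f is sped up by factor s."""
    return 1 / ((1 - f) + f / s)

print(amdahl(0.95, 1.1))  # ~1.09: 95% sped up by only 1.1x barely helps
print(amdahl(0.05, 10))   # ~1.05: 5% sped up by 10x barely helps either
```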
Tradeoffs

“Bandwidth problems can be solved with money. Latency problems are
harder, because the speed of light is fixed and you can’t bribe God.”
– someone famous (John Cocke?)

well...
• can convert some latency problems into bandwidth problems
• solve those with money
• the famous “bandwidth/latency tradeoff”

architecture is the art of making tradeoffs

Bursty Behavior

Q: to sustain 2 IPC, how many instructions should the processor be able to
• fetch per cycle?
• execute per cycle?
• complete per cycle?

A: NOT 2 (more than 2)
• dependences will cause stalls (under-utilization)
• if the desired performance is X, peak performance must be > X

programs don’t always obey “average” behavior
• can’t design a processor to handle only average behavior
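The sustained-versus-peak point can be illustrated with a toy model (entirely my own, not from the slides): if some fraction of cycles stall, peak width must exceed the target IPC.

```python
def sustained_ipc(width, stall_fraction):
    """Toy model: complete `width` instructions on useful cycles, 0 on stalls."""
    return width * (1 - stall_fraction)

target, stall = 2.0, 0.2   # want 2 IPC; assume 20% of cycles stall

# A machine whose peak width merely equals the target falls short...
print(sustained_ipc(2, stall))      # 1.6

# ...so peak width must exceed the target:
needed_width = target / (1 - stall)
print(needed_width)                 # 2.5
```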