001 Intro
Fall 2007
Today’s Lecture
• Course Content:
– Building the best processor
• Who cares
• How to define “best”
• Needs/Metrics
• Forces that determine “needs”
– Applications
– Technology
• What is “Computer Architecture”
– Implementation
• Role of the Architect
• Overview of course policies
Course Goal
• Advanced uni-processor/single-chip architecture
– Will use the term “processor”
– May touch on multi-core issues
• Previous courses:
– How to build a processor that works
– Some optimization techniques
• This course:
– What is the BEST processor?
– Recent Research Developments
[Figure: transistor speed improvement; future trend marked “???”]
Recent Designs
• AMD Athlon 64 FX-62
– 243M transistors, 90 nm, 2.8 GHz, 220 mm², 2 cores
• Intel Core Duo Extreme X6900
– 291M transistors, 65 nm, 3.2 GHz, 143 mm², 2 cores
• AMD Turion 64 ML-40
– 114M transistors, 90 nm, 2.2 GHz, 125 mm², 1 core
• SUN T1 “Niagara”
– 300M transistors, 90 nm, 1.2 GHz, 379 mm², 8 cores
Understanding the Building Blocks
• Traditional corollaries of technology scaling:
– cost / transistor halves annually
– power decreases with scaling
– speed increases with scaling
– reliability increases with scaling (??)
• Not anymore
[Figure: “Shrink” vs. “New uArch”]
• Two challenges:
1. Understand your building blocks:
• today, these are semiconductors
2. Understand what “best” means in application terms
What does BEST mean?
• Really depends on what your goal is:
– Moving: best to take a truck, unless you have nothing...
– SUV? I don’t know, you tell me
– Porsche? If you have money to burn: cruising
• Observation #1:
– Before we can decide what is best, we need to know what the needs are.
• Moving vs. cruising
• Observation #2:
– Then we need to be able to judge how well each option serves these needs: metrics
• Truck vs. Porsche
• What if you had to build the best car for a given
purpose?
What does BEST mean for a processor?
• Needs:
– Performance: word processing vs. weather
simulation
– Cost: would you pay 5x $ for 2x performance?
– Complexity: design/validation time affects cost and performance
– Power: PDA, laptop, server
– Reliability: Must work correctly
• There are a number of forces at work:
– 1. What does the user need: applications
– 2. What does technology offer: semiconductors
• Why is this challenging?
– Many applications, some yet to be developed
– Technology changes
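To make “needs vs. metrics” concrete, here is a small Python sketch with made-up numbers: the `Design` fields and the two candidate designs, with their performance, cost, and power values, are all hypothetical. It shows that the same two designs rank differently depending on which metric reflects the need: raw performance, performance per dollar (the “5x $ for 2x performance” question), or performance per watt.

```python
from dataclasses import dataclass

@dataclass
class Design:
    name: str
    perf: float   # relative performance (higher is better), hypothetical units
    cost: float   # relative price, hypothetical
    power: float  # watts, hypothetical

def metrics(d: Design) -> dict:
    """Derive a few common figures of merit from the raw numbers."""
    return {
        "perf": d.perf,
        "perf_per_dollar": d.perf / d.cost,
        "perf_per_watt": d.perf / d.power,
    }

def best(designs: list, metric: str) -> str:
    """Return the name of the design that scores highest on one metric."""
    return max(designs, key=lambda d: metrics(d)[metric]).name

# "Would you pay 5x $ for 2x performance?" with made-up numbers
baseline = Design("baseline", perf=1.0, cost=1.0, power=20.0)
fast = Design("fast", perf=2.0, cost=5.0, power=80.0)
candidates = [baseline, fast]

print(best(candidates, "perf"))             # fast
print(best(candidates, "perf_per_dollar"))  # baseline
print(best(candidates, "perf_per_watt"))    # baseline
```

Which metric to maximize is the architect’s judgment. The code only makes the trade-off explicit.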
What is Computer Architecture?
• Architecture: how things are organized and what you can do with them (functionality)
• What the user needs to know to reason about how the machine behaves
– Implementation?
• Transistor based (how many kinds can you think of?)
– static, dynamic? CMOS, NMOS?
• Moshovos™ implementation
• others?
Architecture vs. µArch vs. Impl.
The boundaries are a bit blurred, but still:
64-bit Adder:
• Arch: what it does
– take two 64-bit numbers, produce a 64-bit sum
• µArch: how it does it
– Ripple carry
– Carry lookahead
– Carry prediction
• Implementation
– static, dynamic, CMOS, synthesized, custom, etc.
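As a sketch of “same architecture, different microarchitecture”, the Python below models two ways to compute the same 64-bit sum: a ripple-carry adder, where each carry waits on the one below it, and a parallel-prefix formulation of carry lookahead (Kogge-Stone style, chosen here as one representative lookahead scheme). Both are bit-level software models, not gate-level timing models, and the function names are illustrative.

```python
import random

WIDTH = 64
MASK = (1 << WIDTH) - 1

def ripple_carry_add(a: int, b: int, width: int = WIDTH) -> int:
    """Bit-serial model: each carry depends on the carry from the bit below."""
    total, carry = 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))  # generate, or propagate an incoming carry
    return total  # carry out of the top bit is dropped: result is mod 2**width

def prefix_lookahead_add(a: int, b: int, width: int = WIDTH) -> int:
    """Kogge-Stone-style parallel prefix: log2(width) combining levels."""
    mask = (1 << width) - 1
    g = a & b & mask    # bit i generates a carry
    p = (a ^ b) & mask  # bit i propagates an incoming carry
    G, P = g, p
    d = 1
    while d < width:
        # Merge each bit's group with the group d positions below it.
        # G is updated first so it uses the previous level's P.
        G = G | (P & (G << d))
        P = P & (P << d)
        d <<= 1
    carries = (G << 1) & mask  # carry into bit i = carry out of bit i-1 (carry-in 0)
    return (p ^ carries) & mask

rng = random.Random(0)
for _ in range(1000):
    a, b = rng.getrandbits(WIDTH), rng.getrandbits(WIDTH)
    assert ripple_carry_add(a, b) == prefix_lookahead_add(a, b) == (a + b) & MASK
print("same architecture (64-bit sum), different microarchitectures")
```

In hardware, the ripple version needs on the order of 64 sequential carry steps, while the prefix version needs log2(64) = 6 combining levels. Timing these software models says nothing about that difference.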
• New Challenge:
– Performance no longer doubles every two years
– Performance gains now come only from coarse-grain parallelism
Texts
• These slides
• Computer Architecture: A Quantitative Approach, Hennessy and Patterson, 4th Edition, Morgan Kaufmann
• Readings in Computer Architecture, Hill, Jouppi and Sohi
• Related conference papers, both classic and cutting-edge
• Conferences:
– ISCA (International Symposium on Computer Architecture)
– ASPLOS (Architectural Support for Programming Languages and Operating Systems)
– MICRO (International Symposium on Microarchitecture)
– HPCA (High-Performance Computer Architecture; all-encompassing?)
– Others: PACT, ICS...
• GENERAL INFO: www.cs.wisc.edu/~arch/www
• Online papers: www.computer.org, citeseer.nj.nec.com
About the Course
• Instructor: Andreas Moshovos
• Office hours: by appointment only
• best way to communicate with me: e-mail
– Persist if I don’t respond the “first” time
– [email protected]
• Please use “ACA: Your header here” for all your e-mails
• Course web site: www.eecg.toronto.edu/~moshovos/ACA05
• nothing there yet
• There is no TA
1 through 7 is my responsibility
8: I provide pointers, you make the presentation, we
discuss the papers in class
Course Structure
• We’ll start with defining the sequential execution model
• We’ll then look at various ways of relaxing execution
order in the architecture
• We’ll look at an example of a modern high-performance
processor
• We’ll then look at each component separately
Marking
• This is a grad course: You are expected to be able to
seek information beyond what is discussed in class.
• Project 1/3
• Homeworks 1/3
• Presentations 1/3
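Die Size, Wafer and Yield
$$\text{Dies/Wafer} = \frac{\pi \times \left(\frac{\text{wafer\_diameter}}{2}\right)^2}{\text{die\_area}} - \frac{\pi \times \text{wafer\_diameter}}{\sqrt{2 \times \text{die\_area}}} - \text{test\_dies}$$
• Bigger die → fewer dies per wafer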
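The Dies / Wafer formula on this slide can be evaluated directly. A minimal Python sketch, assuming a hypothetical 300 mm wafer and reusing two die areas from the Recent Designs slide (143 mm² and 379 mm²); the function name and the choice to round down to whole dies are assumptions, and yield is not modeled.

```python
import math

def dies_per_wafer(wafer_diameter: float, die_area: float, test_dies: int = 0) -> int:
    """Whole dies per wafer from the Dies/Wafer formula.

    wafer_diameter in mm, die_area in mm^2 (units must be consistent).
    """
    wafer_area = math.pi * (wafer_diameter / 2) ** 2
    # Second term: dies lost to the partial squares around the wafer's edge
    edge_loss = math.pi * wafer_diameter / math.sqrt(2 * die_area)
    return math.floor(wafer_area / die_area - edge_loss - test_dies)

# Hypothetical 300 mm wafer; die areas from the Recent Designs slide
for area in (143, 379):
    print(f"{area} mm^2 die: {dies_per_wafer(300, area)} dies per wafer")
```

Roughly 2.7x the die area gives roughly 2.9x fewer dies, because larger dies also waste more of the wafer’s rim.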
• Interesting Discussions:
– Selection of word length and number base.
– Discussion of the instructions needed.
– Concern for the input/output structure and the idea of displays
– Rationale for not including floating-point arithmetic (caution
about the technology).
– The lack of necessity for the rather trivial binary-decimal
conversion hardware and the idea of cost effectiveness.
– Analysis of the addition, multiplication, and division hardware
implementation. (This description includes a nice, one-page
discussion of the average carry length for addition.)
The Task of the Referee: Reading #4 – Must Read
• Evaluating research/engineering work in computer
architecture
Homework #1
• Fill in an index card
– Provide a photo
– List which program you are in: M.A.Sc. or M.Eng.
• Optional but greatly appreciated:
– Education
– Current Position
• Any details on what you work on
– Programming Languages/OS experience
Homework #2
• Next lecture
• Study performance with pipelining
Out-of-Order Execution: the Big Picture
[Figure: program forms vs. processing phases. Static program → dynamic instruction stream (trace) → dispatch/dependences → instruction issue → completed instructions]
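The big picture can be illustrated with an idealized dataflow model: walk the dynamic instruction stream in program order and let each instruction issue as soon as its source operands are ready. The Python sketch below assumes unlimited functional units, perfect fetch, and register renaming, so only true (RAW) dependences matter. The instruction format, register names, and latencies are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Inst:
    text: str      # for printing only
    dest: str      # destination register
    srcs: tuple    # source registers
    latency: int   # cycles until the result is available

def dataflow_issue_cycles(trace):
    """Earliest issue cycle per instruction, limited only by true dependences.

    Assumes unlimited functional units, perfect fetch, and register renaming,
    so WAR/WAW name conflicts never delay an instruction.
    """
    ready_at = {}  # register -> cycle its newest value becomes available
    cycles = []
    for inst in trace:  # dynamic stream, in program order
        cycle = max((ready_at.get(r, 0) for r in inst.srcs), default=0)
        cycles.append(cycle)
        ready_at[inst.dest] = cycle + inst.latency
    return cycles

trace = [
    Inst("load r1 <- [r2]", "r1", ("r2",), 3),
    Inst("add  r3 <- r1 + 1", "r3", ("r1",), 1),
    Inst("add  r4 <- r5 + r6", "r4", ("r5", "r6"), 1),
    Inst("sub  r1 <- r4 - 1", "r1", ("r4",), 1),
]
for inst, cycle in zip(trace, dataflow_issue_cycles(trace)):
    print(f"cycle {cycle}: {inst.text}")
```

The independent add and the later sub issue before the add that waits on the 3-cycle load, i.e., out of program order. Without renaming, the sub writing r1 early would clobber the value the second instruction still needs.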
A Generic Superscalar OOO Processor
[Figure: block diagram with I-cache, pre-decode, fetch unit with branch prediction, buffer, rename, dispatch, schedulers (including a load/store scheduler), register files (RF), functional units (FUs), reorder buffer, and memory interface]
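The Rename stage removes false (WAR/WAW) dependences by giving every new result its own physical register. A minimal Python sketch of that idea, assuming a rename table initialized as r_i → p_i and a simple free list; the register counts and names are hypothetical. Physical registers are never freed here, whereas a real design releases the old mapping when the redefining instruction commits from the reorder buffer.

```python
def rename(trace, num_arch_regs=8, num_phys_regs=16):
    """Map architectural registers to physical registers, one instruction at a time.

    trace: list of (dest, srcs) using names r0..r{num_arch_regs-1}.
    Simplification: physical registers are never returned to the free list.
    """
    rat = {f"r{i}": f"p{i}" for i in range(num_arch_regs)}         # rename table
    free = [f"p{i}" for i in range(num_arch_regs, num_phys_regs)]  # free list
    renamed = []
    for dest, srcs in trace:
        phys_srcs = tuple(rat[s] for s in srcs)  # read sources before remapping dest
        if not free:
            raise RuntimeError("no free physical register: dispatch would stall")
        phys_dest = free.pop(0)
        rat[dest] = phys_dest                    # later readers of dest see the new name
        renamed.append((phys_dest, phys_srcs))
    return renamed

trace = [
    ("r1", ("r2",)),  # r1 <- f(r2)
    ("r3", ("r1",)),  # reads the first r1
    ("r1", ("r4",)),  # redefines r1: WAW with inst 0, WAR with inst 1
    ("r5", ("r1",)),  # reads the second r1
]
for (dest, srcs), (pd, ps) in zip(trace, rename(trace)):
    print(f"{dest} <- {srcs}  =>  {pd} <- {ps}")
```

After renaming, the two writes to r1 go to different physical registers (p8 and p10), so the last instruction reads p10 and the name conflicts on r1 disappear.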
A Modern System
[Figure: three I$/D$ pairs connected through an interconnect to an L2 and main memory]