L01-Intro
Lecture 1 - Introduction
Krste Asanovic
Electrical Engineering and Computer Sciences
University of California at Berkeley
http://people.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152
What is Computer Architecture?
Application
Physics
Computing Devices Then…
[Figure: routers, robots, smartphones, automobiles, supercomputers]
Architecture continually changing
Applications: suggest how to improve technology, provide revenue to fund development
Technology: improved technologies make new applications possible
Major Technology Generations [from Kurzweil]: Electromechanical, Relays, Vacuum Tubes, Bipolar, pMOS, nMOS, CMOS, ?
Single-Thread Processor Performance
Today’s Dominant Target Systems
• Mobile (smartphone/tablet)
– >1 billion sold/year
– Market dominated by ARM-ISA-compatible general-purpose processor in
system-on-a-chip (SoC)
– Plus sea of custom accelerators (radio, image, video, graphics, audio,
motion, location, security, etc.)
• Warehouse-Scale Computers (WSCs)
– 100,000s of cores per warehouse
– Market dominated by x86-compatible server chips
– Dedicated apps, plus cloud hosting of virtual machines
– Now seeing increasing use of GPUs, FPGAs, custom hardware to
accelerate workloads
• Embedded computing
– Wired/wireless network infrastructure, printers
– Consumer TV/Music/Games/Automotive/Camera/MP3
– Internet of Things!
This Year: Combined CS152/CS252
• CS152/CS252 share lectures in 306 Soda, MW 1:00-2:30pm
– For CS252 students, initial lectures are optional review material
– Some later lectures include CS252-only material
• CS152/CS252 share two midterms (in class, 80 minutes each)
– but some questions marked as CS152 only or CS252 only
• CS152 has problem sets
– CS252 students welcome to use PS for revision, self-learning
• CS152 has labs
– CS252 students welcome to use labs for self-learning
• CS152 has discussion sections F 2-4pm, 3113 Etcheverry
• CS152 has final exam
CS152/CS252 Administrivia
Instructor: Prof. Krste Asanovic, [email protected]
Office: (inside ADEPT Lab)
Office Hours: Wed. 10-11AM (email to confirm), 567 Soda
T. A.s: David Biancolin, biancolin@eecs OH: Tue, 2pm, Room TBD
Albert Magyar, albert.magyar@berkeley OH: Wed, 4:30pm,
Room TBD
Lectures: MW, 1:00-2:30PM, 306 Soda
252 Readings discussion: Monday 11am-noon, 405 Soda
152 Sections: F 12-2PM, 2-4PM, 3113 Etcheverry (start 2/1)
Text: Computer Architecture: A Quantitative Approach, Hennessy and
Patterson, 6th Edition (2017)
Readings assigned from this edition, some readings available in older
editions – see web page.
Web page: http://inst.eecs.berkeley.edu/~cs152
Lectures available online by noon before class
Piazza: http://piazza.com/berkeley/spring2019/cs152
CS152 Course Grading
• 15% Problem Sets
– Intended to help you learn the material. Feel free to discuss with other
students and instructors, but must turn in your own solutions. Grading
based mostly on effort, but exams assume that you have worked through
all problems. Solutions released after PSs handed in.
• 25% Labs
– Labs use advanced full architectural simulators, including Amazon-hosted
FPGA simulators of working RISC-V systems
– Directed plus open-ended sections to each lab
CS252 Course Grading
• 20% Paper readings
– Paper summaries, discussion participation
• 30% Exams (two midterms, 15%+15%)
– Closed-book, no calculators, no smartphones, no smartwatch, no
laptops,...
– Based on lectures, readings, problem sets, and labs
• 50% Class Project
– Substantial research project in pairs, regular 1-1 meetings with staff,
10-page conference-style paper and class presentation
CS152/CS252 Crossovers
• Berkeley undergrads cannot take CS252 before CS152
CS152 Labs
• Each lab has directed plus open-ended assignments
• Directed portion (2/7) is intended to ensure students learn
main concepts behind lab
– Each student must perform own lab and hand in their own lab report
• Open-ended assignment (5/7) is to allow you to show your
creativity
– Roughly a one-day “mini-project”
» E.g., try an architectural idea and measure potential, negative results
OK (if explainable!)
– Students can work individually or in groups of two or three
– Group open-ended lab reports must be handed in separately
– Students can work in different groups for different assignments
• Lab reports must be readable English summaries – not
dumps of log files!!!!!!
– We will reward good reports, and penalize undecipherable reports
Class ISA is RISC-V
• RISC-V is a new free, simple, clean, extensible ISA we
developed at Berkeley for education (61C/151/152/252)
and research (ParLab/ASPIRE/ADEPT)
– RISC-I/II, first Berkeley RISC implementations
– Berkeley research machines SOAR/SPUR considered RISC-III/IV
• Both of the dominant ISAs (x86 and ARM) are too complex
to use for teaching or research
• RISC-V has taken off commercially
• RISC-V Foundation manages the standard (riscv.org)
• Now upstream support for many tools (gcc, Linux,
FreeBSD, …)
• Nvidia is using RISC-V in all future GPUs
• Western Digital is using RISC-V in all future products
• Govt. of India selected RISC-V as national ISA
Foundation: 200+ Members
Chisel simulators
• Chisel is a new hardware description language we
developed at Berkeley based on Scala
– Constructing Hardware in a Scala Embedded Language
• Labs will use RISC-V processor simulators derived from
Chisel processor designs
– Gives you much more detailed information than other simulators
– Can map to FPGA or real chip layout
• You need to learn some minimal Chisel in CS152, but we’ll
make Chisel RTL source available so you can see all the
details of our processors
• Can do lab projects based on modifying the Chisel RTL
code if desired
Chisel Design Flow
Chisel Design Description → Chisel Compiler
Chisel Compiler → FPGA Verilog → FPGA Emulation
Chisel Compiler → ASIC Verilog → GDS Layout
Questions?
Computer Architecture:
A Little History
Throughout the course we’ll use a historical narrative to
help understand why certain ideas arose
Analog Computers
Analog computer represents problem variables as
some physical quantity (e.g., mechanical
displacement, voltage on a capacitor) and uses scaled
physical behavior to calculate results
[Marsyas, Creative Commons BY-SA 3.0] Wingtip vortices off Cessna tail in wind tunnel
Antikythera mechanism c.100BC
Digital Computers
Represent problem variables as numbers encoded
using discrete steps
- Discrete steps provide noise immunity
Enables accurate and deterministic calculations
- Same inputs give same outputs exactly
Not constrained by physically realizable functions
Programmable digital computers are CSx52 focus
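The noise-immunity point can be illustrated with a small sketch. The stage count, noise amplitude, 0.5 threshold, and function names below are illustrative assumptions, not values from the lecture: a value is passed through many noisy stages, and in the digital case each stage snaps it back to the nearest legal level.

```python
import random


def restore(level):
    """Snap a noisy level back to the nearest legal digital value (0.0 or 1.0)."""
    return 1.0 if level >= 0.5 else 0.0


def transmit(value, stages, noise, digital, rng):
    """Pass a value through `stages` noisy hops, restoring after each hop if digital."""
    for _ in range(stages):
        value += rng.uniform(-noise, noise)  # each hop adds bounded noise
        if digital:
            value = restore(value)           # discrete steps reject the noise
    return value


rng = random.Random(152)  # fixed seed so the run is repeatable
digital_out = [transmit(b, 100, 0.2, True, rng) for b in (0.0, 1.0)]
analog_out = [transmit(b, 100, 0.2, False, rng) for b in (0.0, 1.0)]
print("digital:", digital_out)  # exactly the inputs, since noise < threshold margin
print("analog: ", analog_out)   # accumulated noise, generally not exactly 0 or 1
```

As long as the per-stage noise stays below the margin to the threshold, the digital path reproduces its input exactly, while the analog path carries the sum of all the noise it has seen.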
Charles Babbage (1791-1871)
Lucasian Professor of
Mathematics, Cambridge
University, 1828-1839
A true “polymath” with interests
in many areas
Frustrated by errors in printed
tables, wanted to build machines
to evaluate and print accurate
tables
Inspired by earlier work
organizing human “computers” to
methodically calculate tables by
hand
[Copyright expired and in public domain.
Image obtained from Wikimedia Commons.]
Difference Engine 1822
Continuous functions can be approximated by
polynomials, which can be computed from difference
tables:
f(n) = n^2 + n + 41
d1(n) = f(n) - f(n-1) = 2n
d2(n) = d1(n) - d1(n-1) = 2

n        0    1    2    3    4
d2(n)              2    2    2
d1(n)         2    4    6    8
f(n)    41   43   47   53   61
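The table can be reproduced with additions only, which is the point of the method. The sketch below is an illustration, not Babbage's actual mechanism: the function name `tabulate` and the register layout are assumptions, and it uses forward differences (f(n+1) - f(n)), which are the slide's d1 and d2 shifted by one position.

```python
def tabulate(f, degree, count):
    """Tabulate a degree-`degree` polynomial f at n = 0, 1, ... using only additions.

    f is evaluated directly only for the first degree+1 seed values;
    every later value comes from adding difference registers.
    """
    # Seed: build the registers [f(0), Δf(0), Δ²f(0), ...] from a few direct evaluations.
    row = [f(n) for n in range(degree + 1)]
    regs = []
    while row:
        regs.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]

    values = []
    for _ in range(count):
        values.append(regs[0])
        # One "turn of the crank": each register absorbs the next-higher difference.
        for i in range(len(regs) - 1):
            regs[i] += regs[i + 1]
    return values


print(tabulate(lambda n: n * n + n + 41, 2, 5))  # [41, 43, 47, 53, 61]
```

For the slide's polynomial the registers start as [41, 2, 2]; the constant second difference never changes, so each new table entry costs two additions.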
Realizing the Difference Engine
Mechanical calculator, hand-cranked, using decimal digits
Babbage did not complete the DE, moving on to the Analytical
Engine (but used ideas from AE in improved DE 2 plan)
Scheutz in Sweden completed working version in 1855, sold
copy to British Government
Analytical Engine Design Choices
Decimal, because storage on mechanical gears
- Babbage considered binary and other bases, but no clear
advantage over human-friendly decimal
40-digit precision (equivalent to about 133 bits)
- To reduce impact of scaling given lack of floating-point
hardware
Used “locking” or mechanical amplification to
overcome noise in transferring mechanical motion
around machine
- Similar to non-linear gain in digital electronic circuits
Had a fast “anticipating” carry
- Mechanical version of pass-transistor carry propagate used
in CMOS adders (and earlier in relay adders)
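The digit-to-bit equivalence for the 40-digit precision can be checked with a few lines. This is a minimal sketch; the helper name `bits_for_decimal_digits` is an assumption made for illustration. Forty decimal digits carry 40 · log2(10) ≈ 132.9 bits of information, so 133 bits are enough to hold any 40-digit value.

```python
def bits_for_decimal_digits(digits):
    """Smallest number of bits that can hold every unsigned `digits`-digit decimal value."""
    largest = 10 ** digits - 1   # e.g. 9999...9 with `digits` nines
    return largest.bit_length()


print(bits_for_decimal_digits(40))  # 133
```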
Ada Lovelace (1815-1852)
Translated lectures of Luigi
Menabrea who published notes of
Babbage’s lectures in Italy
Lovelace considerably embellished
notes and described Analytical
Engine program to calculate
Bernoulli numbers that would
have worked if AE was built
- The first program!
Imagined many uses of computers
beyond calculations of tables
Was interested in modeling the
brain
Atanasoff-Berry Linear Equation Solver (1939)
Fixed-function calculator for solving up to 29 simultaneous
linear equations
Digital binary arithmetic (50-bit fixed-point words)
Dynamic memory (rotating drum of capacitors)
Vacuum tube logic for processing
ENIAC
Changing the program could take days!
Manchester SSEM “Baby” (1948)
Manchester University group built small-scale experimental
machine to demonstrate idea of using cathode-ray tubes
(CRTs) for computer memory instead of mercury delay lines
Williams-Kilburn Tubes were first random access electronic
storage devices
32 words of 32 bits, accumulator, and program counter
Machine ran world’s first stored program in June 1948
Led to later Manchester Mark-1 full-scale machine
- Mark-1 introduced index registers
- Mark-1 commercialized by Ferranti
[Piero71, Creative
Commons BY-SA 3.0 ]
Williams-Kilburn
Tube Store
Cambridge EDSAC (1949)
Maurice Wilkes came back from workshop in US and set about
building a stored-program computer in Cambridge
EDSAC used mercury-delay line storage to hold up to 1024
words (512 initially) of 17 bits (+1 bit of padding in delay line)
Two’s-complement binary arithmetic
Accumulator ISA with self-modifying code for indexing
David Wheeler, who earned the world’s first computer science
PhD, invented the subroutine (“Wheeler jump”) for this
machine
- Users built a large library of useful subroutines
UK’s first commercial computer, LEO-I (Lyons Electronic Office),
was based on EDSAC, ran business software in 1951
- Software for LEO was still running in the 1980s in emulation on ICL
mainframes!
EDSAC-II (1958) was first machine with microprogrammed
control unit
Commercial computers:
BINAC (1949) and UNIVAC (1951)
Eckert and Mauchly left U.Penn after patent rights
disputes and formed the Eckert-Mauchly Computer
Corporation
World’s first commercial computer was BINAC with
two CPUs that checked each other
- BINAC apparently never worked after shipment to first
(only) customer
Second commercial computer was UNIVAC
- Used mercury delay-line memory, 1000 words of 12 alpha
characters
- Famously used to predict presidential election in 1952
- Eventually 46 units sold at >$1M each
- Often mistakenly called the IBM UNIVAC
IBM 701 (1952)
IBM’s first commercial scientific computer
Main memory was 72 Williams Tubes, each 1Kib, for
total of 2048 words of 36 bits each
- Memory cycle time of 12µs
Accumulator ISA with multiplier/quotient register
18-bit/36-bit numbers in sign-magnitude fixed-point
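The 701 memory figures are consistent with each other, as a quick arithmetic check shows. The constant names are chosen here for illustration; the values are the ones quoted above.

```python
TUBES = 72            # Williams tubes in main memory
BITS_PER_TUBE = 1024  # 1 Kib per tube
WORD_BITS = 36        # word length

total_bits = TUBES * BITS_PER_TUBE
words = total_bits // WORD_BITS

print(total_bits)  # 73728
print(words)       # 2048
```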
Misquote from Thomas Watson Sr/Jr:
“I think there is a world market for maybe five
computers”
Actually TWJr said at shareholder meeting:
“as a result of our trip [selling the 701], on which we
expected to get orders for five machines, we came
home with orders for 18.”
IBM 650 (1953)
The first mass-produced computer
Low-end system with drum-based storage and digit
serial ALU
Almost 2,000 produced
[Figure from 650 Manual, © IBM: digit-serial ALU, 20-digit accumulator]
IBM 650 Instruction Set
Address and data in 10-digit decimal words
Instructions encode:
- Two-digit opcode encoded 44 instructions in base
instruction set, expandable to 97 instructions with options
- Four-digit data address
- Four-digit next instruction address
- Programmers arrange code to minimize drum latency!
Special instructions added to compare value to all
words on track
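The instruction encoding above can be sketched as a decoder for a 10-digit decimal word. This is an illustration under stated assumptions: the field order (opcode, then data address, then next-instruction address) follows the order listed on the slide, the word's sign is ignored, and the names `Instruction650`, `decode`, and `encode` as well as the sample word are hypothetical.

```python
from typing import NamedTuple


class Instruction650(NamedTuple):
    opcode: int        # two decimal digits
    data_address: int  # four decimal digits
    next_address: int  # four decimal digits: where to fetch the next instruction


def decode(word):
    """Split a 10-digit decimal word into its three fields (sign ignored)."""
    if not 0 <= word < 10 ** 10:
        raise ValueError("expected a non-negative 10-digit decimal word")
    opcode, rest = divmod(word, 10 ** 8)
    data_address, next_address = divmod(rest, 10 ** 4)
    return Instruction650(opcode, data_address, next_address)


def encode(instr):
    """Inverse of decode: pack the three fields back into one word."""
    return instr.opcode * 10 ** 8 + instr.data_address * 10 ** 4 + instr.next_address


sample = decode(6919500005)  # hypothetical word, chosen only to show the split
print(sample)  # Instruction650(opcode=69, data_address=1950, next_address=5)
```

Because every instruction names its successor, a programmer can place the next instruction at the drum location that will be under the read head just as the current one finishes, rather than at the next sequential address.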
Early Instruction Sets
Very simple ISAs, mostly single-address accumulator-
style machines, as high-speed circuitry was expensive
- Based on earlier “calculator” model
Over time, appreciation of software needs shaped ISA
Index registers (Kilburn, Mark-1) added to avoid need
for self-modifying code to step through array
Over time, more index registers were added
And more operations on the index registers
Eventually, just provide general-purpose registers
(GPRs) and orthogonal instruction sets
But some other options explored…
Burroughs B5000 Stack Architecture:
Robert Barton, 1960
Hide instruction set completely from programmer
using high-level language (ALGOL)
Use stack architecture to simplify compilation,
expression evaluation, recursive subroutine calls,
interrupt handling,…
Evaluation of Expressions
(a + b * c) / (a + d * c - e)
Expression tree:
              /
        +           -
      a   *       +   e
         b c     a   *
                    d c
Reverse Polish: abc*+adc*+e-/
Evaluation Stack:
push a, push b, push c: stack holds a, b, c (c on top)
multiply: stack holds a, b * c (b * c on top)
Evaluation of Expressions
(a + b * c) / (a + d * c - e)
Reverse Polish: abc*+adc*+e-/
Evaluation Stack:
add: stack holds a + b * c
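The stack evaluation walked through on these two slides can be sketched as a small interpreter for the Reverse Polish string. This is an illustrative sketch, not the B5000's actual instruction set: the function name `eval_rpn`, the single-letter operand convention, and the sample variable values are assumptions.

```python
import operator

# Binary operators that pop two operands and push one result.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(tokens, env):
    """Evaluate a Reverse Polish token sequence with an evaluation stack."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()   # top of stack is the right-hand operand
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(env[tok])  # operand: push its value
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


env = {"a": 2, "b": 3, "c": 4, "d": 5, "e": 6}  # sample values, chosen arbitrarily
result = eval_rpn("abc*+adc*+e-/", env)
print(result)  # 0.875, i.e. (2 + 3*4) / (2 + 5*4 - 6) = 14 / 16
```

No operand addresses or temporaries need to be named for intermediate results, which is why a compiler can emit this form directly from a walk of the expression tree.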
IBM’s Big Bet: 360 Architecture
By early 1960s, IBM had several incompatible families
of computer:
701 → 7094
650 → 7074
702 → 7080
1401 → 7010
IBM 360 : Design Premises
Amdahl, Blaauw and Brooks, 1964
The design must lend itself to growth and successor
machines
General method for connecting I/O devices
Total performance - answers per month rather than bits
per microsecond → programming aids
Machine must be capable of supervising itself without
manual intervention
Built-in hardware fault checking and locating aids to
reduce down time
Simple to assemble systems with redundant I/O devices,
memories etc. for fault tolerance
Some problems required floating-point larger than 36 bits
Stack versus GPR Organization
Amdahl, Blaauw and Brooks, 1964
IBM 360: A General-Purpose Register
(GPR) Machine
Processor State
- 16 General-Purpose 32-bit Registers
- may be used as index and base register
- Register 0 has some special properties
- 4 Floating Point 64-bit Registers
- A Program Status Word (PSW)
- PC, Condition codes, Control flags
A 32-bit machine with 24-bit addresses
- But no instruction contains a 24-bit address!
Data Formats
- 8-bit bytes, 16-bit half-words, 32-bit words, 64-bit double-
words
The IBM 360 is why bytes are 8-bits long today!
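How a machine can have 24-bit addresses without any instruction containing one can be sketched as follows. The slide does not spell out the mechanism; the sketch assumes the base-plus-index-plus-12-bit-displacement scheme commonly described for the System/360 RX instruction format, where naming register 0 as base or index contributes zero. The function name and register contents are hypothetical.

```python
ADDRESS_MASK = (1 << 24) - 1  # addresses are 24 bits wide


def effective_address(regs, index, base, displacement):
    """Form a 24-bit address from two register numbers and a 12-bit displacement.

    Assumption: register number 0 means "no register" (contributes zero),
    one of the special properties of register 0.
    """
    if not 0 <= displacement < (1 << 12):
        raise ValueError("displacement must fit in 12 bits")
    x = regs[index] if index != 0 else 0
    b = regs[base] if base != 0 else 0
    return (x + b + displacement) & ADDRESS_MASK


regs = [0] * 16
regs[0] = 0xDEADBEEF   # ignored when register 0 is named as base or index
regs[12] = 0x00A000    # hypothetical base register contents
regs[3] = 0x10         # hypothetical index register contents

print(hex(effective_address(regs, 3, 12, 0xFFF)))  # 0xb00f
print(hex(effective_address(regs, 0, 0, 0x100)))   # 0x100
```

The instruction itself carries only two 4-bit register numbers and a 12-bit displacement; the full address lives in the registers.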
IBM 360: Initial Implementations
Server Market
And in conclusion …
• Computer Architecture >> ISAs and RTL
• CSx52 is about interaction of hardware and software, and
design of appropriate abstraction layers
• Computer architecture is shaped by technology and
applications
– History provides lessons for the future
• Computer Science at the crossroads from sequential to
parallel computing
– Salvation requires innovation in many fields, including computer
architecture
• Read Chapter 1 & Appendix A for next time! (5th edition)
Acknowledgements
• These slides contain material developed and copyrighted by:
– Arvind (MIT)
– Krste Asanovic (MIT/UCB)
– Joel Emer (Intel/MIT)
– James Hoe (CMU)
– John Kubiatowicz (UCB)
– David Patterson (UCB)