Lecture 1
Instructor: Ji Chen
Email: [email protected]
Tel: (713)-743-4423
Office: W328
Office Hours: T TH 2:30-3:30 or by appointment
TA: None
ECE 5367 4436 Course Contents
1. Introduction, basic computer organization
2. Instruction formats, instruction sets and their design
3. ALU design: Adders, subtracters, logic operations
4. Multiplication, division, floating point arithmetic
5. Datapath design
6. Control design: Hardwired control, microprogrammed control
7. Pipelining
8. Memory systems
9. I/O
Required
NOT REQUIRED
Laboratory assignments may be worked in teams of two (2); however, there should be no collaboration between teams. Lab assignments turned in late will be penalized 25 points for each calendar day. Both students in a team will receive the same grade for the project.
Projects: Teams of four (4): describe the computer architecture of a modern technology.
Exams: Two mid-term exams and one final exam. A missed exam will result in a grade of zero. Let me know immediately if you have any situation that may cause you to miss an exam. Final Exam - TBD
Your final grade will be computed as follows:
Grading:
HW/Quiz/Lab   10 %
Project       15 %
Exam 1        25 %
Exam 2        25 %
Exam 3        25 %
[Figure: computer organization. Labels: Processor (Control, Datapath), Memory, Input, Output]
[Figure: example system organization. Labels: L2 $, Inst Cache, Ref MMU, Message Bus (Mbus), Bus Interface, SBus, DMA, SCSI, Ethernet, SBus Cards, I/O system]
Coordination of many levels of abstraction
Under a rapidly changing set of forces
Design, Measurement, and Evaluation
[Figure: forces on Computer Architecture. Labels: Technology, Applications, Programming Languages, Operating Systems, History, Cleverness]
[Figure: Arithmetic, Booth multiplier datapath. Labels: Input Multiplicand (32), Input Multiplier (32), Multiplicand Register, 32=>34 signEx, <<1, LoadMp, 34x2 MUX, Multi x2/x1, 34-bit ALU, Sub/Add, Control Logic, Booth Encoder (ENC[2], ENC[1], ENC[0]), ShiftAll, LO[1:0], Result[HI], Result[LO]]
[Figure: course topic overview. Labels: Pipelining (IFetch, Dcd stages), Performance, I/O, Memory Systems; time axis 1980 to 2000; multiplier labels "Extra 2 bits", "Prev"]
Purchasing perspective
  Given a collection of machines, which has the
    Best performance?
    Least cost?
    Best performance / cost?
Design perspective
  Faced with design options, which has the
    Best performance improvement?
    Least cost?
    Best performance / cost?
Both require
  a basis for comparison
  a metric for evaluation
Our goal: understand cost & performance implications of architectural choices
Plane        Flight time   Speed      Passengers   Throughput (pmph)
Boeing 747   6.5 hours     610 mph    470          286,700
Concorde     3 hours       1350 mph   132          178,200
Which has higher performance?
  Time to do the task (Execution Time): execution time, response time, latency
  Tasks per day, hour, week, sec, ns, ... (Performance): throughput, bandwidth
Response time and throughput often are in opposition
Example
Time of Concorde vs. Boeing 747?
  Concorde is 1350 mph / 610 mph = 2.2 times faster
  = 6.5 hours / 3 hours
Throughput of Concorde vs. Boeing 747?
  Concorde is 178,200 pmph / 286,700 pmph = 0.62 times as fast
  Boeing is 286,700 pmph / 178,200 pmph = 1.60 times faster
Boeing is 1.6 times (60%) faster in terms of throughput
Concorde is 2.2 times (120%) faster in terms of flying time
We will focus primarily on execution time for a single job
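The two comparisons above can be reproduced with a short Python sketch. The speeds, flight times, and passenger counts are the ones quoted in this example; the dictionary layout and function names are only illustrative.

```python
# Figures quoted in the Concorde vs. Boeing 747 example.
planes = {
    "Boeing 747": {"hours": 6.5, "mph": 610, "passengers": 470},
    "Concorde": {"hours": 3.0, "mph": 1350, "passengers": 132},
}

def throughput_pmph(plane):
    """Passenger-miles per hour: passengers carried times cruising speed."""
    return plane["passengers"] * plane["mph"]

boeing, concorde = planes["Boeing 747"], planes["Concorde"]

# Execution time (latency): shorter is better, so divide the longer
# flight time by the shorter one.
latency_ratio = boeing["hours"] / concorde["hours"]

# Throughput: more passenger-miles per hour is better.
throughput_ratio = throughput_pmph(boeing) / throughput_pmph(concorde)

print(f"Concorde flight-time advantage: {latency_ratio:.2f}x")
print(f"Boeing throughput advantage:    {throughput_ratio:.2f}x")
```

Which plane "wins" depends on the metric chosen: the same data gives the Concorde the better execution time and the Boeing the better throughput.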
Lots of instructions in a program => Instruction throughput important!
Performance
CPU time = Seconds / Program
         = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)
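A minimal sketch of CPU time as the product of instructions per program, cycles per instruction, and seconds per cycle. The instruction count, CPI, and clock rate used here are made-up values for illustration only, and the function name is hypothetical.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """Seconds/program = instructions/program * cycles/instruction * seconds/cycle."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle

# HYPOTHETICAL workload: 1 billion instructions, CPI of 2.0, 500 MHz clock.
t = cpu_time_seconds(1_000_000_000, 2.0, 500e6)
print(f"{t:.1f} s")
```

Because the three factors multiply, halving any one of them (fewer instructions, lower CPI, or a shorter cycle) halves the execution time, provided the other two stay the same.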
Amdahl's Law
Speedup due to enhancement E:
  Speedup(E) = ExTime w/o E / ExTime w/ E
             = Performance w/ E / Performance w/o E
Suppose that enhancement E accelerates a fraction F of the task by a factor S and the remainder of the task is unaffected. Then:
  ExTime(with E) = ((1-F) + F/S) x ExTime(without E)
  Speedup(with E) = 1 / ((1-F) + F/S)
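The speedup formula above, with F the accelerated fraction and S its speedup factor, can be sketched in Python as follows. The function name and the example values of F and S are assumptions chosen for illustration.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction F of the task is sped up by a factor S."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# HYPOTHETICAL case: 40% of the run time is sped up 10x.
print(round(amdahl_speedup(0.4, 10), 2))

# As S grows without limit the speedup approaches 1 / (1 - F).
print(round(amdahl_speedup(0.4, 1e12), 2))
```

The second call shows the point of the law: the unaffected fraction (1-F) caps the overall speedup no matter how large S becomes.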
Cycles 1 5 3 2
How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
How does this compare with using branch prediction to save a cycle off the branch time?
What if two ALU instructions could be executed at once?
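One way to approach these questions is to compare the average CPI before and after each change. The sketch below rests on two assumptions that are not stated in the text: the cycle counts 1, 5, 3, 2 are read as ALU, load, store, and branch in that order, and the instruction mix is a hypothetical placeholder. Substitute the real frequencies before drawing conclusions.

```python
# ASSUMPTION: the cycle counts 1, 5, 3, 2 map to ALU, load, store, branch.
BASE_CYCLES = {"alu": 1.0, "load": 5.0, "store": 3.0, "branch": 2.0}
# ASSUMPTION: hypothetical instruction mix (fractions sum to 1).
ASSUMED_MIX = {"alu": 0.5, "load": 0.2, "store": 0.1, "branch": 0.2}

def average_cpi(cycles, mix):
    """Weighted average cycles per instruction over the instruction mix."""
    return sum(mix[op] * cycles[op] for op in mix)

def speedup_with(changes, cycles=BASE_CYCLES, mix=ASSUMED_MIX):
    """Speedup from new per-class cycle counts (same clock, same program)."""
    new_cycles = {**cycles, **changes}
    return average_cpi(cycles, mix) / average_cpi(new_cycles, mix)

scenarios = {
    "better data cache (load = 2 cycles)": {"load": 2.0},
    "branch prediction (branch = 1 cycle)": {"branch": 1.0},
    "two ALU instructions at once (ALU = 0.5 cycle)": {"alu": 0.5},
}
for name, changes in scenarios.items():
    print(f"{name}: {speedup_with(changes):.3f}x")
```

With the instruction count and clock rate held fixed, the ratio of old CPI to new CPI is the speedup, so the enhancement that removes the most cycles from the weighted average gives the largest gain.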