1. The document discusses performance metrics like clock rate, CPI, instructions per cycle (IPC), and execution time for different processors P1, P2, and P3 executing the same instruction set. It provides calculations to compare their performance in instructions per second (MIPS) and the number of cycles and instructions needed to execute a program in 10 seconds.
2. The document also analyzes the performance of two implementations (P1 and P2) of the same instruction set architecture with different clock rates and CPI values for instruction classes A, B, C, and D. It calculates total execution times for programs with different distributions of instruction classes to determine which implementation is faster.
3. Further sections
3 If a computer connected to a 1 Gigabit Ethernet network needs to send a 256 Kbyte file, how long would it take?
Answer: Network speed: 1 Gigabit Ethernet ==> 1 gigabit per second = 125 Mbytes/second. File size: 256 Kbytes = 0.256 Mbytes. Time for 0.256 Mbytes = 0.256/125 s = 0.002048 s = 2.048 ms
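The arithmetic above can be checked with a short Python sketch. It assumes decimal units (1 Kbyte = 1000 bytes, 1 gigabit = 10^9 bits), as the answer does, and ignores protocol overhead; the function name `transfer_time_s` is only illustrative.

```python
def transfer_time_s(file_bytes, link_bits_per_s):
    """Time in seconds to push file_bytes over a link, ignoring protocol overhead."""
    return file_bytes * 8 / link_bits_per_s


# Assumption: decimal units, matching the worked answer (256 Kbytes = 0.256 Mbytes).
FILE_BYTES = 256 * 1000
LINK_BITS_PER_S = 1e9  # 1 Gigabit Ethernet

t = transfer_time_s(FILE_BYTES, LINK_BITS_PER_S)
print(f"{t * 1e3:.3f} ms")
```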
For problems below, use the information about access time for every type of memory in the following table.
Cache    DRAM     Flash Memory    Magnetic Disk
5 ns     50 ns    5 µs            5 ms
1.2.4 Find how long it takes to read a file from a DRAM if it takes 2 microseconds from the cache memory.
Answer: DRAM access is 10 times slower than cache access (50 ns vs. 5 ns), so 2 microseconds from cache ==> 20 microseconds from DRAM. 20 microseconds from DRAM ==> 2 seconds from magnetic disk. 20 microseconds from DRAM ==> 2 ms from flash memory
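The scaling used in this answer can be sketched in Python as follows. The sketch assumes that read time is proportional to the access time in the table, and it assumes a flash access time of 5 µs, which is the value consistent with the 2 ms result above. The names `ACCESS_TIME_S` and `scaled_time` are illustrative.

```python
# Access times in seconds. Assumption: flash is 5 microseconds,
# the value consistent with the 2 ms flash result in the answer.
ACCESS_TIME_S = {"cache": 5e-9, "dram": 50e-9, "flash": 5e-6, "disk": 5e-3}


def scaled_time(t_known, known, target):
    """Scale a known read time by the ratio of the two access times."""
    return t_known * ACCESS_TIME_S[target] / ACCESS_TIME_S[known]


dram = scaled_time(2e-6, "cache", "dram")    # read time from DRAM
flash = scaled_time(dram, "dram", "flash")   # read time from flash memory
disk = scaled_time(dram, "dram", "disk")     # read time from magnetic disk

print(f"DRAM {dram * 1e6:.0f} us, flash {flash * 1e3:.0f} ms, disk {disk:.0f} s")
```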
Exercise 1.3 Consider three different processors P1, P2, and P3 executing the same instruction set with the clock rates and CPIs given in the following table.
1.3.3 [10] <1.4> We are trying to reduce the time by 30% but this leads to an increase of 20% in the CPI. What clock rate should we have to get this time reduction? Answer:
1.3.4 Find the IPC (instructions per cycle) for each processor.
For problems below, use the information in the following table.
Processor    Clock rate    No. instructions    Time
P1           3 GHz         20 × 10^9           7 s
P2           1.5 GHz       30 × 10^9           10 s
P3           3 GHz         90 × 10^9           9 s
1.3.5 [5] <1.4> Find the clock rate for P2 that reduces its execution time to that of P1.
Answer:
f_new = No. instr. × CPI / time_new
f_old = No. instr. × CPI / time_old
f_new / f_old = time_old / time_new
f_new = f_old × 10/7 = 1.5 GHz × 10/7 = 2.14 GHz
1.3.6 [5] <1.4> Find the number of instructions for P2 that reduces its execution time to that of P3.
Answer:
No. instr_new = (f × time_new) / CPI
No. instr_old = (f × time_old) / CPI
No. instr_new / No. instr_old = time_new / time_old
No. instr_new = No. instr_old × 9/10 = 30 × 10^9 × 9/10 = 27 × 10^9
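Both 1.3.5 and 1.3.6 rely on the same proportionality: with CPI held fixed, execution time scales inversely with clock rate and directly with instruction count. A minimal Python sketch of the two ratios, with illustrative function names, is:

```python
def clock_for_target_time(f_old, t_old, t_new):
    """Clock rate needed to go from t_old to t_new with instruction count and CPI fixed."""
    return f_old * t_old / t_new


def instr_for_target_time(n_old, t_old, t_new):
    """Instruction count needed to go from t_old to t_new with clock rate and CPI fixed."""
    return n_old * t_new / t_old


# P2 from the table: 1.5 GHz, 30e9 instructions, 10 s.
f_new = clock_for_target_time(1.5e9, t_old=10, t_new=7)   # match P1's 7 s
n_new = instr_for_target_time(30e9, t_old=10, t_new=9)    # match P3's 9 s

print(f"{f_new / 1e9:.2f} GHz, {n_new / 1e9:.0f}e9 instructions")
```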
Exercise 1.4 Consider two different implementations of the same instruction set architecture. There are four classes of instructions, A, B, C, and D. The clock rate and CPI of each implementation are given in the following table.
      Clock rate    CPI Class A    CPI Class B    CPI Class C    CPI Class D
P1    1.5 GHz       1              2              3              4
P2    2 GHz         2              2              2              2
1.4.1 Given a program with 10^6 instructions divided into classes as follows: 10% class A, 20% class B, 50% class C, and 20% class D, which implementation is faster?
Answer: P2
Class A: 10^5 instr. Class B: 2 × 10^5 instr. Class C: 5 × 10^5 instr. Class D: 2 × 10^5 instr.
Time = No. instr. × CPI / clock rate
P1: Time class A = 10^5 × 1 / (1.5 × 10^9) = 0.67 × 10^-4 s
Time class B = 2.67 × 10^-4 s
Time class C = 10 × 10^-4 s
Time class D = 5.33 × 10^-4 s
Total time P1 = 18.67 × 10^-4 s
P2: Time class A = 10^5 × 2 / (2 × 10^9) = 10^-4 s
Time class B = 2 × 10^-4 s
Time class C = 5 × 10^-4 s
Time class D = 2 × 10^-4 s
Total time P2 = 10 × 10^-4 s
1.4.2 [5] <1.4> What is the global CPI for each implementation?
Answer: CPI = time × clock rate / No. instr.
CPI(P1) = 18.67 × 10^-4 × 1.5 × 10^9 / 10^6 = 2.8
CPI(P2) = 10 × 10^-4 × 2 × 10^9 / 10^6 = 2
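As a cross-check of 1.4.1 and 1.4.2, the per-class sums can be recomputed directly from the exercise table. The following Python sketch assumes only the table values and the instruction mix stated in 1.4.1; the helper names are illustrative.

```python
def exec_time(counts, cpis, clock_hz):
    """Execution time = sum over classes of (instructions x CPI), divided by clock rate."""
    cycles = sum(n * cpi for n, cpi in zip(counts, cpis))
    return cycles / clock_hz


def global_cpi(counts, cpis):
    """Average CPI weighted by the instruction count of each class."""
    return sum(n * cpi for n, cpi in zip(counts, cpis)) / sum(counts)


# 10^6 instructions: 10% class A, 20% class B, 50% class C, 20% class D.
counts = [100_000, 200_000, 500_000, 200_000]
P1 = {"clock": 1.5e9, "cpi": [1, 2, 3, 4]}
P2 = {"clock": 2.0e9, "cpi": [2, 2, 2, 2]}

t1 = exec_time(counts, P1["cpi"], P1["clock"])
t2 = exec_time(counts, P2["cpi"], P2["clock"])
cpi1 = global_cpi(counts, P1["cpi"])
cpi2 = global_cpi(counts, P2["cpi"])

print(f"P1: {t1 * 1e4:.2f}e-4 s, CPI {cpi1:.2f}")
print(f"P2: {t2 * 1e4:.2f}e-4 s, CPI {cpi2:.2f}")
```

Since every class on P2 has a CPI of 2, its global CPI is 2 for any instruction mix.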
1.4.4 [5] <1.4> Assuming that arith instructions take 1 cycle, load and store 5 cycles, and branches 2 cycles, what is the execution time of the program in a 2 GHz processor?
The following table shows the number of instructions for a program.
Arith    Store    Load    Branch    Total
500      50       100     50        700
Answer: CPU Time =
Clock rate = (500 *1 + 50 * 5+ 100*5+50*2)/(2*10 9 ) = 675 * 10 -9 s = 675 ns 1.4.5 [5] <1.4> Find the CPI for the program. Answer: CPI = time clock rate/No. instr. CPI = 675 10 -9 2 10 9 /700 = 1.92
1.4.6 [10] <1.4> If the number of load instructions can be reduced by one half, what is the speedup and the CPI?
Answer: Time = (500 × 1 + 50 × 5 + 50 × 5 + 50 × 2) × 0.5 × 10^-9 = 550 ns
Speed-up = 675 ns / 550 ns = 1.23
With 50 fewer loads the program executes 650 instructions, so CPI = 550 × 10^-9 × 2 × 10^9 / 650 = 1.69
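Problems 1.4.4 to 1.4.6 can be recomputed with one small Python sketch. It assumes the cycle costs and instruction counts given in the exercise, and it computes CPI over the instructions actually executed, which is 650 once the loads are halved. The names `CYCLES_PER_INSTR`, `total_cycles`, and `cpi` are illustrative.

```python
CLOCK_HZ = 2e9
CYCLES_PER_INSTR = {"arith": 1, "store": 5, "load": 5, "branch": 2}


def total_cycles(mix):
    """Total clock cycles for an instruction mix given as {class: count}."""
    return sum(count * CYCLES_PER_INSTR[kind] for kind, count in mix.items())


def cpi(mix):
    """Average cycles per instruction over the instructions in the mix."""
    return total_cycles(mix) / sum(mix.values())


base = {"arith": 500, "store": 50, "load": 100, "branch": 50}
fewer_loads = dict(base, load=base["load"] // 2)  # loads reduced by one half

t_base = total_cycles(base) / CLOCK_HZ
t_fewer = total_cycles(fewer_loads) / CLOCK_HZ
speedup = t_base / t_fewer

print(f"base: {t_base * 1e9:.0f} ns, CPI {cpi(base):.2f}")
print(f"half loads: {t_fewer * 1e9:.0f} ns, CPI {cpi(fewer_loads):.2f}, speedup {speedup:.2f}")
```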
Exercise 1.5 Consider two different implementations, P1 and P2, of the same instruction set. There are five classes of instructions (A, B, C, D, and E) in the instruction set. The clock rate and CPI of each class are given below.
           Clock rate    CPI Class A    CPI Class B    CPI Class C    CPI Class D    CPI Class E
a    P1    1.0 GHz       1              2              3              4              3
     P2    1.5 GHz       2              2              2              4              4
b    P1    1.0 GHz       1              1              2              3              2
     P2    1.5 GHz       1              2              3              4              3
1.5.1 [5] <1.4> Assume that peak performance is defined as the fastest rate that a computer can execute any instruction sequence. What are the peak performances of P1 and P2 expressed in instructions per second?
Answer:
a. Peak performance on P1 occurs when only class A instructions are executed.
peak P1 = 1 inst/cycle × 1 × 10^9 cycles/sec = 1 × 10^9 inst/sec = 1 G inst/sec
peak P2 = (1/2) inst/cycle × 1.5 × 10^9 cycles/sec = 0.75 × 10^9 inst/sec = 0.75 G inst/sec
b. Peak performance on P1 occurs when only class A instructions are executed.
peak P1 = 1 inst/cycle × 1 × 10^9 cycles/sec = 1 × 10^9 inst/sec = 1 G inst/sec
peak P2 = 1 inst/cycle × 1.5 × 10^9 cycles/sec = 1.5 × 10^9 inst/sec = 1.5 G inst/sec
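Under the definition in 1.5.1, peak performance is reached by executing only instructions from the class with the lowest CPI, so it equals the clock rate divided by the minimum CPI. A minimal Python sketch of that rule, with an illustrative function name, is:

```python
def peak_ips(clock_hz, class_cpis):
    """Peak instructions per second: run only the class with the smallest CPI."""
    return clock_hz / min(class_cpis)


# CPI values for classes A to E, taken from the exercise table.
peak_a_p1 = peak_ips(1.0e9, [1, 2, 3, 4, 3])
peak_a_p2 = peak_ips(1.5e9, [2, 2, 2, 4, 4])
peak_b_p1 = peak_ips(1.0e9, [1, 1, 2, 3, 2])
peak_b_p2 = peak_ips(1.5e9, [1, 2, 3, 4, 3])

for name, value in [("a P1", peak_a_p1), ("a P2", peak_a_p2),
                    ("b P1", peak_b_p1), ("b P2", peak_b_p2)]:
    print(f"{name}: {value / 1e9:.2f} G inst/sec")
```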
1.5.2 [10] <1.4> If the number of instructions executed in a certain program is divided equally among the classes of instructions except for class A, which occurs twice as often as each of the others, which computer is faster? How much faster is it? Answer:
Case a:
Class    Freq     CPI (P1)    Freq × CPI (P1)    CPI (P2)    Freq × CPI (P2)
A        0.333    1           0.333              2           0.667
B        0.167    2           0.333              2           0.333
C        0.167    3           0.500              2           0.333
D        0.167    4           0.667              4           0.667
E        0.167    3           0.500              4           0.667
Total                         2.333                          2.667
CPU time = <CPI> × I / F, where I is the instruction count and F is the clock rate
CPU time1 = 2.333 × I / 1 GHz
CPU time2 = 2.667 × I / 1.5 GHz
perf2/perf1 = CPU time1 / CPU time2 = 1.5 × 2.333 / 2.667 = 1.31 (Performance = 1 / execution time)
a. P2 is 1.31 times faster than P1
b. Same method as in a: <CPI> = 1.667 for P1 and 2.333 for P2, so perf2/perf1 = 1.5 × 1.667 / 2.333 = 1.07. P2 is 1.07 times faster than P1
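1.5.3 [10] <1.4> If the number of instructions executed in a certain program is divided equally among the classes of instructions except for class E, which occurs twice as often as each of the others, which computer is faster? How much faster is it?
Answer: Same method as 1.5.2, with class E at frequency 0.333 and the other classes at 0.167.
a. <CPI> = 2.667 for P1 and 3.0 for P2, so perf2/perf1 = 1.5 × 2.667 / 3.0 = 1.33. P2 is 1.33 times faster than P1
b. <CPI> = 1.833 for P1 and 2.667 for P2, so perf2/perf1 = 1.5 × 1.833 / 2.667 = 1.03. P2 is 1.03 times faster than P1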
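The weighted-CPI comparison in 1.5.2 and 1.5.3 can be recomputed from the Exercise 1.5 table with a short Python sketch. It assumes the doubled class has weight 2 and every other class weight 1, and it reports perf2/perf1, so a ratio above 1 means P2 is faster. The names `avg_cpi`, `p2_over_p1`, and `CASES` are illustrative.

```python
def avg_cpi(cpis, doubled):
    """Average CPI when class index `doubled` occurs twice as often as each other class."""
    weights = [2 if i == doubled else 1 for i in range(len(cpis))]
    return sum(w * c for w, c in zip(weights, cpis)) / sum(weights)


def p2_over_p1(p1, p2, doubled):
    """perf2/perf1 = CPU time1 / CPU time2 for the same instruction count."""
    time1 = avg_cpi(p1["cpi"], doubled) / p1["clock"]
    time2 = avg_cpi(p2["cpi"], doubled) / p2["clock"]
    return time1 / time2


# CPI per class A to E, from the Exercise 1.5 table.
CASES = {
    "a": ({"clock": 1.0e9, "cpi": [1, 2, 3, 4, 3]}, {"clock": 1.5e9, "cpi": [2, 2, 2, 4, 4]}),
    "b": ({"clock": 1.0e9, "cpi": [1, 1, 2, 3, 2]}, {"clock": 1.5e9, "cpi": [1, 2, 3, 4, 3]}),
}
A, E = 0, 4  # indices of the classes that are doubled in 1.5.2 and 1.5.3

for label, (p1, p2) in CASES.items():
    print(f"{label}: class A doubled {p2_over_p1(p1, p2, A):.2f}, "
          f"class E doubled {p2_over_p1(p1, p2, E):.2f}")
```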
1.5.4 [5] <1.4> Assuming that computes take 1 cycle, loads and store instructions take 10 cycles, and branches take 3 cycles, find the execution time on a 3 GHz MIPS processor.
The table below shows the instruction type breakdown for different programs. Using this data, you will be exploring the performance trade-offs for different changes made to a MIPS processor.
             No. instructions
             Compute    Load    Store    Branch    Total
Program 1    1000       400     100      50        1550
Program 2    1500       300     100      100       1750
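Answer:
CPU time1 = cycles1 / F = (1000 × 1 + 400 × 10 + 100 × 10 + 50 × 3) / 3 GHz = 6150 / (3 × 10^9) s = 2.05 × 10^-6 s = 2.05 µs
CPU time2 = cycles2 / F = (1500 × 1 + 300 × 10 + 100 × 10 + 100 × 3) / 3 GHz = 5800 / (3 × 10^9) s = 1.93 µs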
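The cycle totals for 1.5.4 can be reproduced with the Python sketch below. It assumes the per-type instruction counts from the table and the cycle costs stated in the question; the cost table is passed as a parameter so that other cycle costs can be tried with the same function. The names `cpu_time_s` and `COSTS_1_5_4` are illustrative.

```python
CLOCK_HZ = 3e9  # 3 GHz MIPS processor from the question

PROGRAMS = {
    "Program 1": {"compute": 1000, "load": 400, "store": 100, "branch": 50},
    "Program 2": {"compute": 1500, "load": 300, "store": 100, "branch": 100},
}

# Cycle costs stated in 1.5.4.
COSTS_1_5_4 = {"compute": 1, "load": 10, "store": 10, "branch": 3}


def cpu_time_s(mix, costs, clock_hz=CLOCK_HZ):
    """CPU time = total cycles / clock rate, with total cycles summed per instruction type."""
    cycles = sum(count * costs[kind] for kind, count in mix.items())
    return cycles / clock_hz


times = {name: cpu_time_s(mix, COSTS_1_5_4) for name, mix in PROGRAMS.items()}
for name, t in times.items():
    print(f"{name}: {t * 1e6:.2f} us")
```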
1.5.5 [5] <1.4> Assuming that computes take 1 cycle, loads and store instructions take 2 cycles, and branches take 3 cycles, find the execution time on a 3 GHz MIPS processor