OLD DAT105 Exercise4
Minh Quang Do
2007-11-29
[3]
[4]
11/8/2007
Cache Organization
Size of a Tag field = s - (n + m + w)
Where:
s: # of memory address bits
w: byte offset (2^w = # of bytes per word)
m: word offset (2^m = # of words per block)
n: index (2^n = # of sets)
Cache Organization
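Size of a Tag field = s - (n - log2(A) + m + w)
Where:
A: associativity (# of ways)
n: index width of a direct-mapped cache of the same size (2^n = # of blocks), so n - log2(A) is the set-index width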
4-way set associative cache: 4KB, s = 32, w = 2, m = 0, set index = 8 bits (1024 blocks / 4 ways = 256 sets) -> tag = 32 - (8 + 0 + 2) = 22
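The tag-size rule above can be checked with a few lines of Python. This is only a sketch: the function names `log2_int` and `tag_bits` are made up for illustration, and it assumes every size is a power of two.

```python
def log2_int(x):
    """Exact log2 for a positive power of two (assumption: x is a power of two)."""
    return x.bit_length() - 1

def tag_bits(addr_bits, cache_bytes, block_bytes, ways):
    """Tag width in bits; ways=1 corresponds to a direct-mapped cache."""
    sets = cache_bytes // (block_bytes * ways)   # number of sets
    index_bits = log2_int(sets)                  # set-index width
    offset_bits = log2_int(block_bytes)          # m + w (word offset + byte offset)
    return addr_bits - (index_bits + offset_bits)

# The slide's example: 4KB cache, 4-byte blocks (w=2, m=0), 4-way, 32-bit addresses.
print(tag_bits(32, 4 * 1024, 4, 4))  # prints 22
```

Passing ways=1 for the same cache gives a 10-bit index and a 20-bit tag, which shows how each doubling of associativity moves one bit from the index into the tag.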
L1: direct-mapped, 8KB
L2: direct-mapped, 4MB
L1, L2 use 64B blocks
Page size: 8KB
TLB: direct-mapped with 256 entries
Virtual: 64b, Physical: 40b
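As a rough sketch, the address field widths implied by this configuration can be derived as follows. It assumes that both caches are indexed and tagged with the 40-bit physical address, which the slide does not state explicitly; all names in the code are illustrative.

```python
def log2_int(x):
    """Exact log2 for a positive power of two."""
    return x.bit_length() - 1

KB, MB = 1024, 1024 * 1024
BLOCK_BYTES = 64
PAGE_BYTES = 8 * KB
TLB_ENTRIES = 256
VIRTUAL_BITS, PHYSICAL_BITS = 64, 40

def cache_fields(cache_bytes, block_bytes, addr_bits):
    """Return (tag, index, offset) widths for a direct-mapped cache."""
    offset = log2_int(block_bytes)
    index = log2_int(cache_bytes // block_bytes)
    return addr_bits - index - offset, index, offset

# Assumption: L1 and L2 both use the physical address.
l1 = cache_fields(8 * KB, BLOCK_BYTES, PHYSICAL_BITS)   # (27, 7, 6)
l2 = cache_fields(4 * MB, BLOCK_BYTES, PHYSICAL_BITS)   # (18, 16, 6)

page_offset = log2_int(PAGE_BYTES)        # 13 bits
vpn_bits = VIRTUAL_BITS - page_offset     # 51-bit virtual page number
tlb_index = log2_int(TLB_ENTRIES)         # 8 bits
tlb_tag = vpn_bits - tlb_index            # 43 bits
ppn_bits = PHYSICAL_BITS - page_offset    # 27-bit physical page number

print("L1 (tag, index, offset):", l1)
print("L2 (tag, index, offset):", l2)
print("TLB tag/index:", tlb_tag, tlb_index, "PPN bits:", ppn_bits)
```

Under these assumptions the L1 index plus block offset (7 + 6 = 13 bits) exactly matches the page offset, so the L1 lookup could start in parallel with the TLB access.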
CACTI Algorithm to find the best Power*Delay Product and Area Estimation for it
(from [2])
http://quid.hpl.hp.com:9081/cacti/
Web Interface
Exercise 5.1a
Using CACTI 4.2, for direct-mapped, 2-way and 4-way set associative caches of 32KB with 64B block size implemented in 90-nm process (Note: no leakage power for 90-nm):

90nm (Vdd=1.04869097076), cache size: 32KB with 64B line

                                              1-way          2-way            4-way
Access time (ns)                              0.727237609    0.95916641708    0.883799463
Total dynamic read power at max. freq. (W)    0.041520158    0.05737028351    0.149968343
Cycle time (ns)                               0.474678046    0.47078647474    0.421119816
Total area subbanked (mm^2)                   0.555439619    0.78984622349    0.743759781
Nbank                                         1              1                1
N of sets per bank                            512            256              128
Ndwl                                          1              4                16
Ndbl                                          8              4                1
Nspd                                          1              2                2
Ntwl                                          32             16               1
Ntbl                                          4              1                16
Ntspd                                         16             32               1

Access time relative to direct-mapped: 2-way is 32% more, 4-way is 21% more.
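The percentage annotations can be recomputed from the access times reported above. This is a small sketch; the dictionary name and layout are illustrative, and only the three access times are taken from the CACTI output.

```python
# Access times in ns from the 32KB / 64B-line CACTI 4.2 results above.
access_ns = {
    "1-way": 0.727237609,
    "2-way": 0.95916641708,
    "4-way": 0.883799463,
}

base = access_ns["1-way"]  # direct-mapped cache is the reference point

# Percentage increase in access time relative to the direct-mapped cache.
increase_pct = {name: 100.0 * (t / base - 1.0) for name, t in access_ns.items()}

for name, pct in increase_pct.items():
    print(f"{name}: {pct:.1f}% more than direct-mapped")
```

The 2-way figure comes out at roughly 32%, and the 4-way figure falls between 21% and 22%.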
Exercise 5.1b
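Using CACTI 4.2, for 2-way set associative caches of 16KB, 32KB and 64KB with 64B block size implemented in 90-nm process:

90nm (Vdd=1.04869097076)

                                              16K             32K              64K
Access time (ns)                              0.8154413713    0.95916641708    1.004488549
Total dynamic read power at max. freq. (W)    0.0668911783    0.05737028351    0.078455944
Cycle time (ns)                               0.4630303771    0.47078647474    0.495774489
Total area subbanked (mm^2)                   0.3412720341    0.78984622349    1.142408636
Nbank                                         1               1                1
N of sets per bank                            128             256              512
Ndwl                                          1               4                4
Ndbl                                          4               4                4
Nspd                                          0.5             2                2
Ntwl                                          8               16               8
Ntbl                                          1               1                1
Ntspd                                         16              32               16

Access time relative to 16K: 32K is 18% more, 64K is 23% more.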
Exercise 5.1d
According to Fig. 5.29, the current version of CACTI gives a 16 KB 8-way set-associative cache with 64-byte blocks an access time of 0.88 ns. This configuration has the lowest miss rate among 16 KB caches, except for a fully associative cache, which would have an access time greater than 0.90 ns.
Exercise 5.2a: AMAT
Miss penalty = 20 cycles; miss rate (32KB L1 cache, 2-way, single-bank) = 0.0056101
AMAT = 0.0056101 x 20 + (1 - 0.0056101) x 1 = 1.107 cycles
Way-predicted cache (16KB, direct-mapped), 85% prediction accuracy, misprediction penalty = 2 cycles:
AMAT = 0.0056101 x 20 + (0.85 x 1 + (1 - 0.85) x 2) x (1 - 0.0056101) = 1.26 cycles
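The two AMAT figures can be reproduced with a short sketch that follows the form of the formula used on this slide (miss term plus hit time weighted by the hit rate). The function names are illustrative.

```python
def amat(miss_rate, miss_penalty, hit_time=1.0):
    """AMAT in cycles, in the slide's form:
    miss_rate * miss_penalty + hit_time * (1 - miss_rate)."""
    return miss_rate * miss_penalty + hit_time * (1.0 - miss_rate)

def way_predicted_hit_time(accuracy, mispredict_cycles, hit_cycles=1.0):
    """Average hit time when a correct prediction costs hit_cycles
    and a misprediction costs mispredict_cycles."""
    return accuracy * hit_cycles + (1.0 - accuracy) * mispredict_cycles

MISS_RATE = 0.0056101   # 32KB L1 cache, 2-way, single-bank
MISS_PENALTY = 20       # cycles

base = amat(MISS_RATE, MISS_PENALTY)
predicted = amat(MISS_RATE, MISS_PENALTY, way_predicted_hit_time(0.85, 2))

print(f"2-way cache:         {base:.3f} cycles")
print(f"way-predicted cache: {predicted:.2f} cycles")
```

Calling way_predicted_hit_time(0.85, 15) instead models the 15-cycle misprediction penalty of Exercise 5.2c and yields about 3.19 cycles.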
Exercise 5.2b
Using CACTI 4.2, for a 16KB direct-mapped cache with 64B block size implemented in 90-nm process:
Access time = 0.66318353 ns
Cycle time = 0.36661061 ns
Total dynamic power = 0.033881430 W
For a 32KB 2-way set associative cache with 64B block size implemented in 90-nm process:
Access time = 0.95916641708 ns
Improvement in execution = 0.9591 / 0.6631 = 1.446, i.e. the 16KB direct-mapped cache is 44.6% faster
Exercise 5.2c: Way-prediction on a data cache
Assumptions: miss penalty = 20 cycles; miss rate (32KB L1 cache, 2-way, single-bank) = 0.0056101
Way-predicted data cache (16KB, direct-mapped), 85% prediction accuracy, misprediction penalty = 15 cycles:
AMAT = 0.0056101 x 20 + (0.85 x 1 + (1 - 0.85) x 15) x (1 - 0.0056101) = 3.19 cycles
Increase in AMAT: 3.19 - 1.26 = 1.93 cycles
Exercise 5.2d
Using CACTI 4.2, for 1MB 4-way with 64B block size, 144b read out, 1 bank, 1 read/write port, 30b tags implemented in 90-nm process:
90nm (Vdd=1.04869097076), cache size: 1MB with 64B line

                                              Normal         Fast           Serial
Access time (ns)                              2.542393257    1.715168589
Total dynamic read power at max. freq. (W)    0.360252018    0.611948165
Cycle time (ns)                               0.466345737    0.513336315
Total area subbanked (mm^2)                   19.71918143    27.28437726
Nbank                                         1              1
N of sets per bank                            4096           4096
Ndwl                                          32             8
Ndbl                                          8              32
Nspd                                          4.5            1.125
Ntwl                                          4              8
Ntbl                                          32             32
Ntspd                                         4              16

37% increase in access time, 17% reduction in total dynamic read power
Using CACTI 4.2, for 64KB 2-way, 2 banks, with 64B block size implemented in 90-nm process:
Access time = 0.958448597337 ns
Cycle time = 0.47078647474 ns
Total dynamic power = 0.114334683539 W
Total area subbanked = 1.64216420153 mm^2
Number of potential pipe stages = 0.958 / 0.471 = 2.03
Assumptions (from Fig. 5.29):
Miss penalty = 40 cycles
Miss rate (64KB L1 cache, 2-way, 1 bank) = 0.0036625
AMAT = 0.00367 x 40 + (1 - 0.00367) x 1 = 1.14 cycles
If the added cache access pipe stages increase the hit time by 20%:
AMAT = 0.00367 x 40 + (1 - 0.00367) x 1.2 = 1.34 cycles
Exercise 5.3c
Assumptions:
2 banks; a bank conflict causes a 1-cycle delay
A random distribution of addresses and a steady stream of accesses, so each access has a 50% probability of conflicting with the previous access
Miss rate (similar to that of the 64KB L1 cache, 2-way, 1 bank) = 0.0036625
Miss penalty = 20 cycles
AMAT = 0.00367 x 20 + (0.5 x 1 + 0.5 x 2) x (1 - 0.00367) = 1.57 cycles
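The 50% conflict probability and the resulting AMAT can be sanity-checked with a small sketch. It assumes uniformly random bank selection and counts a conflict whenever an access hits the same bank as the previous one; the function names and the simulation setup are illustrative and not part of the exercise.

```python
import random

def expected_hit_time(n_banks, conflict_delay=1, hit_cycles=1):
    """Average hit time when an access conflicts with the previous one
    with probability 1 / n_banks (assumption: uniformly random banks)."""
    p_conflict = 1.0 / n_banks
    return hit_cycles + p_conflict * conflict_delay

def simulate_hit_time(n_banks, accesses=100_000, seed=0):
    """Monte Carlo estimate of the same quantity for a steady access stream."""
    rng = random.Random(seed)
    prev = rng.randrange(n_banks)
    total_cycles = 0
    for _ in range(accesses):
        bank = rng.randrange(n_banks)
        total_cycles += 2 if bank == prev else 1   # 1 extra cycle on a conflict
        prev = bank
    return total_cycles / accesses

MISS_RATE = 0.00367
MISS_PENALTY = 20  # cycles

hit_time = expected_hit_time(2)   # 0.5 x 1 + 0.5 x 2 = 1.5 cycles
amat_banked = MISS_RATE * MISS_PENALTY + hit_time * (1 - MISS_RATE)

print(f"expected hit time:  {hit_time:.2f} cycles")
print(f"simulated hit time: {simulate_hit_time(2):.2f} cycles")
print(f"AMAT:               {amat_banked:.2f} cycles")
```

Under the same assumptions, calling expected_hit_time with more banks shows how the conflict term shrinks as the bank count grows.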
Exercise 5.4b: Early restart and critical word first
It depends on:
1. The contribution to AMAT of the L1 and L2 cache misses
2. The percent reduction in miss service times provided by critical word first and early restart
If 2) is approximately the same for both L1 and L2, then the AMAT contributions of L1 and L2 decide the importance of critical word first and early restart.