Lec 4b

Uploaded by

medo.losy

COMPUTER ORGANIZATION AND DESIGN
5th Edition
The Hardware/Software Interface

Chapter 5
Large and Fast: Exploiting Memory Hierarchy (cont.)
Caching Example (direct-mapped)
Block size = 16 bytes, 4 blocks in cache.
Example: 164 = (001010, 0100), i.e. block address 001010 and byte offset 0100.

Request (byte addr., decimal)  0      164    83     192    10     90     175    673    168    59
Block addr. (binary)           000000 001010 000101 001100 000000 000101 001010 101010 001010 000011
Block addr. (decimal)          0      10     5      12     0      5      10     42     10     3
Index (direct-mapped)          00     10     01     00     00     01     10     10     10     11
Cache set 0 (tag)              0000   0000   0000   0011   0000   0000   0000   0000   0000   0000
Cache set 1 (tag)              -      -      0001   0001   0001   0001   0001   0001   0001   0001
Cache set 2 (tag)              -      0010   0010   0010   0010   0010   0010   1010   0010   0010
Cache set 3 (tag)              -      -      -      -      -      -      -      -      -      0000
Hit/Miss                       M      M      M      M      M      H      H      M      M      M
Miss type                      CM     CM     CM     CM     CF     -      -      CM     CF     CM
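The direct-mapped trace can be checked with a short simulation. This is a minimal sketch, not part of the original slides: the names split_address and simulate_direct_mapped are hypothetical, and the only inputs taken from the slide are the block size, the number of blocks, and the request sequence.

```python
BLOCK_SIZE = 16   # bytes per block (from the slide)
NUM_BLOCKS = 4    # blocks in the cache (from the slide)


def split_address(addr):
    """Split a byte address into (tag, index, offset) for a direct-mapped cache."""
    offset = addr % BLOCK_SIZE        # byte position inside the block
    block_addr = addr // BLOCK_SIZE   # block address
    index = block_addr % NUM_BLOCKS   # cache line the block maps to
    tag = block_addr // NUM_BLOCKS    # remaining upper bits
    return tag, index, offset


def simulate_direct_mapped(requests):
    """Return one 'H' (hit) or 'M' (miss) per requested byte address."""
    lines = [None] * NUM_BLOCKS       # tag held by each line, None means empty
    outcomes = []
    for addr in requests:
        tag, index, _ = split_address(addr)
        if lines[index] == tag:
            outcomes.append("H")
        else:
            outcomes.append("M")
            lines[index] = tag        # the new block replaces whatever was there
    return outcomes


requests = [0, 164, 83, 192, 10, 90, 175, 673, 168, 59]
print(" ".join(simulate_direct_mapped(requests)))  # M M M M M H H M M M
```

Address 164 splits into tag 2, index 2, offset 4, which agrees with the block address 001010 and offset 0100 given in the example.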

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 2


Caching Example (2-way set associative)
Block size = 16 bytes, 4 blocks in cache.
Example: 164 = (001010, 0100), i.e. block address 001010 and byte offset 0100.

Request (byte addr., decimal)  0      164    83     192    10     90     175    673    168    59
Block addr. (binary)           000000 001010 000101 001100 000000 000101 001010 101010 001010 000011
Block addr. (decimal)          0      10     5      12     0      5      10     42     10     3
Index (2-way cache)            0      0      1      0      0      1      0      0      0      1
Cache set 0 (tags)             00000  00101  00101  00110  00000  00000  00101  10101  00101  00101
                               -      00000  00000  00101  00110  00110  00000  00101  10101  10101
Cache set 1 (tags)             -      -      00010  00010  00010  00010  00010  00010  00010  00001
                               -      -      -      -      -      -      -      -      -      00010
Hit/Miss                       M      M      M      M      M      H      M      M      H      M
Miss type                      CM     CM     CM     CM     CF     -      CF     CM     -      CM
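The 2-way trace and the miss types can be reproduced with a small LRU simulation. This is a sketch under stated assumptions: the slides do not name the replacement policy or define the labels, so LRU replacement, CM = compulsory miss, and CF = conflict miss are assumptions that happen to match the tables (the tag rows appear to list the most recently used tag first). The label CAP for a capacity miss and the function names are hypothetical. A miss is treated as a conflict miss when a fully associative LRU cache of the same total size would have hit.

```python
from collections import OrderedDict

BLOCK_SIZE = 16  # bytes per block (from the slide)


def simulate_lru(requests, num_sets, ways):
    """Simulate a set-associative cache with LRU replacement (assumed policy)."""
    # One OrderedDict per set; keys are tags, ordered from least to most recent.
    sets = [OrderedDict() for _ in range(num_sets)]
    outcomes = []
    for addr in requests:
        block = addr // BLOCK_SIZE
        index, tag = block % num_sets, block // num_sets
        current = sets[index]
        if tag in current:
            current.move_to_end(tag)           # mark as most recently used
            outcomes.append("H")
        else:
            if len(current) == ways:
                current.popitem(last=False)    # evict the least recently used tag
            current[tag] = None
            outcomes.append("M")
    return outcomes


def classify_misses(requests, num_sets, ways):
    """Label each access '-', 'CM', 'CF' or 'CAP' (labels are assumptions)."""
    actual = simulate_lru(requests, num_sets, ways)
    fully = simulate_lru(requests, 1, num_sets * ways)  # same size, one set
    seen = set()
    labels = []
    for addr, a, f in zip(requests, actual, fully):
        block = addr // BLOCK_SIZE
        if a == "H":
            labels.append("-")
        elif block not in seen:
            labels.append("CM")    # first reference to this block
        elif f == "H":
            labels.append("CF")    # only the set mapping caused the miss
        else:
            labels.append("CAP")   # would miss even when fully associative
        seen.add(block)
    return labels


requests = [0, 164, 83, 192, 10, 90, 175, 673, 168, 59]
print(" ".join(simulate_lru(requests, num_sets=2, ways=2)))
print(" ".join(classify_misses(requests, num_sets=2, ways=2)))
```

Calling the same functions with num_sets=4 and ways=1 models the direct-mapped cache of the previous slide.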



[Figure: "number keys: Instructions". Radix sort vs. quicksort. Series: Quick (Instr/key), Radix (Instr/key). Y axis: instructions/key, 0 to 800. X axis: job size in keys, 1,000 to 10,000,000.]


[Figure: "number keys: Instrs & Time". Radix sort vs. quicksort. Series: Quick (Instr/key), Radix (Instr/key), Quick (Clocks/key), Radix (Clocks/key). Y axis: 0 to 800, with the curves labelled Time and Instructions. X axis: job size in keys, 1,000 to 10,000,000.]


[Figure: "number keys: Cache misses". Radix sort vs. quicksort. Series: Quick (miss/key), Radix (miss/key). Y axis: cache misses per key, 0 to 5. X axis: job size in keys, 1,000 to 10,000,000.]


Interactions with Software
• Misses depend on memory access patterns
  - Algorithm behavior
  - Compiler optimization for memory access

[Figure panels: Inst./item, Clock cycles/item, Cache miss/item. Annotation: "More misses".]



Naïve Matrix Multiply
Number of slow memory references on unblocked matrix multiply:
m = n^3     to read each column of B n times
  + n^2     to read each row of A once
  + n^2     to read and write each element of C once
  = n^3 + 2n^2

C(i,j) = C(i,j) + A(i,:) * B(:,j)
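The count above can be reproduced by instrumenting a plain triple loop. This is a minimal sketch, not the lecture's code: the function name naive_matmul_counted is hypothetical, and the counter follows the slide's own model (a row of A stays in fast memory while it is used, every element of B is fetched from slow memory each time, and the read and write of each C element together count as one reference).

```python
def naive_matmul_counted(A, B, C):
    """Update C = C + A*B in place; return slow-memory references in the slide's model."""
    n = len(A)
    refs = 0
    for i in range(n):
        row_a = A[i]               # read row i of A once: n references
        refs += n
        for j in range(n):
            acc = C[i][j]          # read and write of C(i,j), counted once (slide's convention)
            refs += 1
            for k in range(n):
                acc += row_a[k] * B[k][j]   # column j of B is re-read for every i
                refs += 1
            C[i][j] = acc
    return refs


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[0, 0], [0, 0]]
refs = naive_matmul_counted(A, B, C)
print(C)     # [[19, 22], [43, 50]]
print(refs)  # 16, which is n^3 + 2n^2 for n = 2
```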
Blocked Matrix Multiply
Consider A, B, C to be N-by-N matrices of b-by-b subblocks, where b = n/N is called the block size.
for i = 1 to N
  for j = 1 to N
    {read block C(i,j) into fast memory}
    for k = 1 to N
      {read block A(i,k) into fast memory}
      {read block B(k,j) into fast memory}
      C(i,j) = C(i,j) + A(i,k) * B(k,j)   {do a matrix multiply on blocks}
    {write block C(i,j) back to slow memory}

C(i,j) = C(i,j) + A(i,k) * B(k,j)
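The blocked loop nest can be written out as runnable code. This is a sketch under a few assumptions: matrices are square lists of lists, n is a multiple of b, indices are 0-based rather than 1-based as in the pseudocode, and C starts at zero instead of being passed in. The function name blocked_matmul is hypothetical. Python itself does not expose fast and slow memory, so the code only shows the loop order that keeps one block each of A, B, and C in use at a time.

```python
def blocked_matmul(A, B, b):
    """Return A*B for n-by-n lists of lists, computed block by block."""
    n = len(A)
    if n % b != 0:
        raise ValueError("block size b must divide n")
    N = n // b                                   # number of blocks per dimension
    C = [[0] * n for _ in range(n)]
    for i in range(N):
        for j in range(N):
            # block C(i,j) stays "in fast memory" for the whole k loop
            for k in range(N):
                # multiply block A(i,k) by block B(k,j), accumulate into C(i,j)
                for ii in range(i * b, (i + 1) * b):
                    for jj in range(j * b, (j + 1) * b):
                        acc = C[ii][jj]
                        for kk in range(k * b, (k + 1) * b):
                            acc += A[ii][kk] * B[kk][jj]
                        C[ii][jj] = acc
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(blocked_matmul(A, B, 1))  # [[19, 22], [43, 50]]
```

With b = n there is a single block and the loop nest reduces to the unblocked multiply.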
Blocked Matrix Multiply
m is the amount of memory traffic between slow and fast memory.
The matrix has n-by-n elements and N-by-N blocks, each of size b-by-b.

m = N*n^2     B: N^2 blocks of size b^2 are each read N times (N^3 * b^2 = N^3 * (n/N)^2 = N*n^2)
  + N*n^2     A: same as B
  + n^2       read and write each block of C once
  = (2N + 1) * n^2 = O(n^3/b)

So we can improve performance by increasing the block size b, as long as the blocks of A, B, and C in use still fit in fast memory.
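To see how the traffic falls as b grows, the two counts from the slides can be evaluated side by side. This is a minimal sketch: the function names and the sample sizes are hypothetical, and the formulas are taken as given on the slides, including their convention of counting the traffic for C as n^2.

```python
def naive_traffic(n):
    """Slow-memory references for the unblocked multiply: n^3 + 2n^2."""
    return n ** 3 + 2 * n ** 2


def blocked_traffic(n, b):
    """Slow-memory references for the blocked multiply: (2N + 1) * n^2 with N = n/b."""
    if n % b != 0:
        raise ValueError("block size b must divide n")
    N = n // b
    return (2 * N + 1) * n ** 2


n = 1024  # hypothetical matrix dimension
for b in (8, 32, 128):  # hypothetical block sizes
    m = blocked_traffic(n, b)
    print(f"b={b:4d}  blocked={m:12d}  naive/blocked={naive_traffic(n) / m:.1f}")
```

Since N = n/b, the leading term of the blocked count is 2n^3/b, so the ratio to the naive count grows roughly as b/2. The gain stops once the blocks no longer fit in fast memory, which this model does not capture.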

01/19/2012 CS267 - Lecture 2


Blocked Matrix Multiply
[Figure: two panels, Unoptimized and Blocked. Annotation: "Only this portion in cache".]
