Lecture 11: Consistency Models: Topics: Sequential Consistency, HW and HW/SW Optimizations

Uploaded by

Bait Ullah Khan

Lecture 11: Consistency Models

• Topics: sequential consistency, hw and hw/sw optimizations

Coherence Vs. Consistency

• Recall that coherence guarantees (i) that a write will
  eventually be seen by other processors, and (ii) write
  serialization (all processors see writes to the same location
  in the same order)

• The consistency model defines the ordering of writes and
  reads to different memory locations – the hardware
  guarantees a certain consistency model and the
  programmer attempts to write correct programs with
  those assumptions

Example Programs

Initially, A = B = 0

P1                    P2
A = 1                 B = 1
if (B == 0)           if (A == 0)
  critical section      critical section

P1                    P2
Data = 2000           while (Head == 0) { }
Head = 1              … = Data

Initially, A = B = 0

P1            P2                P3
A = 1
              if (A == 1)
                B = 1
                                if (B == 1)
                                  register = A

Sequential Consistency

P1 P2
Instr-a Instr-A
Instr-b Instr-B
Instr-c Instr-C
Instr-d Instr-D
… …

We assume:
• Within a program, program order is preserved
• Each instruction executes atomically
• Instructions from different threads can be interleaved arbitrarily

Valid executions:
abAcBCDdeE… or ABCDEFabGc… or abcAdBe… or
aAbBcCdDeE… or …
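
The three assumptions above can be sketched as a small enumeration. This is a minimal illustration, not part of the original slides: the function name `interleavings` is hypothetical, and each instruction is modelled as a single character so that a thread is just a string.

```python
def interleavings(p1, p2):
    """Yield every merge of p1 and p2 that keeps each thread's own order.

    p1 and p2 are strings; each character stands for one atomic instruction.
    """
    if not p1 or not p2:
        # One thread is finished, so the other simply runs to completion.
        yield p1 + p2
        return
    # Either thread may supply the next instruction.
    for rest in interleavings(p1[1:], p2):
        yield p1[0] + rest
    for rest in interleavings(p1, p2[1:]):
        yield p2[0] + rest


# Two threads with two instructions each (a, b on P1 and A, B on P2).
orders = list(interleavings("ab", "AB"))
```

Every string in `orders` keeps `a` before `b` and `A` before `B`, which is the program-order requirement; an order such as `baAB` never appears.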

Sequential Consistency

• Programmers assume SC; it makes it much easier to
  reason about program behavior

• Hardware innovations can disrupt the SC model

• For example, if we assume write buffers, or out-of-order
  execution, or if we drop ACKs in the coherence protocol,
  the previous programs yield unexpected outputs

Consistency Example - I

• Consider a multiprocessor with bus-based snooping cache
  coherence and a write buffer between CPU and cache

Initially A = B = 0

P1                    P2
A = 1                 B = 1
…                     …
if (B == 0)           if (A == 0)
  Crit.Section          Crit.Section

The programmer expected the above code to implement a
lock – because of write buffering (each write may still be
sitting in its buffer when the other processor's read
executes), both processors can enter the critical section

The consistency model lets the programmer know what assumptions
they can make about the hardware’s reordering capabilities
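
The effect of the write buffer can be checked with a small exhaustive simulation. This is a sketch under simplifying assumptions that are not in the slides: the name `explore` and the tuple encoding of instructions are hypothetical, each processor has a FIFO write buffer that drains to memory at an arbitrary later point, and a read checks the processor's own buffer before memory. A result of `(0, 0)` means both reads returned 0, so both processors enter the critical section.

```python
def explore(programs, buffered):
    """Return every (P1 read, P2 read) pair reachable over all schedules.

    programs: one list of ops per processor, ("W", addr, value) or ("R", addr).
    buffered: when True, a write waits in its processor's FIFO write buffer
    and reaches memory at some later, arbitrary point (an assumed model).
    """
    results = set()
    n = len(programs)

    def replace(tup, i, item):
        return tup[:i] + (item,) + tup[i + 1:]

    def run(pcs, bufs, mem, reads):
        finished = True
        for p in range(n):
            if bufs[p]:  # option: drain the oldest buffered write to memory
                finished = False
                addr, val = bufs[p][0]
                run(pcs, replace(bufs, p, bufs[p][1:]), {**mem, addr: val}, reads)
            if pcs[p] < len(programs[p]):  # option: execute the next instruction
                finished = False
                op = programs[p][pcs[p]]
                next_pcs = replace(pcs, p, pcs[p] + 1)
                if op[0] == "W" and buffered:
                    pending = bufs[p] + ((op[1], op[2]),)
                    run(next_pcs, replace(bufs, p, pending), mem, reads)
                elif op[0] == "W":
                    run(next_pcs, bufs, {**mem, op[1]: op[2]}, reads)
                else:  # read: own buffered write first, otherwise memory
                    own = [v for a, v in bufs[p] if a == op[1]]
                    val = own[-1] if own else mem[op[1]]
                    run(next_pcs, bufs, mem, replace(reads, p, val))
        if finished:
            results.add(reads)

    run((0,) * n, ((),) * n, {"A": 0, "B": 0}, (None,) * n)
    return results


# P1: A = 1; read B        P2: B = 1; read A
lock_attempt = [[("W", "A", 1), ("R", "B")], [("W", "B", 1), ("R", "A")]]
sc_outcomes = explore(lock_attempt, buffered=False)
buffered_outcomes = explore(lock_attempt, buffered=True)
```

Without buffering, `sc_outcomes` never contains `(0, 0)`; with buffering, `buffered_outcomes` does, which is the failure described above.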

Consistency Example - 2

P1                    P2
Data = 2000           while (Head == 0) { }
Head = 1              … = Data

Sequential consistency requires program order:
-- the write to Data has to complete before the write to Head can begin
-- the read of Head has to complete before the read of Data can begin
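
To see why the order of the two writes matters, the sketch below enumerates every interleaving of P1's writes with P2's final loop iteration. It is an illustration under stated assumptions: the function name `data_values_seen` is hypothetical, Data and Head are assumed to start at 0, and only the loop iteration in which P2 observes Head == 1 is modelled.

```python
from itertools import combinations


def data_values_seen(p1_ops):
    """Values P2 can read from Data after it has seen Head == 1."""
    p2_ops = [("R", "Head"), ("R", "Data")]
    seen = set()
    total = len(p1_ops) + len(p2_ops)
    # Choosing which time slots belong to P1 fixes one interleaving
    # that preserves the given order within each processor.
    for p1_slots in combinations(range(total), len(p1_ops)):
        p1_iter, p2_iter = iter(p1_ops), iter(p2_ops)
        mem = {"Data": 0, "Head": 0}  # assumed initial values
        head_read = None
        for slot in range(total):
            if slot in p1_slots:
                _, addr, val = next(p1_iter)
                mem[addr] = val
            else:
                _, addr = next(p2_iter)
                if addr == "Head":
                    head_read = mem["Head"]
                elif head_read == 1:  # loop has exited, so Data is read
                    seen.add(mem["Data"])
    return seen


program_order = [("W", "Data", 2000), ("W", "Head", 1)]
reordered = [("W", "Head", 1), ("W", "Data", 2000)]  # writes swapped by hardware
```

With the writes in program order, P2 can only read 2000; if the hardware lets the write to Head overtake the write to Data, P2 may also read the stale value 0.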

Consistency Example - 3

Initially, A = B = 0

P1            P2                P3
A = 1
              if (A == 1)
                B = 1
                                if (B == 1)
                                  register = A

Sequential consistency can be achieved if a process makes sure that
everyone has seen an update before that value is read – otherwise,
write atomicity is violated
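
A small model makes the write-atomicity requirement concrete. This is a sketch, not the lecture's own formulation: the name `register_values` and the event labels are hypothetical, P1's write is assumed to have been issued already, and a non-atomic write is modelled as reaching P2 and P3 at two independent moments.

```python
from itertools import permutations

EVENTS = ("A reaches P2", "A reaches P3", "P2 runs", "B reaches P3", "P3 runs")


def register_values(atomic_writes):
    """Values P3 can load into its register, over all event orders."""
    seen = set()
    for order in permutations(EVENTS):
        a_at_p2 = a_at_p3 = b_written = b_at_p3 = False
        for event in order:
            if event in ("A reaches P2", "A reaches P3"):
                if atomic_writes:
                    # An atomic write becomes visible to everyone at once.
                    a_at_p2 = a_at_p3 = True
                elif event == "A reaches P2":
                    a_at_p2 = True
                else:
                    a_at_p3 = True
            elif event == "P2 runs":
                b_written = a_at_p2  # if (A == 1) B = 1
            elif event == "B reaches P3":
                b_at_p3 = b_written  # nothing to deliver if B was not written
            elif event == "P3 runs" and b_at_p3:
                seen.add(1 if a_at_p3 else 0)  # if (B == 1) register = A
    return seen
```

With atomic writes, P3 can only load 1. Without them, the new value of A can reach P2 before it reaches P3, so P3 may see B == 1 and still load the old value 0.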

Sequential Consistency

• A multiprocessor is sequentially consistent if the result
  of the execution is achievable by maintaining program
  order within a processor and interleaving accesses by
  different processors in an arbitrary fashion

• The multiprocessors in the previous examples are not
  sequentially consistent

• Can implement sequential consistency by requiring the
  following: program order, write serialization, everyone has
  seen an update before a value is read – very intuitive for
  the programmer, but extremely slow

HW Performance Optimizations

• Program order is a major constraint – the following try to
  get around this constraint without violating seq. consistency
  - if a write has been stalled, prefetch the block in
    exclusive state to reduce traffic when the write happens
  - allow out-of-order reads with the facility to roll back
    if the ROB detects a violation (detected by re-executing
    the read later)
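
The second optimization can be sketched as follows. This is a deliberately simplified model, not a description of any real pipeline: the function name `commit_load` and its parameters are hypothetical, and the re-execution is modelled as a second lookup at commit time.

```python
def commit_load(memory, addr, remote_writes):
    """Speculatively read addr early, then re-check the value at commit.

    remote_writes: (address, value) writes by other processors that become
    visible between the early read and the point where the read commits.
    Returns (value the load commits with, whether a rollback was needed).
    """
    early_value = memory[addr]  # read issued out of order
    for a, v in remote_writes:  # other processors' writes arrive meanwhile
        memory[a] = v
    commit_value = memory[addr]  # re-execute the read when it is oldest in the ROB
    rolled_back = commit_value != early_value
    return commit_value, rolled_back
```

For example, `commit_load({"A": 0}, "A", [("A", 1)])` reports a rollback because the early value is stale, while a remote write to a different address leaves the speculation intact.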

Relaxed Consistency Models (HW/SW)

• We want an intuitive programming model (such as
  sequential consistency) and we want high performance

• We care about data races and re-ordering constraints for
  some parts of the program and not for others – hence,
  we will relax some of the constraints for sequential
  consistency for most of the program, but enforce them
  for specific portions of the code

• Fence instructions are special instructions that require
  all previous memory accesses to complete before
  proceeding (sequential consistency)

Fences
P1                      P2
{                       {
  Region of code          Region of code
  with no races           with no races
}                       }

Fence                   Fence
Acquire_lock            Acquire_lock
Fence                   Fence

{                       {
  Racy code               Racy code
}                       }

Fence                   Fence
Release_lock            Release_lock
Fence                   Fence
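
The role of a fence can be illustrated with a toy processor model. This is a sketch under assumptions that are not in the slides: the class `Cpu`, its methods, and `try_enter` are hypothetical names, each processor has a FIFO write buffer, and a fence is modelled as draining that buffer to memory. The code replays one particular schedule of the earlier A/B lock attempt rather than exploring all of them.

```python
class Cpu:
    """One processor with a FIFO write buffer in front of shared memory."""

    def __init__(self, memory):
        self.memory = memory
        self.buffer = []  # pending (address, value) writes

    def store(self, addr, value):
        self.buffer.append((addr, value))  # not yet visible to other CPUs

    def load(self, addr):
        for a, v in reversed(self.buffer):  # forward own newest write
            if a == addr:
                return v
        return self.memory[addr]

    def fence(self):
        for addr, value in self.buffer:  # complete earlier writes, in order
            self.memory[addr] = value
        self.buffer.clear()


def try_enter(use_fence):
    """Replay the schedule where both CPUs write first, then both read."""
    memory = {"A": 0, "B": 0}
    p1, p2 = Cpu(memory), Cpu(memory)
    p1.store("A", 1)
    p2.store("B", 1)
    if use_fence:
        p1.fence()
        p2.fence()
    p1_enters = p1.load("B") == 0
    p2_enters = p2.load("A") == 0
    return p1_enters, p2_enters
```

In this schedule, `try_enter(False)` lets both processors into the critical section, while `try_enter(True)` lets neither in, because each write has reached memory before the following read executes.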

Potential Relaxations

• Program Order: (all refer to different memory locations)
  - Write to Read program order
  - Write to Write program order
  - Read to Read and Read to Write program orders

• Write Atomicity: (refers to same memory location)
  - Read others’ write early

• Write Atomicity and Program Order:
  - Read own write early

Relaxations

Relaxation   W→R     W→W     R→RW    Rd others’ Wr   Rd own Wr
             Order   Order   Order   early           early
IBM 370      X
TSO          X                                       X
PC           X                       X               X
SC                                                   X

• IBM 370: a read can complete before an earlier write to a different address, but a
  read cannot return the value of a write unless all processors have seen the write
• SPARC V8 Total Store Ordering (TSO): a read can complete before an earlier
  write to a different address, but a read cannot return the value of a write by another
  processor unless all processors have seen the write (it returns the value of own
  write before others see it)
• Processor Consistency (PC): a read can complete before an earlier write (by any
  processor to any memory location) has been made visible to all

Performance Comparison

• Taken from Gharachorloo, Gupta, Hennessy, ASPLOS’91

• Studies three benchmark programs and three different
  architectures:

  Benchmarks:
  - MP3D: 3-D particle simulator
  - LU: LU-decomposition for dense matrices
  - PTHOR: logic simulator

  Architectures:
  - LFC: aggressive; lockup-free caches, write buffer with
    bypassing
  - RDBYP: only write buffer with bypassing
  - BASIC: no write buffer, no lockup-free caches

Performance Comparison

[Figure: performance results; chart not preserved in the extracted text]

Summary

• Sequential Consistency restricts performance (even more
  when memory and network latencies increase relative to
  processor speeds)

• Relaxed memory models relax different combinations of
  the five constraints for SC

• Most commercial systems are not sequentially consistent
  and rely on the programmer to insert appropriate fence
  instructions to provide the illusion of SC
