22: Monads and STM
Optional reading:
- “Beautiful Concurrency”
- “The Transactional Memory / Garbage Collection Analogy”
- “A Tutorial on Parallel and Concurrent Programming in Haskell”
Atomic blocks are pieces of code that you can count on to operate exactly like sequential programs. They can be implemented as a library or in hardware. Compared with locks, atomic blocks are much easier to use, and they do compose. There are tricky gaps, so they are a little harder to use than immutable data, but you can do more stuff.
Coding style                                                 Difficulty of queue implementation
-----------                                                  ----------------------------------
Sequential code                                              Undergraduate (COS 226)
Efficient parallel code with locks and condition variables   Publishable result at an international conference
Two threads each execute the same sequence of operations on a shared variable x:

Thread 1: read x; write x; read x; write x
Thread 2: read x; write x; read x; write x

Without transactions, the two threads' operations may interleave arbitrarily, e.g.:

read x; read x; write x; write x; read x; read x; write x; write x
or
read x; write x; read x; read x; write x; write x; read x; write x
or many others.

With transactions, only the two serial schedules are possible: all of thread 1's operations, then all of thread 2's, or vice versa. (The programmer gets to cut down non-determinism as much as he/she wants.)
module type MONAD = sig
  type 'a M

  (* return: put a value in a container *)
  val return : 'a -> 'a M

  (* (>>=): take value v out of a container c and then apply f,
     producing a new container *)
  val (>>=) : 'a M -> ('a -> 'b M) -> 'b M
end

(plus some equations specifying how return and bind are required to interact)
Here is one implementation, a logging monad that pairs each value with a log string:

module LoggingMonad = struct
  type 'a M = 'a * string

  let return x = (x, "")  (* nothing logged yet *)

  (* concatenate the log of c with the log produced by running f *)
  let (>>=) c f =
    let (v, s) = c in
    let (v', s') = f v in
    (v', s ^ s')

  (* apply f to x and log the message s *)
  let record f x s = (f x, s)
end

let do_x x =
  record read x "read it" >>= (fun v ->
  record write v "wrote it" >>= (fun _ ->
  record write v "wrote it again" >>= (fun _ ->
  return v)))
;;

Just like one expects any CONTAINER to behave in a particular way, one has expectations of MONADs: the monad laws. For example, if return were changed to log a message, say return x = (x, "start"), then

(fun x -> return x) 3 == return 3 == (3, "start")

and the left-identity law (return x >>= f == f x) would no longer hold, because return 3 >>= f prepends "start" to the log of f 3.
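For comparison, here is a minimal sketch of the same logging monad in Haskell. The names Logged, record, and runLogged are ours, not a standard library API; this is essentially the standard Writer monad specialized to String.

newtype Logged a = Logged (a, String)

instance Functor Logged where
  fmap f (Logged (v, s)) = Logged (f v, s)

instance Applicative Logged where
  pure x = Logged (x, "")                             -- nothing logged yet
  Logged (f, s) <*> Logged (v, s') = Logged (f v, s ++ s')

instance Monad Logged where
  Logged (v, s) >>= f =                               -- concatenate the logs
    let Logged (v', s') = f v in Logged (v', s ++ s')

-- apply a function and log a message
record :: (a -> b) -> a -> String -> Logged b
record f x msg = Logged (f x, msg)

runLogged :: Logged a -> (a, String)
runLogged (Logged p) = p

main :: IO ()
main = print (runLogged (record (+1) 41 "incremented " >>= \v ->
                         record (*2) v "then doubled"))
-- prints (84,"incremented then doubled")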
What are the consequences of breaking the law?
Well, if you tell your friend you've implemented a monad and they can use it in their code, they will expect to be able to rewrite their code using equations like this one:
return x >>= f == f x
If you tell your friend you've implemented the monad interface but none of the monad laws hold, your friend will probably say: OK, tell me what your functions do, then, and please stop using the word monad, because it is confusing. It is like claiming to have implemented the QUEUE interface when insert and remove are actually Last-In, First-Out, like a stack.
In Haskell, F#, or Scala, breaking the monad laws may have more severe consequences, because the compiler actually uses those laws to perform transformations of your code.
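To see a broken law concretely, here is a toy Haskell variant of the logging monad (the BadLogged type is ours, for illustration) whose return logs "start". Left identity fails because the extra "start" shows up on one side only:

newtype BadLogged a = BadLogged (a, String) deriving Show

instance Functor BadLogged where
  fmap f (BadLogged (v, s)) = BadLogged (f v, s)

instance Applicative BadLogged where
  pure x = BadLogged (x, "start")   -- broken: return logs something
  BadLogged (f, s) <*> BadLogged (v, s') = BadLogged (f v, s ++ s')

instance Monad BadLogged where
  BadLogged (v, s) >>= f =
    let BadLogged (v', s') = f v in BadLogged (v', s ++ s')

main :: IO ()
main = do
  let f x = BadLogged (x + 1, "inc")
  print (return 3 >>= f)   -- BadLogged (4,"startinc")
  print (f 3)              -- BadLogged (4,"inc"): left identity violated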
module type MONAD = sig
  type 'a M
  val return : 'a -> 'a M
  val (>>=) : 'a M -> ('a -> 'b M) -> 'b M
end
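In Haskell, this same interface is the built-in Monad type class. In modern GHC, Monad additionally has Applicative as a superclass and return defaults to pure, but the classic presentation is:

class Monad m where
  return :: a -> m a
  (>>=)  :: m a -> (a -> m b) -> m b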
<code> :: IO Int   -- a suspended (lazy) computation that performs effects when executed

foo :: Int -> Int   -- a totally pure function

All effects in Haskell are treated as a kind of bookkeeping, and IO is the catch-all monad: the “IO monad” contains effectful computations, like printing:

putStrLn :: String -> IO ()
r :: Ref Int
(read r) + 3 :: Int    -- doesn't type check: read r has type IO Int, not Int

read :: Ref a -> IO a
r :: Ref Int

Instead, sequence the read inside the monad:

do
  x <- read r
  return (x + 3)

Prettier!!
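The do-notation is just syntactic sugar for bind; the block above desugars to:

read r >>= (\x -> return (x + 3))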
new :: a -> IO (Ref a)
read :: Ref a -> IO a
write :: Ref a -> a -> IO ()
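In real GHC Haskell, plain (non-transactional) mutable references with exactly this interface live in Data.IORef, as newIORef, readIORef, and writeIORef. A runnable sketch:

import Data.IORef

main :: IO ()
main = do
  r <- newIORef (0 :: Int)   -- new
  x <- readIORef r           -- read
  writeIORef r (x + 3)       -- write
  readIORef r >>= print      -- prints 3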
main = do
  id <- fork action1
  action2
  ...

-- runs action1 and action2 in parallel
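In GHC, the fork operation is Control.Concurrent.forkIO :: IO () -> IO ThreadId. A runnable sketch (the threadDelay at the end is a crude way to let the forked thread finish before main exits; the two output lines may interleave in either order):

import Control.Concurrent

main :: IO ()
main = do
  _tid <- forkIO (putStrLn "action1")   -- runs concurrently with what follows
  putStrLn "action2"
  threadDelay 100000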
Idea: add a function atomic that guarantees atomic execution of a suspended (effectful) computation.

main = do
  id <- fork (atomic action1)
  atomic action2
  ...

-- runs action1 and action2 atomically and in parallel
With action1 and action2 each performing reads and writes of a shared variable x, atomic guarantees that only the two serial schedules are possible: all of action1's operations before all of action2's, or vice versa, exactly as in the earlier figure. Without atomic, their reads and writes could interleave arbitrarily. (The programmer gets to cut down non-determinism as much as he/she wants.)
Haskell's TVar a corresponds to OCaml's 'a ref.
-- inc adds 1 to the mutable reference r
inc :: TVar Int -> STM ()
inc r = do
  v <- read r
  write r (v+1)

main = do
  r <- atomic (new 0)
  fork (atomic (inc r))
  atomic (inc r)
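For the record, in GHC's actual API (the stm package's Control.Concurrent.STM), atomic is spelled atomically, and the operations are newTVar, readTVar, and writeTVar. The example above, runnable (the threadDelay is a crude wait for the forked increment):

import Control.Concurrent
import Control.Concurrent.STM

inc :: TVar Int -> STM ()
inc r = do
  v <- readTVar r
  writeTVar r (v + 1)

main :: IO ()
main = do
  r <- atomically (newTVar 0)
  _ <- forkIO (atomically (inc r))
  atomically (inc r)
  threadDelay 100000
  atomically (readTVar r) >>= print   -- prints 2 once both increments commit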
Blocking composes. Sequencing two withdrawals inside a single atomic block, for example

atomic (do
  withdraw a1 3
  withdraw a2 7
)

waits for: a1 balance > 3 and a2 balance > 7, without any change to the withdraw function.
Suppose we want to transfer 3 dollars from either account a1 or a2 into account b:

atomic (do
  (withdraw a1 3) `orElse` (withdraw a2 3)
  deposit b 3        -- then afterward, do this
)

transfer a1 a2 b =
  do
    withdraw a1 3 `orElse` withdraw a2 3
    deposit b 3
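GHC makes this runnable too: retry blocks a transaction until one of the TVars it has read changes, and orElse composes alternatives exactly as above. A sketch, with accounts simplified to TVar Int (our own representation, for illustration):

import Control.Concurrent.STM

type Account = TVar Int

withdraw :: Account -> Int -> STM ()
withdraw acc n = do
  bal <- readTVar acc
  if bal < n
    then retry                   -- block until the balance changes
    else writeTVar acc (bal - n)

deposit :: Account -> Int -> STM ()
deposit acc n = do
  bal <- readTVar acc
  writeTVar acc (bal + n)

transfer :: Account -> Account -> Account -> IO ()
transfer a1 a2 b = atomically $ do
  withdraw a1 3 `orElse` withdraw a2 3   -- try a1; if it would block, try a2
  deposit b 3                            -- then afterward, do this

main :: IO ()
main = do
  a1 <- atomically (newTVar 1)    -- too poor: withdraw a1 3 would block
  a2 <- atomically (newTVar 10)
  b  <- atomically (newTVar 0)
  transfer a1 a2 b
  atomically (readTVar a2) >>= print   -- 7
  atomically (readTVar b)  >>= print   -- 3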
This is just an intro. There's way more to learn. Have fun with FP.
- If you want to see more of the math underpinning the semantics of STM, check out COS 510.
- Take the next step: http://lambda-the-ultimate.org/
A complete, multiprocessor implementation of
STM exists as of GHC 6.
Experience to date: even for the most
mutation-intensive program, the Haskell STM
implementation is as fast as the previous MVar
implementation.
- The MVar version paid heavy costs for (usually
unused) exception handlers.
Need more experience using STM in practice,
though!
You can play with it. See the course website.
At first, atomic blocks look insanely expensive. A naive implementation (cf. databases):
- Every load and store instruction logs information into a thread-local log.
- A store instruction writes only the log.
- A load instruction consults the log first.
- At the end of the block, validate the log.
  - If validation succeeds, atomically commit the changes to shared memory.
  - If it fails, restart the transaction.
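To make the naive scheme concrete, here is a deliberately simplistic toy sketch in Haskell (all names are ours; real STM implementations, including GHC's, are far more sophisticated). All transactional variables hold an Int, one global lock serializes validation and commit, and a transaction whose reads are out of date simply restarts:

import Data.IORef
import Control.Concurrent.MVar
import qualified Data.Map.Strict as M

-- A toy TVar: an identifier plus (value, version) in an IORef.
data TV = TV Int (IORef (Int, Int))

-- Per-TVar log entry: the (value, version) first read, if any,
-- and the pending write, if any.
data Entry = Entry TV (Maybe (Int, Int)) (Maybe Int)

-- A transaction threads a thread-local log keyed by TVar id.
newtype Tx a = Tx (IORef (M.Map Int Entry) -> IO a)

instance Functor Tx where
  fmap f (Tx m) = Tx (fmap f . m)
instance Applicative Tx where
  pure x = Tx (\_ -> pure x)
  Tx mf <*> Tx mx = Tx (\lg -> mf lg <*> mx lg)
instance Monad Tx where
  Tx m >>= f = Tx (\lg -> m lg >>= \a -> case f a of Tx m' -> m' lg)

-- A load consults the log first; on first contact it records the version seen.
readTV :: TV -> Tx Int
readTV tv@(TV i ref) = Tx $ \lgRef -> do
  lg <- readIORef lgRef
  case M.lookup i lg of
    Just (Entry _ _ (Just w))      -> return w   -- read our own pending write
    Just (Entry _ (Just (v, _)) _) -> return v   -- re-read: same value as before
    _ -> do
      (v, ver) <- readIORef ref
      modifyIORef' lgRef (M.insert i (Entry tv (Just (v, ver)) Nothing))
      return v

-- A store writes only the log.
writeTV :: TV -> Int -> Tx ()
writeTV tv@(TV i _) x = Tx $ \lgRef ->
  modifyIORef' lgRef (M.alter upd i)
  where
    upd Nothing                = Just (Entry tv Nothing (Just x))
    upd (Just (Entry t s _))   = Just (Entry t s (Just x))

-- At the end of the block: validate under a global lock, then commit or restart.
atomicTx :: MVar () -> Tx a -> IO a
atomicTx lock tx@(Tx m) = do
  lgRef <- newIORef M.empty
  res   <- m lgRef
  lg    <- readIORef lgRef
  ok    <- withMVar lock $ \_ -> do
    valid <- and <$> mapM stillValid (M.elems lg)
    if valid then mapM_ commit (M.elems lg) >> return True
             else return False
  if ok then return res else atomicTx lock tx   -- conflict: restart

stillValid :: Entry -> IO Bool
stillValid (Entry (TV _ ref) (Just (_, ver)) _) = do
  (_, cur) <- readIORef ref
  return (cur == ver)
stillValid _ = return True

commit :: Entry -> IO ()
commit (Entry (TV _ ref) _ (Just w)) =
  modifyIORef' ref (\(_, ver) -> (w, ver + 1))
commit _ = return ()

main :: IO ()
main = do
  lock <- newMVar ()
  ref  <- newIORef (0, 0)
  let x = TV 0 ref
  atomicTx lock (readTV x >>= \v -> writeTV x (v + 1))
  readIORef ref >>= print   -- (1,1)

This toy ignores many real issues: a running transaction can observe an inconsistent snapshot (it will fail validation, but only after running), there is no fairness, and there is no support for nesting, retry, or orElse.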
[Figure: normalised execution time on a red-black-tree workload (1 thread; 6:1:1 lookup:insert:delete mix; keys 0..65535). Sequential baseline 1.00x; coarse-grained locking 1.13x; fine-grained locking 2.57x; direct-update STM 2.04x; traditional STM 5.69x.]

[Figure: microseconds per operation vs. #threads for coarse-grained locking, fine-grained locking, traditional STM, and direct-update STM + compiler integration. Direct-update STM with compiler integration is scalable to multicore.]
A naïve STM implementation is hopelessly inefficient.
There is a lot of research going on in the compiler and
architecture communities to optimize STM.
This work typically assumes transactions are smallish
and have low contention. If these assumptions are
wrong, performance can degrade drastically.
We need more experience with “real” workloads and
various optimizations before we will be able to say for
sure that we can implement STM sufficiently efficiently
to be useful.
Consider the following program:

Initially, x = y = 0

Thread 1:
  // atomic {                    // A0
  atomic { x = 1; }              // A1
  atomic { if (y==0) abort; }    // A2
  // }

Thread 2:
  atomic {                       // A3
    if (x==0) abort;
    y = 1;
  }
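A hedged GHC rendering of this program, using retry (block until a variable the transaction read changes, then re-run) as the closest available stand-in for abort:

import Control.Concurrent
import Control.Concurrent.STM
import Control.Monad (when)

main :: IO ()
main = do
  x <- atomically (newTVar (0 :: Int))
  y <- atomically (newTVar (0 :: Int))
  -- Thread 1: two separate atomic blocks (A1 then A2)
  _ <- forkIO $ do
    atomically (writeTVar x 1)                             -- A1
    atomically (readTVar y >>= \v -> when (v == 0) retry)  -- A2
    putStrLn "thread 1 done"
  -- Thread 2: one atomic block (A3)
  _ <- forkIO $ do
    atomically $ do
      v <- readTVar x
      when (v == 0) retry
      writeTVar y 1                                        -- A3
    putStrLn "thread 2 done"
  threadDelay 200000

As written, both threads eventually commit: thread 2 unblocks once A1's write to x commits, and A2 unblocks once A3 commits. If thread 1 instead wrapped A1 and A2 in the single block A0, neither thread could ever commit: A0 cannot commit until y is 1, and thread 2 cannot set y to 1 until it sees A0's write to x.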
In languages like ML or Java, the fact that the language is in the IO monad is baked into the language. There is no need to mark anything in the type system because IO is everywhere.
In Haskell, the programmer can choose when to live in
the IO monad and when to live in the realm of pure
functional programming.
Interesting perspective: It is not Haskell that lacks
imperative features, but rather the other languages that
lack the ability to have a statically distinguishable pure
subset.
This separation facilitates concurrent programming.
[Diagram: effects plotted from Useless to Useful and from Dangerous to Safe. “Arbitrary effects” sits at useful-but-dangerous; “no effects” sits at safe-but-useless. Plan A starts from arbitrary effects, Plan B from no effects, with an arrow labeled “Envy” running between the two camps.]

Plan A (everyone else): default = any effect; plan = add restrictions. Examples: regions, ownership types, Vault, Spec#, Cyclone.

Plan B (Haskell): default = no effects; plan = selectively permit effects.
One of Haskell’s most significant
contributions is to take purity seriously, and
relentlessly pursue Plan B.