Concurrency Models
Concurrency models resemble the architectures used in distributed systems, where jobs are distributed among different computers instead of different threads. Of course, distributed systems have the extra challenge that the network may fail, or that a remote computer or process may be down. But a concurrent system running on a big server may experience similar problems if a CPU, a network card or a disk fails. The probability of failure may be lower, but it can theoretically still happen.
Parallel Workers
The first concurrency model is what I call the parallel worker model.
Incoming jobs are assigned to different workers. Here is a diagram
illustrating the parallel worker concurrency model:
If the parallel worker model was implemented in a car factory, each car
would be produced by one worker. The worker would get the specification
of the car to build, and would build everything from start to end.
The parallel worker concurrency model is the most commonly used
concurrency model in Java applications (although that is changing). Many
of the concurrency utilities in the java.util.concurrent Java package are
designed for use with this model. You can also see traces of this model in
the design of the Java Enterprise Edition application servers.
One advantage of the parallel worker model is that it is easy to increase the parallelization: you just add more workers. For instance, if you were implementing a web crawler, you could crawl a certain number of pages with different numbers of workers and see which number gives the shortest total crawl time (meaning the highest performance). Since web crawling is an IO intensive job, you will probably end up with a few threads per CPU / core in your computer. One thread per CPU would be too few, since each thread would be idle a lot of the time while waiting for data to download.
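To make the idea concrete, here is a minimal sketch of the parallel worker model in Java using a fixed thread pool. The thread count, the example URLs and the fetchPage method are illustrative assumptions for this sketch, not part of the original design:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelCrawler {

    public static void main(String[] args) {
        // IO intensive work, so use a few threads per CPU core (hypothetical choice).
        int workers = Runtime.getRuntime().availableProcessors() * 4;
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        // Hypothetical list of pages to crawl.
        List<String> urls = List.of("https://example.com/a", "https://example.com/b");

        for (String url : urls) {
            // Each incoming job (a URL) is assigned to a worker thread in the pool.
            pool.submit(() -> fetchPage(url));
        }
        pool.shutdown();
    }

    private static void fetchPage(String url) {
        // Placeholder for the actual download and parsing logic.
        System.out.println(Thread.currentThread().getName() + " crawling " + url);
    }
}

Each submitted URL becomes a job that one worker thread carries out from start to finish, which is exactly the car-factory metaphor above.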
In reality the parallel worker concurrency model is a bit more complex than
illustrated above. The shared workers often need access to some kind of
shared data, either in memory or in a shared database. The following
diagram shows how this complicates the parallel worker concurrency
model:
Some of this shared state is in communication mechanisms like job
queues. But some of this shared state is business data, data caches,
connection pools to the database etc.
As soon as shared state sneaks into the parallel worker concurrency model
it starts getting complicated. The threads need to access the shared data in
a way that makes sure that changes by one thread are visible to the others
(pushed to main memory and not just stuck in the CPU cache of the CPU
executing the thread). Threads need to avoid race conditions, deadlock and
many other shared state concurrency problems.
Additionally, part of the parallelization is lost when threads wait for each other while accessing the shared data structures. Many concurrent data structures are blocking, meaning that only one or a limited set of threads can access them at any given time. This may lead to contention on these shared data structures. High contention essentially leads to a degree of serialization of the part of the code that accesses the shared data structures.
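As a small, hypothetical illustration of shared state (the statistics class below is made up for this sketch), a counter shared by all workers must be updated atomically and visibly, and under high contention these updates effectively serialize:

import java.util.concurrent.atomic.AtomicLong;

// Shared state accessed by many worker threads (hypothetical example).
public class SharedStats {

    // Without atomics or synchronization, increments could be lost (race condition)
    // and updates might stay in one CPU's cache, invisible to other threads.
    private final AtomicLong pagesCrawled = new AtomicLong();

    public void recordPage() {
        // Atomic read-modify-write; under high contention these calls
        // effectively serialize, which is the cost described above.
        pagesCrawled.incrementAndGet();
    }

    public long pagesCrawled() {
        return pagesCrawled.get();
    }
}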
One alternative is to use persistent data structures. A persistent data structure always preserves the previous version of itself when it is modified. For instance, a persistent list adds all new elements to the head of the list and returns a reference to the newly added element (which then points to the rest of the list). All other threads still keep a reference to the previously first element in the list, and to these threads the list appears unchanged. They cannot see the newly added element.
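A minimal sketch of such a persistent list in Java might look like the following. This is an illustrative, hypothetical implementation, not a specific library class:

// A minimal persistent (immutable) singly linked list - a sketch, not production code.
public final class PersistentList<T> {

    private final T head;                 // element stored in this node
    private final PersistentList<T> tail; // rest of the list, or null for the empty list

    private PersistentList(T head, PersistentList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    public static <T> PersistentList<T> empty() {
        return new PersistentList<>(null, null);
    }

    // Adding returns a NEW list whose head is the new element and whose tail
    // is the old list. The old list object is never modified, so threads holding
    // a reference to it still see it unchanged.
    public PersistentList<T> add(T element) {
        return new PersistentList<>(element, this);
    }
}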
Stateless Workers
Shared state can be modified by other threads in the system. Therefore workers must re-read the state every time they need it, to make sure they are working on the latest copy. A worker that does not keep state internally, but re-reads it every time it is needed, is called stateless. Re-reading data every time you need it can get slow, especially if the state is stored in an external database.
Another disadvantage of the parallel worker model is that the job execution
order is nondeterministic. There is no way to guarantee which jobs are
executed first or last. Job A may be given to a worker before job B, yet job
B may be executed before job A.
Assembly Line
The second concurrency model is what I call the assembly line concurrency
model. I chose that name just to fit with the "parallel worker" metaphor from
earlier. Other developers use other names (e.g. reactive systems, or event
driven systems) depending on the platform / community. Here is a diagram
illustrating the assembly line concurrency model:
The workers are organized like workers at an assembly line in a factory.
Each worker only performs a part of the full job. When that part is finished
the worker forwards the job to the next worker.
Each worker is running in its own thread, and shares no state with other
workers. This is also sometimes referred to as a shared
nothing concurrency model.
Systems using the assembly line concurrency model are usually designed
to use non-blocking IO. Non-blocking IO means that when a worker starts
an IO operation (e.g. reading a file or data from a network connection) the
worker does not wait for the IO call to finish. IO operations are slow, so
waiting for IO operations to complete is a waste of CPU time. The CPU
could be doing something else in the meanwhile. When the IO operation finishes, the result of the IO operation (e.g. data read, or status of data written) is passed on to another worker.
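Here is a hedged sketch of non-blocking IO in Java using java.nio's AsynchronousFileChannel. The file name and buffer size are arbitrary assumptions for the example; the point is that the read is started and the result is handled in a completion callback instead of blocking the calling thread:

import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class NonBlockingRead {

    public static void main(String[] args) throws Exception {
        // Hypothetical input file for the example.
        AsynchronousFileChannel channel =
                AsynchronousFileChannel.open(Paths.get("data.txt"), StandardOpenOption.READ);

        ByteBuffer buffer = ByteBuffer.allocate(1024);

        // The read is started, but the calling thread does not wait for it to finish.
        channel.read(buffer, 0, buffer, new CompletionHandler<Integer, ByteBuffer>() {
            @Override
            public void completed(Integer bytesRead, ByteBuffer buf) {
                // The result is handled here - conceptually "forwarded to the next worker".
                System.out.println("Read " + bytesRead + " bytes");
            }

            @Override
            public void failed(Throwable exc, ByteBuffer buf) {
                exc.printStackTrace();
            }
        });

        // The CPU is free to do other work here while the IO operation runs.
        Thread.sleep(1000); // demo only: keep the JVM alive long enough for the callback
    }
}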
In reality, the jobs may not flow along a single assembly line. Since most systems can perform more than one job, jobs flow from worker to worker depending on the job that needs to be done, so there could be multiple different virtual assembly lines running at the same time. This is how the flow of jobs through an assembly line system might look in reality:
Jobs may even be forwarded to more than one worker for concurrent
processing. For instance, a job may be forwarded to both a job executor
and a job logger. This diagram illustrates how all three assembly lines finish
off by forwarding their jobs to the same worker (the last worker in the
middle assembly line):
The assembly lines can get even more complex than this.
Some popular platforms which use the assembly line (reactive / event driven) model are:
Vert.x
Akka
Node.JS (JavaScript)
Actors and channels are two similar examples of assembly line (or
reactive / event driven) models.
In the actor model each worker is called an actor. Actors can send
messages directly to each other. Messages are sent and processed
asynchronously. Actors can be used to implement one or more job
processing assembly lines, as described earlier. Here is a diagram
illustrating the actor model:
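A minimal, hypothetical actor sketch in plain Java could use a thread with a mailbox queue. Real actor frameworks such as Akka provide far more (supervision, routing, etc.), so treat this only as an illustration of the message-passing idea:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal actor sketch: one thread, one mailbox, messages processed one at a time.
public class SimpleActor implements Runnable {

    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();

    // Other actors call this to send a message; the call returns immediately.
    public void send(String message) {
        mailbox.offer(message);
    }

    @Override
    public void run() {
        try {
            while (true) {
                // Messages are processed asynchronously, in arrival order,
                // until the actor's thread is interrupted.
                String message = mailbox.take();
                System.out.println("Processing: " + message);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        SimpleActor actor = new SimpleActor();
        new Thread(actor).start();
        actor.send("job-1");
        actor.send("job-2");
    }
}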
In the channel model, workers do not communicate directly with each other.
Instead they publish their messages (events) on different channels. Other
workers can then listen for messages on these channels without the sender
knowing who is listening. Here is a diagram illustrating the channel model:
At the time of writing, the channel model seems more flexible to me. A
worker does not need to know about what workers will process the job later
in the assembly line. It just needs to know what channel to forward the job
to (or send the message to etc.). Listeners on channels can subscribe and
unsubscribe without affecting the workers writing to the channels. This
allows for a somewhat looser coupling between workers.
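To illustrate the idea, here is a small, hypothetical channel sketch in Java. It delivers messages synchronously for simplicity (a real event driven platform would typically deliver them asynchronously), but it shows how publishers only know the channel, not the listeners:

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Minimal channel sketch: workers publish messages without knowing who listens.
public class Channel<T> {

    private final List<Consumer<T>> listeners = new CopyOnWriteArrayList<>();

    // Listeners can subscribe (and later unsubscribe) without publishers changing.
    public void subscribe(Consumer<T> listener) {
        listeners.add(listener);
    }

    public void publish(T message) {
        for (Consumer<T> listener : listeners) {
            listener.accept(message);
        }
    }

    public static void main(String[] args) {
        Channel<String> jobs = new Channel<>();
        jobs.subscribe(job -> System.out.println("Executor got: " + job));
        jobs.subscribe(job -> System.out.println("Logger got:   " + job));

        // The publisher only knows the channel, not who is listening.
        jobs.publish("job-42");
    }
}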
No Shared State
The fact that workers share no state with other workers means that they
can be implemented without having to think about all the concurrency
problems that may arise from concurrent access to shared state. This
makes it much easier to implement workers. You implement a worker as if
it was the only thread performing that work - essentially a singlethreaded
implementation.
Stateful Workers
Since workers know that no other threads modify their data, the workers can be stateful. By stateful I mean that they can keep the data they need to operate on in memory, only writing changes back to external storage systems eventually. A stateful worker can therefore often be faster than a stateless worker.
Singlethreaded code has the advantage that it often conforms better with
how the underlying hardware works. First of all, you can usually create
more optimized data structures and algorithms when you can assume the
code is executed in single threaded mode.
On the downside, it may be harder to write the code for an assembly line system. Worker code is sometimes written as callback handlers. Having code with many nested callback handlers may result in what some developers call callback hell. Callback hell simply means that it gets hard to track what the code is really doing across all the callbacks, as well as making sure that each callback has access to the data it needs.
With the parallel worker concurrency model this tends to be easier. You
can open the worker code and read the code executed pretty much from
start to finish. Of course parallel worker code may also be spread over
many different classes, but the execution sequence is often easier to read
from the code.
Functional Parallelism
Functional parallelism is a third concurrency model which is being talked
about a lot these days (2015).
The basic idea of functional parallelism is that you implement your program
using function calls. Functions can be seen as "agents" or "actors" that
send messages to each other, just like in the assembly line concurrency
model (AKA reactive or event driven systems). When one function calls
another, that is similar to sending a message.
All parameters passed to the function are copied, so no entity outside the
receiving function can manipulate the data. This copying is essential to
avoiding race conditions on the shared data. This makes the function
execution similar to an atomic operation. Each function call can be
executed independently of any other function call.
When each function call can be executed independently, each function call can be executed on a separate CPU. That means that an algorithm implemented functionally can be executed in parallel, on multiple CPUs.
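In Java, parallel streams are one way to express this kind of parallelism. The following sketch (with arbitrary example numbers) applies a pure function to each element and lets the runtime spread the work over multiple CPUs:

import java.util.stream.LongStream;

public class FunctionalParallelism {

    public static void main(String[] args) {
        // Each element is processed by a pure function with no shared mutable state,
        // so the runtime is free to spread the work across multiple CPUs.
        long sumOfSquares = LongStream.rangeClosed(1, 1_000_000)
                .parallel()
                .map(n -> n * n)
                .sum();

        System.out.println(sumOfSquares);
    }
}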
The hard part about functional parallelism is knowing which function calls to
parallelize. Coordinating function calls across CPUs comes with an
overhead. The unit of work completed by a function needs to be of a certain
size to be worth this overhead. If the function calls are very small,
attempting to parallelize them may actually be slower than a
singlethreaded, single CPU execution.
Additionally, splitting a task over multiple CPUs, with the coordination overhead that incurs, only makes sense if that task is currently the only task being executed by the program. However, if the system is concurrently executing multiple other tasks (as web servers, database servers and many other systems do), there is no point in trying to parallelize a single task. The other CPUs in the computer are going to be busy working on other tasks anyway, so there is no reason to disturb them with a slower, functionally parallel task. You are most likely better off with an assembly line (reactive) concurrency model, because it has less overhead (it executes sequentially in singlethreaded mode) and conforms better with how the underlying hardware works.
Which Concurrency Model Is Best?
So, which concurrency model is best? As is often the case, the answer is that it depends on what your system is supposed to do. If your jobs are naturally parallel and independent, with no shared state necessary, you might be able to implement your system using the parallel worker model.
Many jobs are not naturally parallel and independent though. For these
kinds of systems I believe the assembly line concurrency model has more
advantages than disadvantages, and more advantages than the parallel
worker model.
You don't even have to code all that assembly line infrastructure yourself. Modern platforms like Vert.x have implemented a lot of that for you. Personally, I will be exploring designs running on top of platforms like Vert.x for my next projects. Java EE just doesn't have the edge anymore, I feel.
Same-threading
Same-threading is a concurrency model where a single-threaded design is scaled out by running multiple single-threaded instances, typically one thread per CPU, with no state shared between the threads. The lack of shared state is what makes each thread behave as if it was a single-threaded system. However, since a same-threaded system can contain more than a single thread, it is not really a "single-threaded system". For lack of a better name, I found it more precise to call such a system a same-threaded system, rather than a "multi-threaded system with a single-threaded design". Same-threaded is easier to say, and easier to understand.
Same-threaded basically means that data processing stays within the same
thread, and that no threads in a same-threaded system share data
concurrently.
Load Distribution
Obviously, a same-threaded system needs to share the work load between
the single-threaded instances running. If not, only a single instance will get
any work, and the system would in effect be single-threaded.
Exactly how you distribute the load over the different instances depends on the design of your system. I will cover a few options in the following sections.
Single-threaded Microservices
If your system consists of multiple microservices, each microservice instance can run in single-threaded mode. Microservices do not share any data by nature, so they are a good use case for a same-threaded system.
Services With Sharded Data
If your system does actually need to share data, or at least a database, you
may be able to shard the database. Sharding means that the data is
divided among multiple databases. The data is typically divided so that all
data related to each other is located together in the same database. For
instance, all data belonging to some "owner" entity will be inserted into the
same database. Sharding is out of the scope of this tutorial, though, so you
will have to search for tutorials about that topic.
Thread Communication
If the threads in a same-threaded system need to communicate, they do so by message passing. If thread A wants to send a message to thread B, it can do so by generating a message (a byte sequence). Thread B can then copy that message (byte sequence) and read it. By copying the message, thread B makes sure that thread A cannot modify the message while thread B reads it. Once the message is copied, thread A can no longer change thread B's copy.
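A small sketch of this style of message passing in Java might look like the following. The queue and the message content are assumptions made for the example; the key point is that thread B copies the byte sequence before reading it:

import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MessagePassing {

    public static void main(String[] args) {
        BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();

        // Thread A generates a message (a byte sequence) and hands it off.
        Thread threadA = new Thread(() -> {
            byte[] message = "hello".getBytes();
            queue.offer(message);
        });

        // Thread B copies the message before reading it, so changes made later
        // by thread A cannot affect the copy thread B works on.
        Thread threadB = new Thread(() -> {
            try {
                byte[] received = queue.take();
                byte[] copy = Arrays.copyOf(received, received.length);
                System.out.println(new String(copy));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        threadA.start();
        threadB.start();
    }
}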
Illustrations
Here are illustrations of a single-threaded, a multi-threaded and a same-threaded system, so you can more easily get an overview of the differences between them.
Concurrency vs. Parallelism
Are concurrency and parallelism the same thing? The short answer is "no". They are not the same, although they appear quite similar on the surface. It also took me some time to finally find and understand the difference between concurrency and parallelism. Therefore I decided to add a text about concurrency vs. parallelism to this Java concurrency tutorial.
Concurrency
Concurrency means that an application is making progress on more than
one task at the same time (concurrently). Well, if the computer only has
one CPU the application may not make progress on more than one task
at exactly the same time, but more than one task is being processed at a
time inside the application. It does not completely finish one task before it
begins the next.
Parallelism
Parallelism means that an application splits its tasks up into smaller
subtasks which can be processed in parallel, for instance on multiple CPUs
at the exact same time.
Concurrency vs. Parallelism In Detail
As you can see, concurrency is related to how an application handles the multiple tasks it works on. An application may process one task at a time (sequentially) or work on multiple tasks at the same time (concurrently).
As you can see, an application can be concurrent, but not parallel. This
means that it processes more than one task at the same time, but the tasks
are not broken down into subtasks.
An application can also be parallel but not concurrent. This means that the
application only works on one task at a time, and this task is broken down
into subtasks which can be processed in parallel.