IMPLEMENTING
THE CLEAN ARCHITECTURE
Foreword
Disclaimer
Control flow in the Clean Architecture
Business requirements
Implementation
Chapter summary
The Clean Architecture modifications
Introduction
What does it have to do with the Clean Architecture?
Separate read stack - why?
Separate read stack - how?
CQRS vs REST API
CQRS vs GraphQL
Chapter summary
Sharp boundary
At the same time, I came across some limitations which I had to overcome. Sometimes the cure was to use another technique (such as CQRS - Command Query Responsibility Segregation); sometimes it would have been better not to use the Clean Architecture at all.
In short, this book was conceived to share all the experience my colleagues and I gained during the implementation of the Clean Architecture.
TOOLS-DRIVEN ERA
The world of Python is a magical, enchanting place. Imagine you are about to write some boilerplate code needed to implement an actual feature. Virtually every time you are about to fall under such an evil spell, you can break it by casting a counter-spell:
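The counter-spell is, presumably, a single command pulling a ready-made package from PyPI (the package name below is just an example):

    pip install requests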
The ease of using this command, combined with the profusion of libraries, enables everyone, including apprentices of sorcery, to solve seemingly complex problems with little mana expense.
Nowadays, the wizardry called software development can be picked up and practised almost effortlessly without knowing its arcana, though the nature of the magic itself has not changed at all. This creates an illusion that knowledge about principles and patterns is no longer needed. Although the entry point is lower, deluded sorcery apprentices are far from being enlightened.
Literally every tool Python developers use daily is an implementation of a long-known (decades-old) and extensively described pattern of some sort. Django ORM? It is an example of the Active Record pattern, widely known thanks to Ruby on Rails, which follows the same pattern. It was described in Martin Fowler's Patterns of Enterprise Application Architecture using these words:
"It’s easy to build Active Records, and they are easy to
understand. Their primary problem is that they work well
only if the Active Record objects correspond directly to the
database tables: an isomorphic schema. (...) Another
argument against Active Record is the fact that it couples
the object design to the database design. This makes it
more difficult to refactor either design as a project goes
forward.”
What about something more sophisticated, like SQLAlchemy's session? It turns out the pattern behind it is called Unit of Work and is described in the same book. Suddenly the impression of magic powering PyPI packages fades away and eventually vanishes. Such knowledge is an invaluable help in choosing the right tools for the job. At the same time, a tool which solves your most acute problem will cause several lesser ones, yet those you can live with. For example, SQLAlchemy's session forces a developer to register any newly created model using the add method. Without it, no data will be persisted upon commit. Is the necessity for manual model management worth the trouble, or maybe Django ORM is just fine for this particular project?
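For illustration, the difference boils down to this (a minimal sketch; model definitions and session setup omitted):

# SQLAlchemy's session (Unit of Work) - new objects have to be registered explicitly
auction = Auction(title="Old clock")
session.add(auction)  # without this line, nothing is persisted on commit
session.commit()

# Django ORM (Active Record) - the model knows how to save itself
auction = Auction(title="Old clock")
auction.save()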
The most effective cure for indecisiveness is to stay pragmatic and flexible. In fact, this is what this book is truly about. Even though it explains an exciting approach which highlights the importance of business concerns, it does not conveniently omit drawbacks.
Whenever a new project is started, developers should ask themselves which approach/framework/library they should use. I may ask in return: what problems would you prefer to have? What issues do you have to avoid?
WHO IS THIS BOOK FOR?
This book is aimed at intermediate-/senior-level software developers who wish to broaden
their knowledge with various software engineering techniques that emerged over the last
several years.
Almost all code examples are written in Python, so the reader's acquaintance with its syntax will be helpful. Luckily, software engineering is a mostly technology-agnostic discipline, so even if readers do not write Python for a living, they can still use this book to learn something new. All code snippets were written using Python 3.7 and later modernized to Python 3.8.
WHAT WILL YOU FIND IN THIS BOOK?
“In theory, theory and practice are the same. In practice, they are not.”
This book's main purpose is to provide tons of practical advice on implementing the Clean Architecture. Everything is based on my experiences, learnt the hard way. Sometimes it was immediately apparent that a certain solution was a bad idea. Sometimes we needed laborious refactoring weeks later to undo a bad design. A few times we did not have an opportunity to improve something that desperately needed it.
Through trial and error, we found out how we could evolve our software using more and more sophisticated techniques, like CQRS, Event Sourcing or Domain-Driven Design. The greatest thing was the ability to pick them up whenever we actually needed them, without investing a lot of time and effort in a big design up front or having to rewrite everything from scratch.
Implementing the Clean Architecture is a bit like a buffet - a reader is encouraged to take from it whatever seems to suit their needs and mood best. It makes no sense to follow every rule & recommendation rigorously if a simpler approach would suffice.
THE CLEAN ARCHITECTURE BASICS
WHAT IS IT ALL FOR?
IT is an industry which changes rapidly all the time. New languages and frameworks emerge daily, only to be forgotten several years later. Solutions that were once popular become enormous technical debt soon after the last contributor abandons the project. On the other hand, there are a few successful, long-living projects which are continuously maintained and developed. Although we get new features and security updates regularly, it still requires some effort to keep up with the newest versions of your favourite web framework.
This task becomes cumbersome if the business logic of a project is tightly coupled to a framework. Every backwards-incompatible update in the framework's codebase breaks something in the actual application. Such a situation is inconvenient for both maintainers and users of the framework. The former group is under constant pressure not to break anything with a new release. Just imagine how discouraging that situation is.
Some applications are pretty straightforward. All they need to do is fetch some data from a database, modify it and save it back. A common name for such a database browser is a CRUD application (Create Read Update Delete). Adding a REST API increases complexity only a bit. Using Django for such a project is one of the best choices one can make in the Python world.
The situation becomes a lot trickier when we deal with more complex domains. They are actually pretty easy to recognize. One of the symptoms might be a vast number of checks to conduct. Invariants spanning multiple objects are even more interesting. Say we are to build a new project where people can bid on auctions. An auction can have 0, 1 or multiple winners at the same time. An auction has an end time after which no one can bid.
If we were to use a CRUD approach à la Django/RoR, then most likely we would end up with separate models for Auction and Bid:
from django.contrib.auth import get_user_model
from django.db import models
from djmoney.models.fields import MoneyField  # provided by the django-money package


class Auction(models.Model):
    title = models.CharField(max_length=255)
    starting_price = MoneyField(
        max_digits=19,
        decimal_places=4,
        default_currency="USD",
    )
    current_price = MoneyField(
        max_digits=19,
        decimal_places=4,
        default_currency="USD",
    )


class Bid(models.Model):
    price = MoneyField(
        max_digits=19,
        decimal_places=4,
        default_currency="USD",
    )
    bidder = models.ForeignKey(
        get_user_model(), on_delete=models.PROTECT
    )
    auction = models.ForeignKey(
        Auction,
        related_name="bids",
        on_delete=models.CASCADE,
    )
The problem is that, in terms of the bidding process, these two are strongly connected. We cannot just save a new Bid to the database whenever someone clicks the Bid! button. A new Bid has to be checked against the Auction. Has the latter not just ended? If the new Bid is the highest one, then we have to set it as the winning one for the Auction. At the same time, we have to change the current price of the Auction. The previously winning Bid is now considered a losing one. As you can see, these two seemingly distinct entities cannot be treated independently. In other words, there are invariants in the domain that span both Auction and Bid. It does not make any sense to reason about them separately, at least not in the bidding process.
This example was not too complicated. Yet the code that enforces these business rules does not fit into any of the building blocks included in Django (or any other web framework, to be fair). Invariants span beyond a single model. At the same time, it is hard to imagine putting them in a view (a function or class handling a single HTTP request). Although there are no physical obstacles, it just does not feel right.
Another issue with code coupled to a framework becomes visible when one tries to test it - one is not able to test business logic without involving heavy machinery. Initializing the whole thing, inserting rows into the database, executing web framework code (e.g. for URL routing), cleaning the DB afterwards - it all takes time. As a matter of fact, time is the only cost of running tests. If executing the test suite takes ages, then it will not be run too often. As it happens, complex domains have multiple cases to be checked. If one wants to cover them all, they are stuck with a long execution time of the test suite.
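For contrast, a test of pure business logic needs neither a database nor a framework; a sketch assuming an Auction Entity similar to the one introduced later in this chapter (the constructor signature is an assumption):

from decimal import Decimal


def test_new_highest_bid_becomes_winning() -> None:
    auction = Auction(starting_price=Decimal("10.00"))  # constructor signature assumed

    auction.place_bid(user_id=1, amount=Decimal("15.00"))

    assert auction.winners == [1]
    assert auction.current_price == Decimal("15.00")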
The last category of things that can give you a headache is integrations with 3rd party services. Using as many external services as possible is a trendy approach these days. For many seemingly simple projects, 3rd party services cannot be avoided. The first example that comes to mind is e-commerce. Customers have to pay for their shopping, so making friends with some payments platform is a worthwhile idea. Such platforms' income comes from the fees they charge. Now imagine you are to replace one integration with another because a different payments platform is slightly cheaper. How many orders must be placed to compensate for the development time? A reckless, naive integration will tightly couple payment processes with the application in the same way web frameworks do. Therefore it will be hard to change.
So far, only problems have been described, without proposing any solution. All of them can be addressed with elegance and style. This is where the Clean Architecture comes in. Simply put, it is an approach to software architecture that gives special treatment to business rules. It is unacceptable for a framework, database or 3rd party service to leak into and poison business logic. Correctly applied, the Clean Architecture gives us the following:
• testability - all business rules can be tested using unit tests, without inserting anything
into a database
• independence of any 3rd party - business rules do not need to know which payments
platform you are using
• flexibility - certain architectural decisions can be delayed without stopping development
• extensibility - projects can be easily extended with more sophisticated techniques like
CQRS, Event Sourcing or Domain Driven Design if needed
These are the benefits of a strict separation of concerns, arranging the codebase into clearly separated layers and applying the Dependency Rule between them.
CODE ORGANIZATION - HORIZONTAL SLICING
In a basic form of the Clean Architecture, there are four layers. Naturally, one can use more
if it is justified.
EXTERNAL WORLD
The outermost one, the External World, represents all services and code that the project uses but that do not belong to the same code base. Simply put, this layer encompasses everything that was implemented outside the project.
INFRASTRUCTURE
The second layer is called Infrastructure. It contains all the code needed for the project to use goodies from the External World. For example, if we use MariaDB as our primary data store, classes and functions responsible for communication with MariaDB will sit in the Infrastructure layer. The same is true for any 3rd party service we have to integrate with. For example, if we are building an e-commerce solution, we will place here the classes implementing integration with payment providers. The kinds of integrations depend on the type of the project.
APPLICATION
The third layer is for application-specific business rules. Therein lies code that specifies what the project actually does. The Application layer is a home for Use Cases (also known as Interactors). A Use Case is a single operation within the project that leads to changing the state of the system, assuming everything goes right. Using the auctioning example, we could have a Use Case for placing a bid and another one for withdrawing a bid. If we were building an e-commerce solution, we could have one Use Case for adding an item to a cart and another for removing an item from the cart. A Use Case represents a single action of a user (or another actor) that is significant from the business point of view. If you are familiar with Scrum, these can be more or less translated into user stories.
The second kind of building block which will always reside in this layer is an Interface (also known as a Port). These are abstractions over anything that sits in the layer above - Infrastructure - and is required by at least one Use Case. In Python, this can be implemented using the abstract base classes (abc) module.
# application/interfaces/email_sender.py
import abc


class EmailSender(abc.ABC):
    @abc.abstractmethod
    def send(self, message: EmailMessage) -> None:
        pass


# infrastructure/adapters/email_sender.py
import smtplib


class LocalhostEmailSender(EmailSender):
    def send(self, message: EmailMessage) -> None:
        server = smtplib.SMTP("localhost", 1025)
        # etc.
These code snippets show the relation between Interfaces from the Application layer and their adapters from Infrastructure. What is important - a Use Case MUST NOT be aware whether we are using LocalhostEmailSender or any other class inheriting from EmailSender. More on this later. To sum up, the Application layer contains code for all actions and defines interfaces for the external world to execute actions' logic.
DOMAIN
This layer is a place for all business rules that have to be enforced regardless of the context in which they are used. The basic building block to use here is called an Entity. Using the auctioning example once again - we could have an Entity for Auction with methods for placing a bid and withdrawing one:
class Auction:
    def place_bid(
        self, user_id: int, amount: Decimal
    ) -> None:
        pass

An alternative would be to keep all the logic outside the Entity and mutate its data directly from a Use Case:

class PlacingBidUseCase:
    def execute(self, _args):
        ...
        auction.winners = [new_bid.bidder_id]
        auction.current_price = new_bid.amount
Such an approach would effectively make our Entities anemic. Such creatures are also called Data Classes1 (not to be confused with data classes from the standard library!) or Plain Old Python Objects. They are just dummy bags for data and have no methods (behaviour). The whole logic would be implemented outside such classes. This pattern is known as Transaction Script2. It can work in certain circumstances, but certainly not in this case, because the auctioning domain has invariants to protect. For example, every change of the winner affects the current price. We already know at least two situations when this happens - when someone offers more than the previous winner and when we withdraw the currently winning bid. We are going to have separate Use Cases for PlacingBid and WithdrawingBid, so a naive approach with methodless classes and Transaction Scripts implies that we would have to duplicate the logic of calculating the current price, which is unacceptable. When Transaction Script is misused and implements logic that should be encapsulated by an Entity, we are talking about an anti-pattern called Anemic Entities. Yet another principle that warns against changing object data from the outside is Tell, Don't Ask3.
class PlacingBidUseCase:
    def execute(self, _args) -> None:
        # Tell, don't ask violated
        # if auction.current_price < new_bid.amount:
        #     auction.winners = [new_bid.bidder_id]
        #     auction.current_price = new_bid.amount
        ...
1 Martin Fowler, Refactoring: Improving the Design of Existing Code 2nd edition, Chapter 3, Data
Class
2 Martin Fowler, Patterns of Enterprise Application Architecture, Chapter 9, Transaction Script
3 Martin Fowler, TellDontAsk, https://martinfowler.com/bliki/TellDontAsk.html
THE DEPENDENCY RULE
Grouping classes and functions into layers is not enough to get clear, maintainable codebase.
Obviously, control flow has to cross at least few (if not all) layers to actually do something
in projects that use the Clean Architecture. Having benefits and goals of this approach in
mind, interactions between layers cannot be left to chance. One possibly could import and
call some framework-specific code in the domain layer if it not had been for the Dependency
Rule. It says that no lower layer is allowed to know and use anything from any upper layer.
For example, one is not permitted to use any class, function or a module from the
Infrastructure if we are in the Domain layer. The Dependency Rule not only forbids developer
from explicitly importing symbols from the outer layer but also discourages accepting these
as functions arguments. The Dependency Rule is illustrated with arrows in the architecture
diagram. The direction of arrows is the same as dependencies: Infrastructure uses Application,
Application uses Domain, but it is not allowed for Domain to use Infrastructure or Application
etc.
BOUNDARIES
The last, but definitely not least, thing layers need are sharp boundaries. A boundary defines the communication protocol with the layer. A layer groups code. It contains classes and functions. Most of them will not be meant to be used from the outside. They are private, in a manner of speaking. This implies that no one from the outside should even be bothered by their existence. To point lost developers in the right direction, one should expose the layer's API and make it look like the obvious path to take whenever someone needs the layer's functionality. Effectively, a boundary is a set of interfaces. Their methods are like doors, and the arguments of these methods are like locks. They expect a specific argument which will open them, like a key.
Arbitrary types should not be passed between layers. Following the Dependency Rule, one is strictly forbidden from passing a data structure from an upper layer down to a lower layer. For example, passing an ORM model to Domain violates the rule, because it implies that Domain knows something about the outer world. In languages without static typing or type annotations (like Python before 3.4) this can easily be overlooked. Fortunately, importing something from an upper layer only to annotate an argument already gives bad feelings.
Input arguments are part of a boundary, and they should belong to the layer that accepts them. In the real world, the API of a layer will consist of many methods accepting a varying number of arguments. This complexity cannot be taken lightly. Thus, it makes perfect sense to group boundary parameters for individual entry points - methods - into data structures. This pattern is called Data Transfer Object (DTO).
@dataclass(frozen=True)
class EmailDto:
    src: EmailAddress
    reply_to: EmailAddress
    contents: str
The most crucial boundary is placed at the edge of the Application layer. The part of the Application's boundary that is to be used from the outside world is formed by Use Cases. To avoid exposing concrete classes to (and hence coupling with) the Application's clients, another interface can be introduced that will abstract a Use Case - the Input Boundary. From the External World's perspective the Use Case/Input Boundary is just an interface communicating a business intent of the application. To call it, one has to prepare a DTO and pass it as the only argument. Analogously, another DTO is the result of actions taken by a Use Case (though it is not directly returned - more on this later). These three (Input DTO, Output DTO, Input Boundary) together form a rock-solid boundary that hides all details of the Application layer. Other names that may be used to refer to Input and Output DTOs are, respectively, Request and Response. However, to avoid confusion with the HTTP protocol, I will refer to them as Input and Output DTOs throughout the book. Data Transfer Objects are immutable (frozen=True). There is no reason why anyone would want to mutate the data inside. They are like messages - one coming in and another coming out.
@dataclass(frozen=True)
class PlacingBidInputDto:
    bidder_id: int
    auction_id: int
    amount: Decimal


@dataclass(frozen=True)
class PlacingBidOutputDto:
    is_winning: bool
    current_price: Decimal


class PlacingBidInputBoundary(abc.ABC):
    @abc.abstractmethod
    def execute(
        self, request: PlacingBidInputDto
    ) -> None:
        ...
MVC ANYONE?
If you are a Pythonista who wrote some code in Django, Flask or Pyramid, you might be a bit confused by the naming. Controller in the diagram corresponds to a concept you know as a view, whereas View resembles a template. This confusion is rooted in different pattern adoption between Python and other programming communities. The diagram assumes the reader's acquaintance with Model-View-Controller, while Django embraces something known as Model-Template-View. More information can be found on djangobook.com - Django's Structure – A Heretic's Eye View, https://djangobook.com/mdj2-django-structure/.
CHAPTER SUMMARY
The actual value of IT projects lies right next to the most significant complexity they have. Provided that a project is something more than just a browser for a relational database, there will be plenty of business rules that have to be enforced. The Clean Architecture treats the latter as first-class citizens. Instead of hiding this most-valued logic in a soup of frameworks and ORMs, it exposes business rules and processes in separate layers - Domain and Application. Distilled business logic can be easily tested as it is completely unaware of the external world. Code responsible for communicating with it lies in the Infrastructure layer. The latter can use Application, but Application must not know anything about Infrastructure. This is enforced by the Dependency Rule:
Obviously, during the execution of a business scenario, one will have to insert rows into a database or call an external service at some point. The Clean Architecture forbids coupling business logic with the external world, so Application defines a set of Interfaces (also known as Ports) which are a form of abstract plugins. Concrete implementations are eventually provided by Infrastructure.
Keeping everything in order requires drawing sharp, distinctive boundaries. Layers expose some functionality via Interfaces that accept Input DTOs (sometimes called Requests) as arguments. All details are hidden behind the boundary. From the outer world, one can only see method signatures and the data structures required to call a method lying on the boundary.
REFERENTIAL IMPLEMENTATION
DISCLAIMER
This chapter presents an example implementation according to the original idea presented by Robert C. Martin in The Clean Architecture article4, a few talks given at conferences5 and described in his book6.
I must admit I have never tried implementing the Clean Architecture in a commercial project while rigorously following the original Uncle Bob's vision. I felt that a few parts could be removed or done differently without losing too much. Although my implementations look a bit different, I decided to illustrate the original concept with code for the sake of the completeness of this book. In the next chapter, I describe possible simplifications one may make without compromising much of the quality and benefits.
This example is a standard web application that uses a database for storing data. Control flow begins in a Controller, which is invoked by a web framework upon dispatching a request. The role of the Controller is to repack HTTP request data into an Input DTO and pass it to the Input Boundary, implemented by a Use Case (also known as an Interactor). The latter uses data from the Input DTO to fetch required Entities from the Database using a Data Access Interface. Then the Use Case orchestrates Entities to perform business logic and optionally saves them using the Data Access Interface. The Use Case finishes its task by building an Output DTO and passing it into the Output Boundary implementation - a Presenter. Its role is to reformat data so it is convenient for displaying in the final View. The View receives data in another DTO, called a View Model. The Use Case, which implements the Input Boundary, does not return anything. The Presenter, which implements the Output Boundary, is to actually present the result using the Output DTO.
The code is to be derived from a set of business rules. Therefore, I present them before the implementation is shown:
• to become a winner, one has to offer a price higher than the current price
• an Auction has a starting price; new bids with an amount lower than the starting price must not be accepted
IMPLEMENTATION
SEQUENCE DIAGRAM
It may look confusing that there is no arrow from Presenter to View just before the end. There
is a reason for that described below.
INPUT BOUNDARY
@dataclass(frozen=True)
class PlacingBidInputDto:
    bidder_id: int
    auction_id: int
    amount: Decimal
We assume that the authentication aspect is dealt with at the web framework level - we just accept a bare id that belongs to the person placing a bid and trust it. The Input DTO is to be passed into a Use Case abstracted by an Input Boundary:
class PlacingBidInputBoundary(abc.ABC):
    @abc.abstractmethod
    def execute(
        self,
        input_dto: PlacingBidInputDto,
        presenter: PlacingBidOutputBoundary,
    ) -> None:
        pass
OUTPUT BOUNDARY
At the same time, we expect our operation to produce some data in the form of Output DTO:
@dataclass(frozen=True)
class PlacingBidOutputDto:
    is_winning: bool
    current_price: Decimal


class PlacingBidOutputBoundary(abc.ABC):
    @abc.abstractmethod
    def present(
        self, output_dto: PlacingBidOutputDto
    ) -> None:
        pass
PRESENTER
As you might have deduced from the sequence diagram, the flow of control ends in a Presenter implementation, namely in its present method. We do not return anything to the Controller (or view in MVT). A bidder should see new data immediately after the present call ends. This is hard to imagine in the most popular Python web frameworks, where a Controller is expected to return something that the framework is going to send to the client later. However, the approach with the flow ending in the present method works perfectly fine for mobile applications, which can build the next screen depending on the contents of PlacingBidOutputDto and show it to the user. One could also get such behaviour in frameworks that create a response object beforehand and let you manipulate it. Examples for this particular case will be shown later in the book. For the sake of simplicity, one would rather extend the PlacingBidOutputBoundary interface with another method that can be used for retrieving data in the Controller:
class PlacingBidOutputBoundary(abc.ABC):
    @abc.abstractmethod
    def present(
        self, output_dto: PlacingBidOutputDto
    ) -> None:
        pass

    @abc.abstractmethod
    def get_presented_data(self) -> dict:
        pass
Any concrete implementation would essentially be just giving back formatted data:
class PlacingBidWebPresenter(
    PlacingBidOutputBoundary
):
    def present(
        self, output_dto: PlacingBidOutputDto
    ) -> None:
        self._formatted_data = {
            "current_price": f'${output_dto.current_price.quantize(Decimal(".01"))}',
            "is_winning": "Congratulations!"
            if output_dto.is_winning
            else ":(",
        }

    def get_presented_data(self) -> dict:
        return self._formatted_data
Finding an appropriate output data type for Presenters which return data through get_presented_data may be tricky. In Python, returning a dict is the best bet as it can be passed down to a template rendering function. Popular templating engines accept a template object and a dict instance with data for prepared placeholders. However, this diminishes the Presenter's responsibility. This problem does not exist when a Presenter does not return data but is able to actually present the result of the process. This topic will be discussed further in the next chapter.
VIEW MODEL
This is nothing more than another Data Transfer Object that is obtained from a Presenter to be passed down to the View. In this case, a simple dict does the job, because most templating engines used in Python web frameworks accept such a format. However, if there is a need for more control over the structure of a View Model, then introducing a class would do the trick.
USE CASE
The Use Case is the most interesting part of the Clean Architecture, where something finally happens. The Use Case implements the Input Boundary and orchestrates an entire business process:
class PlacingBidUseCase(PlacingBidInputBoundary):
    def __init__(
        self,
        data_access: AuctionsDataAccess,
        output_boundary: PlacingBidOutputBoundary,
    ) -> None:
        self._data_access = data_access
        self._output_boundary = output_boundary

    def execute(
        self, input_dto: PlacingBidInputDto
    ) -> None:
        auction = self._data_access.get(
            input_dto.auction_id
        )
        auction.place_bid(
            input_dto.bidder_id, input_dto.amount
        )
        self._data_access.save(auction)
        output_dto = PlacingBidOutputDto(
            input_dto.bidder_id in auction.winners,
            auction.current_price,
        )
        self._output_boundary.present(output_dto)
This example is intentionally kept simple. It does not take into consideration any edge cases or error handling - it is just to reflect what was shown in the sequence diagram. Firstly, we retrieve an Auction Entity using an implementation of AuctionsDataAccess. Having an Entity instance, we call the place_bid method. The latter is a command - it is to change the state of the Entity but does not return any value. In the next step, we persist changes using an implementation of AuctionsDataAccess. Finally, we assemble an instance of PlacingBidOutputDto, feeding it with data obtained from query methods on the Auction Entity - the winners and current_price properties. In the last step, we pass output_dto into the Output Boundary's present method.
One interesting thing here is how data_access and output_boundary are created. They are not explicitly instantiated by PlacingBidUseCase - rather, they are passed into __init__ (Python's rough equivalent of a constructor). We know for sure that these objects cannot be instances of AuctionsDataAccess or PlacingBidOutputBoundary because those are abstract. Actually, we have concrete implementations of these interfaces, namely DbAuctionsDataAccess and PlacingBidWebPresenter respectively. It is crucial for PlacingBidUseCase not to know what exact implementation it is using. Why? Because they belong to a higher layer and it would be against the Dependency Rule for a Use Case to know anything about upper layers. On the other hand, AuctionsDataAccess and PlacingBidOutputBoundary both belong to the Application layer, so they can safely be referred to in the Use Case.
import inject

inject.configure(di_config)
Once configured, inject stores a mapping between types (usually abstract classes) and their implementations. More information on that subject will be presented later. For now, it is sufficient to know that PlacingBidUseCase does not create its dependencies nor know which implementations of the abstract classes are used.
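A di_config used above could look roughly like this (a sketch relying on the inject library's binder; the concrete class names are assumptions):

def di_config(binder: inject.Binder) -> None:
    # map Application-layer abstractions to Infrastructure implementations
    binder.bind(AuctionsDataAccess, DbAuctionsDataAccess())


inject.configure(di_config)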
DATA ACCESS INTERFACE
This specifies an interface for retrieving/storing Entities. The simplest interface will consist
of two methods - get by primary key and save.
class AuctionsDataAccess(abc.ABC):
    @abc.abstractmethod
    def get(self, auction_id: int) -> Auction:
        pass

    @abc.abstractmethod
    def save(self, auction: Auction) -> None:
        pass
DATA ACCESS
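A concrete implementation of the interface belongs to the Infrastructure layer. A minimal sketch (the class name DbAuctionsDataAccess is used throughout this book; the persistence details are deliberately left out):

class DbAuctionsDataAccess(AuctionsDataAccess):
    def get(self, auction_id: int) -> Auction:
        # fetch the auction row (and its bids) from the database,
        # then reassemble a pure Auction Entity from the raw data
        ...

    def save(self, auction: Auction) -> None:
        # translate the Entity back into rows and persist them
        ...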
ENTITIES - BID
@dataclass
class Bid:
    id: Optional[int]
    bidder_id: int
    amount: Decimal
This is a simple class that has three fields: id, bidder_id and amount. The first one is optional, as newly created bids (before writing them down somewhere) will not have IDs. There is another approach - to use UUIDs and always give new bids an ID7. For the sake of simplicity, I am not adding fields for creation time etc.
ENTITIES - AUCTION
An auction in its simplest form will need a public method for placing a new bid, plus getters for the winners list and the current price.
class Auction:
    def place_bid(
        self, user_id: int, amount: Decimal
    ) -> None:
        pass

    @property
    def current_price(self) -> Decimal:
        pass

    @property
    def winners(self) -> List[int]:
        pass
Please note that place_bid changes an auction (mutates its state), while current_price and winners do not. Each method of Auction belongs to one of two distinct categories:
• commands that change the state and do not return any value,
• queries that return values and do not change the state.
This approach is known as Command Query Separation (CQS) and was originally described by Bertrand Meyer in his Object-Oriented Software Construction8 back in 1988. Bear in mind that queries are considered to be safe - they can be rearranged and used anywhere and will not affect the state of the system, while one has to be more careful with commands. Usually, the order of invoking commands is meaningful, whereas queries can be invoked in any sequence. The reason why this pattern was applied here is that it simplifies and orders the Auction class interface. It will also make it a bit easier to test the class.
CHAPTER SUMMARY
All these layers and abstractions are here to distill the code driven by business requirements from the non-functional stuff.
THE CLEAN ARCHITECTURE MODIFICATIONS
“Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away” - Antoine de Saint-Exupéry
PRESENTER DILEMMA
The original recipe for the Clean Architecture contains a lot of moving parts. Certain assumptions, like the control flow ending in a Presenter, are not suitable for the flows we are accustomed to in web application programming, e.g. in mainstream web frameworks written in Python (and many other programming languages, to be fair). A developer is expected to explicitly return some value from a Controller (or View, using MVT terminology):
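In Django, for example, a view has to return an HttpResponse (a minimal sketch; the view and template names are made up):

from django.http import HttpRequest, HttpResponse
from django.shortcuts import render


def place_bid_view(request: HttpRequest, auction_id: int) -> HttpResponse:
    # the framework sends to the client whatever the view returns
    return render(request, "place_bid.html", {"auction_id": auction_id})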
One cannot simply hack their way around this behaviour, so the only option left is to make the Presenter return a value to the view and push it forward from there:
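A sketch of such a workaround in a Django-style view (all names are assumed; the Presenter exposes its data via get_presented_data, as shown in the previous chapter):

from decimal import Decimal


def place_bid_view(request: HttpRequest, auction_id: int) -> HttpResponse:
    input_dto = PlacingBidInputDto(
        bidder_id=request.user.id,
        auction_id=auction_id,
        amount=Decimal(request.POST["amount"]),
    )
    # the Use Case and Presenter are assumed to be wired up elsewhere
    place_bid_uc.execute(input_dto)
    # pull the formatted data back out of the Presenter and push it forward
    return render(request, "place_bid_result.html", presenter.get_presented_data())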
Not every framework will force you to use such tricks. For instance, Falcon9 passes a response object to every Controller. Therefore, a Presenter is able to freely manipulate the response without the need to return anything to the Controller:
class PlacingBidJsonPresenter(
    PlacingBidOutputBoundary
):
    PRECISION = Decimal("0.01")

    def __init__(self, resp) -> None:
        # assumption: falcon's Response object is handed over at construction
        self.resp = resp

    def present(
        self, output_dto: PlacingBidOutputDto
    ) -> None:
        if output_dto.is_winning:
            message = (
                "Congrats! You are a winner! :)"
            )
        else:
            message = "Sorry, your bid was not high enough!"
        price_formatted = output_dto.current_price.quantize(
            self.PRECISION
        )
        self.resp.media = {
            "current_price": f"${price_formatted}",
            "message": message,
        }
More frameworks share this approach, especially asynchronous ones. express.js, a node.js-based solution, would also allow for presenting a response without explicit returns:
Figure 3.1 A sequence diagram of the Clean Architecture with queries instead of Output Boundary
Such an approach considerably simplifies the Use Case, because it makes assembling Output DTOs unnecessary. As a result, the Output Boundary and Presenter are also thrown out of the picture. On the other hand, using a Query may involve additional round-trips to the database (or other data source), which looks like a waste. That is true, but only when we have all the data required to produce the response in the Use Case. There is a trade-off to be made between simplifying the Use Case and slightly lowering performance in certain cases. In case the Use Case does not have enough data to produce the response despite taking care of executing business logic, additional calls to the database are inevitable anyway.
In such a case, sticking to the Output DTO -> Output Boundary -> Presenter chain forces a developer to put fetching additional data somewhere in the flow. Maybe the Use Case should put everything that is needed for presentation into the Output DTO? Doubtful, since it would mean that data presentation drives the implementation of a business flow. It is rather the Presenter's role to take care of getting all the required data that cannot be provided by the Use Case whose Output Boundary is implemented by the Presenter. Leaving this to the Presenter allows for some flexibility. If we have to support multiple Presenters for every possible delivery mechanism of the system (e.g. web or CLI), then each of them can have different requirements for additional data and fetch only what it needs.
To sum up, we have two options - use the Output DTO -> Output Boundary -> Presenter chain or abandon this part of the Clean Architecture completely in favour of queries inspired by CQRS.
Many interfaces along the way are there to provide loose coupling. One argument for keeping abstractions wherever possible is that they foster testability. The truth is that in dynamic languages that allow for monkey-patching, we can unit-test classes even if they are explicitly instantiating and using other concrete classes. In other words, tight coupling is not really an obstacle, though it lowers the quality of the design. If you can't help rubbing your eyes, let me calm you down: I neither use nor approve of monkey-patching. It is one of those dirty tricks that are going to bite a developer sooner or later. The point I am trying to make is that loose coupling itself is not a goal. The goal is to deliver features on time while having a maintainable, easily extendable codebase. It might not be such a big deal to have a Controller coupled to a Use Case. We do not expect a Controller to do anything but repack request data into an Input DTO. This integration actually can (and should!) be checked with higher-level tests.
Getting rid of Input Boundary interfaces has one consequence - Views/Controllers will be tightly coupled to concrete Use Cases/Interactors. If that is something you can afford, there is no reason for keeping Input Boundaries. It should not lower testability that much - Views/Controllers contain little if any logic and are rather tested in end-to-end tests.
ALTERNATIVE DESIGN OF USE C ASES
If applications were hotels, each Use Case / Interactor would be an employee dedicated to looking after a single service the hotel offers. In such a design, a hotel guest willing to use spa treatments would contact the Spa Treatments Keeper. If they were keen to use the on-site golf course, they would talk to the Golf Course Keeper etc. This approach is not a practical one - in the hotel industry, a guest would rather contact the reception - either personally or via a phone call. The hotel reception is the guest-facing interface to the various services offered. The resemblance to the Facade design pattern is visible to the naked eye. With regard to Use Cases / Interactors, having each of them as a separate class is roughly equivalent to the hotel design with dedicated keepers. The alternative design assumes we have a single class (a Facade) with separate methods, each responsible for one business flow. Effectively, all Use Cases are turned into methods of a Facade. Naturally, the Facade's methods can use auxiliary classes to do the job (just like a receptionist passing guests' requests further), but these collaborators should be neither visible nor accessible from the outside. You want something - go to the application Facade. This approach works nicely provided that our Use Cases are hardly sophisticated, e.g. they follow the same schema: get an Entity using a Data Access Interface, call an Entity's method, then save it back.
Such a pattern is dangerous when an application is not modular, because the Facade would quickly become enormous. Modularity will be discussed later, in probably the most important chapter of the book. Also, bear in mind that injecting dependencies may become more complicated.
class AuctionsFacade:
    def place_bid(
        self, dto: PlacingBidInputDto
    ) -> None:
        ...

    def withdraw_bid(
        self, dto: WithdrawingBidInputDto
    ) -> None:
        ...
A few paragraphs earlier, the possibility of resigning from the Input Boundary was discussed. Even earlier, a design with a Query replacing Output DTOs, the Output Boundary and the Presenter was shown. Assuming that our Use Cases do not build Output DTOs and we still see decoupling from Controllers as beneficial, we can turn our Input DTOs into CQRS's Commands and introduce a Mediator10, called a Command Bus. From now on, there is no PlacingBidInputDto - it becomes PlaceBid, a Command. The Controller still has to assemble this Data Transfer Object, but now it passes it into the universal dispatch method of the Command Bus:
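A sketch of what this could look like (the command_bus object and its dispatch method are assumed here, not taken from any particular library):

from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PlaceBid:
    bidder_id: int
    auction_id: int
    amount: Decimal


def place_bid_view(request, auction_id):
    command = PlaceBid(
        bidder_id=request.user.id,
        auction_id=auction_id,
        amount=Decimal(request.POST["amount"]),
    )
    # the Controller knows only the Command class and the bus, never the handler
    command_bus.dispatch(command)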
During application start-up, the Command Bus is configured to route Commands to Use Cases. To be strict, our Use Cases become Command Handlers.
The main benefit here is that we are completely decoupled from the handler. This also simplifies our Controllers, because they only have to know the Command classes (e.g. PlaceBid) and the Command Bus. With such an approach, the Input Boundary becomes completely redundant.
The Command Bus makes one more, not-so-obvious design possible. Our Command classes are Data Transfer Objects, which makes them easy to serialize and send over the wire. With such a design, it is easier to evolve towards a distributed, message-driven architecture.
If one uses an ORM library backed by an RDBMS, it might be tempting to reuse ORM models as Entities. Knowing the Dependency Rule, placing storage-coupled classes at the centre of the Domain layer is unforgivable. First, model classes may leak details of the underlying persistence mechanism into the Domain. In turn, the latter gets unnecessarily complicated due to the extra coupling. Secondly, a Domain built around things that rely on the database may no longer be easily tested. The Application's testability will also suffer, because it is built around the Domain. Thirdly, by reusing ORM models, one will not be able to create useful abstractions for things more complicated than tables in relational databases, which are characterized by a flat, normalized structure. Rows from RDBMS tables are a very weak metaphor for business Entities, especially when graphs of objects are involved. Thinking through the prism of database rows incapacitates our innate ability to model more complex things. Last but not least - unrestricted lazy loading can be seen as an invitation for unwanted side effects in Domain & Application.
10 Erich Gamma et al, Design Patterns: Elements of Reusable Object-Oriented Software, Chapter 5:
Behavioral Patterns, Mediator
Database schemas are built around data, while Entities are there to protect business invariants. The latter are rules that have to be enforced regardless of the application context. An illustrative example of an Entity that has some invariants to protect is a cart in an e-commerce project. Examples (a sketch follows the list below):
• Whenever someone adds a new product to a cart, the number of items increases
• One cannot add the same product twice - if they do, we increase the quantity
• A customer must not order more than ten pieces of a single product at once
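A sketch of a Cart Entity protecting these invariants (all names, including the exception type, are made up for illustration):

from typing import Dict


class TooManyItems(Exception):
    pass


class Cart:
    MAX_QUANTITY_PER_PRODUCT = 10

    def __init__(self) -> None:
        self._items: Dict[int, int] = {}  # product_id -> quantity

    def add_product(self, product_id: int, quantity: int = 1) -> None:
        new_quantity = self._items.get(product_id, 0) + quantity
        if new_quantity > self.MAX_QUANTITY_PER_PRODUCT:
            raise TooManyItems(product_id)
        # adding the same product twice only increases its quantity
        self._items[product_id] = new_quantity

    @property
    def items_count(self) -> int:
        return sum(self._items.values())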
If a developer cannot find invariants to protect, then it means one of two things: there is not
enough information about business requirements or the problem to solve is too trivial. In
the latter case, the Clean Architecture is not needed for the project.
On the other hand, if a developer clearly sees that there are non-trivial business invariants but is still keen on using models as Entities, let's consider a few benefits it may bring. Initially, all Entities will have their corresponding models with exactly the same fields. It is completely natural to perceive this phenomenon as some kind of undesired duplication. The same feeling may strike someone writing a Use Case / Interactor responsible for creating an Entity.
The cure is to look into the future - initially, when a project starts, these three may indeed have identical fields. At some stage, they will stop looking identical because they all have different reasons to change.
For example, creating an auction (as a flow in the application) is unlikely to change dynamically. We set a title, choose a product and set an initial price. After some time we notice it would be super comfortable to have the current price kept on the Entity instead of dynamically calculating it every time from the list of the auction's bids, so we add a field to both AuctionModel and the Auction Entity. It does not affect the Input DTO for creating auctions at all.
After some time a new feature request comes - show the number of winners for each auction in an administrator's dashboard. The leanest and most efficient way to achieve this would be to just add a column to AuctionModel and recalculate its value when we save the Auction Entity. Please note the latter does not have to know anything about the new column. Data structures that appear to be identical at the beginning of a project may evolve in very different directions if only their reasons to change differ. However, there is one regularity - when the Auction Entity changes, it is almost certain that AuctionModel will follow. That is fine - it comes from the Dependency Rule. The model depends on the Entity after all.
That being said, in an ideal world, we should be able to write an arbitrary pure class in the chosen language and use it as an Entity. Then the code responsible for persisting the Entity should be generated automatically for us. On the other hand, given what JPA (Java Persistence API), Entity Framework (C#) or SQLAlchemy models look like, how far are we from this ideal vision?
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "auctions")
public class Auction {
    @Id
    @GeneratedValue
    private Integer id;
    // remaining fields and accessors omitted
}
Here is how Auction would look in JPA - it is a pure Java class with annotations providing extra metadata for the persistence mechanism. Tentatively, it is acceptable for an Entity to be such a hybrid. However, the mental model of our Entities will still be partially constrained because it has to fit into tables in an RDBMS. Such a class alone does not allow for any direct interactions with the database without JPA's EntityManager, except lazy loads. It would be really nice to block them.
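A sketch of how a query can block further lazy loading in SQLAlchemy (the joinedload/raiseload combination and the model names here are assumptions):

from sqlalchemy.orm import joinedload, raiseload

auction_with_bids = (
    session.query(AuctionModel)
    .options(
        joinedload(AuctionModel.bids),  # load bids eagerly, in the same query
        raiseload("*"),  # any other lazy load raises instead of hitting the database
    )
    .filter(AuctionModel.id == auction_id)
    .one()
)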
auction_with_bids from the above example will not allow for lazy loading anything that was
not fetched during the original query.
One note: Active Record ORMs, as in Django or Ruby on Rails, are utterly inappropriate for building a Domain around. They expose too much to the Domain, which should not be able to touch the persistence layer in any way. None of JPA, Entity Framework or SQLAlchemy implements Active Record. On the other hand, the ORMs of Django and Ruby on Rails are flagship examples of this pattern.
In conclusion, reusing ORM models as Entities is tempting, but it has numerous negative consequences. On the bright side, it would relieve us from the tedious writing of persistence code. On the downside, it limits freedom in modelling the Domain and can seriously damage testability.
CHAPTER SUMMARY
Control flow in the Clean Architecture ends in a Presenter, which may not always be possible to implement in specific frameworks. There are two alternatives:
1. add another method to the Presenter interface for getting formatted data, to be called inside the Controller
2. abandon the Output DTO -> Output Boundary -> Presenter chain altogether in favour of queries inspired by CQRS
One can get rid of Input Boundary abstractions if tight coupling of Controllers to concrete Use Cases is not a problem in their project. There is also an alternative approach to decoupling with CQRS' Command Bus, which involves turning Input DTOs into Commands and Use Cases into Command Handlers.
It is advised against using ORM models as Entities. A developer is then confined to seeing through the prism of database rows. Thinking in terms of such a flat data structure is limiting. Entities should be built to protect business invariants. In order to do so, a developer often has to reach beyond a single object and resort to an object graph instead. If there are no invariants to protect in the project, then probably the Clean Architecture is not a suitable approach.
DEPENDENCY INJECTION
ABSTRACTIONS & CLASSES EVERYWHERE!
The reason to blame for such a state of affairs is a lack of intentional design. It is said that object-oriented code of decent quality has loose coupling. To start with, what is coupling?
class CreditCardPaymentGateway(PaymentGateway):
    pass


# tight coupling
class Order:
    def finalise(self) -> None:
        payment_gateway = CreditCardPaymentGateway(
            settings.payment["url"],
            settings.payment["credentials"],
        )
        payment_gateway.pay(self.total)


# loose(r) coupling
class Order:
    def __init__(
        self, payment_gateway: PaymentGateway
    ) -> None:
        self._payment_gateway = payment_gateway
Consider the above example. In the first part, Order instantiates a concrete class, CreditCardPaymentGateway, to use one of its methods. Order has to know exactly what is required to build such an instance (note passing settings). This is an example of tight coupling, because it is highly probable that changing CreditCardPaymentGateway will entail adjustments in Order. The second part of the listing presents loosened coupling. Order accepts an instance of a class that inherits from the abstract PaymentGateway. Order is not only no longer burdened with instantiation, but also remains ignorant of the concrete type of payment_gateway. Simply put, coupling is a measure of how hard it is to change one part of the code while keeping the rest working.
11 Erich Gamma et al, Design Patterns: Elements of Reusable Object-Oriented Software, Chapter 5: Behavioral Patterns, Template Method
A reader might have noticed that this refactoring does not eliminate the need for instantiation; it only shifts the responsibility to anyone who will use the Order class. Does it mean that if Order is instantiated in 10 places, one will have to instantiate CreditCardPaymentGateway 10 times? Luckily, that is not true. Dependency injection containers are a solution to this problem.
Before you start introducing abstractions in your code, bear in mind that loose coupling is not a goal in itself. What we would ideally want is to be able to make sweeping changes in code without breaking half of the functionality at the same time. At the same time, extra abstractions do not come for free and can do more harm than good if they are misused.
The trickiest aspect of using abstract classes/interfaces is knowing where to put them. The Clean Architecture, mainly thanks to its layered structure, steers developers exactly through these intricacies. In the example from the previous chapter, there is an Input Boundary interface that abstracts the Use Case, an Output Boundary abstracting the Presenter and finally, Data Access, which is an abstraction for DbAuctionsDataAccess. In the last case, we are dealing with a plugin (Data Access) to a business process flow (Use Case). To explain why having an abstraction here is so important, we will play five whys:
We do not want them to be tightly coupled together. PlacingBidUseCase does not care where Entities are stored or where they are retrieved from. It just needs something to perform such operations. Something that will fulfill the Data Access contract in the form of an interface imposed by the Application layer. Having an interface in between allows for loose coupling, which is vital in this case.
3. Why would I allow for testing these classes separately? In a production environment they will always be used together, so what guarantee do I have that everything will be fine once we deploy the project?
A lot of time will be saved. The code inside PlacingBidUseCase runs entirely in memory; there is no need for calling external services, doing disk operations, etc. Therefore, it is very fast. At the same time, it is a crucial part where business requirements are materialized. There are at least a few possible scenarios with different outcomes. In contrast, DbAuctionsDataAccess is responsible for non-functional requirements. In order to fulfill them, DbAuctionsDataAccess leverages an RDBMS, so it needs to perform I/O operations. This is a few orders of magnitude slower than executing code residing in memory. Finally, in DbAuctionsDataAccess there are usually no alternative scenarios. No if-statements, no branches. If there were no possibility to test these two strikingly different classes independently, a developer would have to test them together. This can mean a huge waste of time just because DbAuctionsDataAccess cannot work without interacting with a database.
Concerning the guarantee of correctness, even the most exhaustive test suites testing these classes in separation do not assure that the entire project will work fine. To be certain, we need higher-level tests that will check if PlacingBidUseCase and DbAuctionsDataAccess co-operate smoothly. Of course, such a test would somehow duplicate checks we are doing separately for both classes, but that is a trade-off worth making. An end-to-end test executed via a REST API (or user interface) can check a happy path not only to assure that PlacingBidUseCase works together with DbAuctionsDataAccess, but also to ensure there is no friction between all the other components involved. The testing strategy is a huge topic. A separate chapter later in the book is devoted to it.
4. What else can I get from loose coupling? What other problems do I avoid by getting rid of tight coupling between PlacingBidUseCase and DbAuctionsDataAccess?
A reduced cognitive load, since a developer is able to reason about one class at a time without bothering with the second one. Only its interface has to be taken into account.
5. How do I manage loosely coupled classes then? Are there any patterns or libraries?
INVERSION OF CONTROL
IoC relieves classes from creating their dependencies, letting them only specify which interfaces they need. Which concrete implementations will really be used is no longer their business. In the Clean Architecture, it basically means that layers above the Application layer make the decisions about which implementations will be used by a Use Case.
Dependency injection with Inversion of Control is used even in Django. The best example is the cache module. Its basic usage is very simple:
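Something along these lines:

from django.core.cache import cache

cache.set("popular_auctions", auctions, timeout=60)
popular = cache.get("popular_auctions")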
A developer gets the cache object by simply importing it - there is no need to instantiate anything. It is ready to use right away. While it looks simple on the surface, a lot of things happen behind the scenes. There are many available backends for cache. One of them keeps data in memory. Another is based on memcached12. There is even a dummy implementation that never saves anything anywhere and always returns None, simulating a cache miss.
12 Memcached, https://memcached.org/
However, a developer just sees an object with get and set methods. cache is, in fact, a proxy13 to a concrete implementation.
Yet, there must be a way to define which real implementation will be proxied by cache. Or, speaking more generally - to map interfaces onto concrete classes. The Dependency Injection technique exists exactly for this purpose. A so-called Inversion of Control Container orchestrates the process of instantiating and configuring objects. Going back to the example from the beginning of the chapter - Order requiring an instance of a subclass of PaymentGateway - a developer would only configure the IoC Container to instantiate CreditCardPaymentGateway whenever PaymentGateway is requested.
Naturally, there is no reason to write your own Inversion of Control Container. There are many
excellent libraries available. C# programmers have the outstanding Autofac14 at their
disposal. Java people can use Guice15, while Pythonistas should check out Injector16.
Even if you are not going to use any of them, their manuals alone provide lots of
useful knowledge about dependency injection and inversion of control.
Returning to Django and its cache, the framework configures the cache backend (it does the job of
an IoC Container). A developer just declares what they would like to have under the hood
and lets Django do the rest.
CACHES = {
    "default": {
        # choose concrete implementation
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        # provide extra details, e.g. location of memcached server
        "LOCATION": "127.0.0.1:11211",
    }
}
Initializing the concrete backend class and wiring it together with cache is just a stage in
booting up an application. This is the right time for an Inversion of Control Container to be
prepared, whatever the application or language. The mapping between interfaces and concrete
classes is a part of the configuration after all. It should be done before any real job is taken care
of by the application.
13 Erich Gamma et al, Design Patterns: Elements of Reusable Object-Oriented Software, Chapter 4:
Structural Patterns, Proxy
14 Autofac https://autofac.org/
15 Guice https://github.com/google/guice
16 Injector https://injector.readthedocs.io/en/latest/
In Django the whole injection ceremony is hidden, and that is a good thing. However, how should a real
IoC Container be used? Let’s say it has been configured. Should it be freely used by
arbitrary code in the project? For example, whenever we need Cache, we could just call
container.get(Cache), and we would receive an instance of a concrete class. Such a kind of
global registry is called Service Locator and is largely considered an antipattern17.
Let’s take the referential implementation of the Clean Architecture presented earlier. We want
to call PlacingBidUseCase from an outer layer. It is abstracted by PlacingBidInputBoundary,
so we ask the container to provide it. We expect to get an instance of a concrete class -
PlacingBidUseCase. The latter requires AuctionsDataAccess, so inside PlacingBidUseCase we
use the container once again. We get an instance of DbAuctionsDataAccess. If
DbAuctionsDataAccess required a database connection or a settings object, we would also have
to use the container inside it to fetch the required stuff. I guess it is obvious where this is
going. This is how Service Locator looks in the wild.
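To make the anti-pattern tangible, here is a minimal sketch; the global container variable is illustrative and not part of the example project:

import injector


class AuctionsDataAccess:
    ...


container = injector.Injector()  # a global, importable container


class PlacingBidUseCase:
    def execute(self) -> None:
        # the dependency is hidden - nothing in the signature reveals it
        data_access = container.get(AuctionsDataAccess)
        ...

A much better approach is to declare dependencies in the constructor and let the container provide them: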
from injector import inject


class AuctionsRepo:
    def get(self, auction_id: int) -> None:
        pass


class PlacingBidUseCase:
    @inject  # instruct Injector to perform injection
    def __init__(
        self, repo: AuctionsRepo
    ) -> None:
        self._repo = repo
It is also recommended to use some glue code with a web framework (or any other delivery
mechanism) that will hide calls to the container. An example of flask-injector code should
shed some light:
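The snippet below is a self-contained sketch of that idea; AuctionsDataAccess and DbAuctionsDataAccess play the roles described earlier in this chapter, while the get_title method and the view itself are purely illustrative:

import abc

import injector
from flask import Flask, jsonify
from flask_injector import FlaskInjector


class AuctionsDataAccess(abc.ABC):
    @abc.abstractmethod
    def get_title(self, auction_id: int) -> str:
        ...


class DbAuctionsDataAccess(AuctionsDataAccess):
    def get_title(self, auction_id: int) -> str:
        return "Awesome book"  # a real implementation would query the database


class AuctionsModule(injector.Module):
    def configure(self, binder: injector.Binder) -> None:
        # map the abstraction to a concrete implementation
        binder.bind(AuctionsDataAccess, to=DbAuctionsDataAccess)


app = Flask(__name__)


@app.route("/auctions/<int:auction_id>")
def auction_details(auction_id: int, data_access: AuctionsDataAccess):
    # flask-injector passes the dependency into the view - no container calls here
    return jsonify({"title": data_access.get_title(auction_id)})


FlaskInjector(app=app, modules=[AuctionsModule()])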
The application can be configured differently for different environments. A developer will
surely prefer to use a local database instead of the production one on their computer.
Perhaps it would also help to disable some external vendors and use stubs instead.
Dependency Injection shines here - if only we have interfaces and an IoC Container in place. This
opens another door - techniques such as Branch By Abstraction18 for gradually migrating from
one solution to another. For example, we might be developing MongoDbAuctionsDataAccess
for days or weeks, but we must not use it in production until the implementation is finished.
A programmer who develops MongoDbAuctionsDataAccess can use it locally, while the rest of
the team uses the stable DbAuctionsDataAccess at the same time. Making Dependency Injection
CHAPTER SUMMARY
Loose coupling, achieved by a reasonable use of abstract classes and interfaces, allows for
increased testability and maintainability of code. The produced design is more elegant and
flexible. This is especially visible in the Clean Architecture, which explicitly states the need
for interfaces between business logic and infrastructure-specific code.
In an ideal world, the Application and Domain layers remain thoroughly ignorant of the
existence of the Inversion of Control Container. To achieve this, one has to avoid the Service Locator anti-
pattern.
The mysterious acronym CQRS stands for Command Query Responsibility Segregation. It is a
pattern for separating the code that alters the state of the system from the code that does
not change anything but returns data. Code that changes the state of the system is referred to as
Commands, while constructs for retrieving data are called Queries. Queries have two important
properties:
• the state of the application remains intact, no matter how many times a day we request
data
• it does not matter if we request products first or delivery addresses - we will get
exactly the same results
In other words, Queries are simple and safe (they don’t alter the system state). What particular
data they should return is imposed by the user interface which will eventually present it to the
user. So the only reason for Queries to change is to conform with the user interface.
Commands, in contrast, are very different creatures. Their sole purpose is to change the
system state. In an e-commerce application, examples might be adding an item to a cart or
adding or removing a delivery address. There are a few special things about Commands:
• they always change the state of the system, provided their execution succeeded
• the sequence of command execution matters - it simply does not make any sense to
remove a delivery address before we have added one
• all business rules have to be enforced during their execution to guarantee that the
system is not in an incorrect state afterwards
In conclusion, Commands are unsafe and a few orders of magnitude more complex than just
reading data from a SQL database or another data store. Commands are affected only by
changes in business requirements. A shiny new user interface after weeks of redesign should
not change the rules of the game.
Commands and Queries are very different from each other. CQRS focuses on this dichotomy.
The division is not just about a different class naming schema (e.g. GettingAuction vs PlaceBid). It
reaches far beyond. In its most extreme form, CQRS proposes separate stacks for Commands
and Queries:
The left side of Figure 4.1 shows the write stack (for Commands). We see here all the layers
from the Clean Architecture. The right side of the picture is occupied by the read stack (for
Queries). One can immediately tell there are fewer moving parts in the query stack. The enigmatic
Queries layer plays the same role as the Application layer - it will contain classes (Queries) that can
be directly used by upper layers, for example, the web interface. If we were to find an analogy
to any building block from the Clean Architecture, it would be Use Cases. Both Query and
Use Case are deliberately exposed to be used by the outer world. The difference is that a Query
must guarantee it will not alter the state of the system in any way, while Use Cases do not
make such promises.
WHAT DOES IT HAVE TO DO WITH THE CLEAN ARCHITECTURE?
What was discussed so far in the book perfectly fits the write stack definition. We are
dealing there with the essential complexity of the project, carefully modeling business rules
in code. In CQRS the building blocks look a bit different, but mostly they do the same thing.
In the write stack of CQRS, we would be using Commands, which are Data Transfer Objects. In an
implementation, they can be indistinguishable from Input DTOs. The latter are passed to Use
Cases, while Commands are executed by Command Handlers. Both Use Cases and Command
Handlers represent a business scenario in code. As a result, one does not have to know about
the existence of Command Handlers to use the services of the application. Since we replace calling
a Use Case with sending a Command, it is much easier to distribute an application written in
such a way. Commands become messages which can then be sent over the wire.
@dataclass(frozen=True)
class PlacingBidInputDto:
    bidder_id: int
    auction_id: int
    amount: Decimal
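For comparison, here is a hedged sketch of the CQRS counterpart - the same data shaped as a Command and executed by a Command Handler; the names PlaceBidCommand and PlaceBidHandler are illustrative, not taken from the example project:

from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PlaceBidCommand:
    bidder_id: int
    auction_id: int
    amount: Decimal


class PlaceBidHandler:
    def __call__(self, command: PlaceBidCommand) -> None:
        # a business scenario goes here, just like in a Use Case's execute()
        ...

Because PlaceBidCommand is a plain, immutable data structure, it can just as well be serialized and put on a message queue instead of being handled in the same process.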
The Clean Architecture does not say anything specific about query operations. Hence, it is
assumed that all scenarios like loading a list of delivery addresses, getting details of the user
etc. are to be implemented using Use Cases, Entities, Data Access Interfaces and Presenters. That is a
lot of work for a simple and safe operation. This is an excellent opportunity to supplement the
Clean Architecture with a pattern stolen from CQRS.
The main argument in favor of leveraging Queries is a much simpler implementation. Simply
put, considerably less code is needed to achieve the same result. One does not sacrifice any
advantage of the Clean Architecture because read operations are safe. Business rules are not
applicable in the context of queries since they do not affect the state of the system.
Secondly, using Queries gives greater freedom in modeling. A developer does not have to reflect
every requirement about viewing data in the write stack and vice versa. In other words, even
if a model has to be enriched with another field because it has to be available in the API or
user interface, the field does not necessarily have to be added to the Entities. The read stack should be
trivial to use and as handy as possible.
The final advantage is leaving room for optimizations and enabling scalability. Read stack
works best when it can use a denormalized data store. Since Queries are implemented
separately, we are free to use different tables or even another database that can be
asynchronously fed with data.
Keeping data for Queries in a separate database is an extreme form of CQRS. There can be
several different approaches:
• the same database, the same tables, just different code for accessing them
• the same database, but separate, denormalized tables (or views) used only for reading
• a separate read database, asynchronously fed with data from the write side
As you can see, there are many possible approaches. In the book, I will be sticking to the first,
simplest way of doing CQRS. The main difference will be implemented in the code accessing
data. However, the patterns should not differ a lot even if we denormalize our data into a
specialized read database.
QUERY AS DTO
In this approach, every Query is represented by a single class being a Data Transfer Object.
The class represents the intention of getting some data, along with the required
parameters. It does not contain any execution logic. The latter should be placed in a Query
Handler - another class or function. Executing a Query involves constructing a query class and
passing it to a Query Handler. There is an option of adding another level of indirection in
between - one can create a class responsible for dispatching queries to concrete handlers.
This pattern is called Query Bus (see the sketch after the next snippet). With the latter, we do
not have to know anything about concrete handlers.
With regard to the Clean Architecture and its Dependency Rule, the Query class belongs to the
Application layer. A concrete Query Handler obviously should be placed in the Infrastructure layer.
This approach resembles the Command - Command Bus - Command Handler combination, with the
difference that dispatching a Command never returns any result, while dispatching a
Query has to.
@dataclass(frozen=True)
class GetListOfDeliveryAddresses(Query):
    user_id: int

    Dto = List[DeliveryAddress]


def query_handler(
    query: GetListOfDeliveryAddresses,
) -> GetListOfDeliveryAddresses.Dto:
    ...
In the second approach, we still create new classes for every query, but this time there is no
Query Bus or Query Handler involved. Each query is an abstract class placed in the Application layer
and has its concrete implementation lying in Infrastructure:
class GettingListOfDeliveryAddresses(abc.ABC):
    Dto = List[DeliveryAddress]

    @abc.abstractmethod
    def execute(self) -> Dto:
        pass
# in Infrastructure layer
class SqlGettingListOfDeliveryAddresses(
    GettingListOfDeliveryAddresses
):
    def execute(
        self,
    ) -> GettingListOfDeliveryAddresses.Dto:
        models = self.session.query(
            Address
        ).filter(
            (Address.type == Address.DELIVERY)
            & (Address.user_id == self.user_id)
        )
        return [
            self._to_dto(model)
            for model in models
        ]
This approach looks more familiar in the context of the Clean Architecture, especially when
we think about relations between abstract Input Boundary and its implementation - Use Case.
Whenever one wants to invoke logic starting from web view, they request Input Boundary.
The dependency injection mechanism is then responsible for instantiating concrete Use Case
corresponding to requested Input Boundary. With Queries this can look the same. One
requests an abstraction (Query living in Application layer) and machinery under the hood
returns a concrete implementation from Infrastructure.
# dependency injection configuration
@inject(config=Config)
def configure(binder, config):
    binder.bind(
        GettingListOfDeliveryAddresses,
        to=SqlGettingListOfDeliveryAddresses,
    )


# in web view
@app.route(
    "/delivery_addresses", methods=["GET"]
)
def delivery_addresses(
    query: GettingListOfDeliveryAddresses,
) -> Response:
    result = query.execute(
        user_id=current_user.id
    )
    ...
READ MODEL FACADE
The third approach is the most flexible, yet a little controversial. It comes down to exposing
query interface from underlying infrastructure directly to view layer, completely bypassing
Application or Domain layers.
Figure 4.3 Bypassing Application and Domain layers with Read model facade
Flexibility comes from removing the need for writing specialized queries for each view and
constraining these details to Infrastructure layer.
Given what we know about the read stack, even if Read Model Facade looks controversial, it is
still 100% safe. At least as long as we can guarantee that no one can use exposed read model
facade to mutate any data in the datastore.
Achieving this is a bit tricky with common Python tools. Django ORM requires a dedicated
Manager that will raise exceptions for insert/update/delete operations. With SQLAlchemy,
the only safe way is to execute Query using separate, read-only connection to the
database. .NET developers have a more convenient method for achieving the same result20.
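A hedged sketch of the Django variant, assuming a hypothetical Address model; the mutating QuerySet methods are overridden so that they raise:

from django.db import models


class ReadOnlyQuerySet(models.QuerySet):
    def create(self, **kwargs):
        raise RuntimeError("Read model must not modify data!")

    def update(self, **kwargs):
        raise RuntimeError("Read model must not modify data!")

    def delete(self):
        raise RuntimeError("Read model must not modify data!")


class Address(models.Model):
    street = models.CharField(max_length=255)

    # Address.read_models.filter(...) works, Address.read_models.update(...) raises
    read_models = models.Manager.from_queryset(ReadOnlyQuerySet)()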
CQRS VS REST API
CQRS looks very good on paper, but if a developer tries to fit it into a REST API, they may get a
few unpleasant surprises. They all originate in the differences between the lifetime of an HTTP
request and the read/write stack dichotomy.
The common expectation for REST APIs is that after mutating data, we get the changed entity
in the response. If our commands are synchronous and we know immediately whether they succeeded,
then in the view we issue a query immediately after executing a command.
If we execute commands asynchronously, then we would rather respond with HTTP 202
Accepted and then use other transport mechanisms, e.g. WebSockets, for notifying the client
about the finished operation.
However, if it is REALLY necessary to return a response, one can resort to polling for
command results. Naturally, this is affordable only when we use asyncio, node.js or any
other solution with coroutines, where waiting for I/O is not a problem. In a classical prefork
model, a worker blocked on polling cannot serve other requests in the meantime.
CQRS VS GRAPHQL
A newer kid in town - GraphQL - plays really nicely with CQRS. This is because it also splits
commands from queries, just under different names - mutations and ...queries. It is completely up
to the API’s client to specify what data, if any, they need after executing a mutation.
CHAPTER SUMMARY
CQRS is a compelling approach. This chapter described it briefly without delving too much
into details. A crucial point for the Clean Architecture apprentices is that using the read stack
concept will considerably benefit and simplify projects. Just reading data is a safe operation
(as opposed to changing it), so there is no real benefit from imposing a discipline of always
having a Use Case between a Controller and the Infrastructure layer.
SHARP BOUNDARY
A WORD ON COMPLEXITY
Software engineering is fighting a fierce battle against complexity. To gain an advantage over
any opponent, one must get to know them first. Fred Brooks wrote in his famous paper No
Silver Bullet (included in the book The Mythical Man-Month22) that there are two types of
complexity: accidental and essential.
We deal with accidental complexity when the code we read is written in a way we can
improve with a finite effort. You have not heard about the modulo operator (%), so you have
written your own code that calculates the remainder. Someone wrote a function that has over
200 lines of code, but with tools provided by an IDE you can split it into four smaller ones and
eliminate duplicated logic, eventually getting 80 lines of code. In short, this is what
accidental complexity looks like - it is effectively combatted with clean code and keeping
functions short - in other words, by being an educated engineer who leverages modern
tools.
The second type of complexity has a strikingly different nature. Essential complexity reflects
how complicated the problem one tries to model in code is. If an application has to provide
50 features for various types of users, then cleanly written code is not sufficient for the
system to be easy to understand. To be brutally frank, there is no way of getting rid of this
kind of complexity. Our last resort as software developers is to learn how to manage it. On
the other hand, if one can negotiate reducing the scope and throwing out a few minor
features, then they will reduce essential complexity. When I participated in an Event
TWO WORLDS
It was mentioned that essential complexity is not something that can simply be
eliminated - it has to be managed. A goal of the Clean Architecture is to have all possible
complexity that roots in business requirements contained in the two inner layers - Domain and
Application. That is why they should have no knowledge about the outer world - they are
complex enough without such details. The Domain and Application layers together form a core - a
place where, ideally, all decisions justified by business requirements are made. Both core
layers are easily testable with unit tests because they either do not have any dependencies or
all of them are abstracted away. Bear in mind that unit tests are the fastest and easiest to
write among all kinds of tests. Hence, it is cheap to get high coverage in this part of the
project, where every if-statement is meaningful and was written due to business
requirements.
Layers above the core, namely Infrastructure and higher, are completely different creatures. It
is almost impossible to unit test code placed there because it relies heavily on the scary world
outside - networks, disks, databases. Therefore, we aim for having no decision making there
at all. The control flow should be a straight line - no branches, no if statements. No
alternative scenarios whenever possible. To reliably test code residing in these layers, one has
to resort to integration or higher-level tests. Of course, such tests will be a few orders of
magnitude slower and more complicated than the unit tests which cover the two core layers.
23 https://en.wikipedia.org/wiki/Event_storming
24 Tom Poppendieck, Mary Poppendieck, Leading Lean Software Development: Results Are Not the
Point
Between the two inner layers (Domain encompassed by Application) and all the remaining ones there
is an abyss. Control flow must not cross it carelessly. The Clean Architecture makes it hard
to do something reckless there by introducing the Input Boundary and Input DTO. While the
former can be omitted under certain conditions, one still must pay attention to Input DTOs.
It is crucial to always have the Dependency Rule at the back of one's head when designing
Input DTOs. They are the only thing that stands between the core layers and the scary outer world.
A developer must not pass anything that cannot be understood by the Domain and Application
layers. This means that Input DTOs can be created only using built-in data types or classes
defined in one of the core layers.
@dataclass
class PlacingBidInputDto:
    bidder_id: int
    auction_id: int
    amount: Decimal
For example, we should be able to trust PlacingBidInputDto that bidder_id field is a valid
integer, but of course, we are not obliged to know if a bidder with such an id even exists
before control flow reaches Use Case.
VALUE OBJECTS
Being sure about types is a really nice thing to have (now developers who work with
statically typed languages smile), but it is insufficient (now they are not smiling anymore).
In code examples that were shown so far whenever I needed to represent some amount of
money, I used Python built-in class - Decimal.
The problem is that not every valid Decimal makes sense as a money amount.
• Decimal('0.01')
• Decimal('10.99')
• Decimal('5.49')
• Decimal('-1.99')
• Decimal('3.1415')
• Decimal('-1.0E3')
The point of these examples is to illustrate that the built-in Decimal type is not enough to
express the concept of money. We cannot enforce the desired precision of two decimal
places. Also, there is no notion of currency, which is a vital property when we talk about
money. It is clear we need another, dedicated type. How about Money?
Before delving into implementation details, let’s think about the characteristics our Money type
should have:
• it should be immutable - once created, it cannot be changed
• it should be impossible to create one with an invalid value - validation happens during initialization
• two instances created from equal values should be considered equal - there is no notion of identity
+ Money specific:
• it always has a currency
• its amount respects the currency’s decimal precision
• its amount must not be negative
Such types are called Value Objects. We are going to use them extensively in the end-to-end
example presented later in this book.
The implementation can be driven by tests to better illustrate our expectations. We start off
by specifying a base class for currency, so we can easily extend Money whenever a new
(crypto)currency emerges. Currency will ensure that our Money is open for extension, but
closed for modification. It means that it is enough to subclass Currency to parametrize the
behavior of the Money class without having to modify its source code. By the way, this is called the
Open-Closed Principle and stands for the “O” in the famous acronym SOLID25.
class Currency:
    decimal_precision = 2
    symbol = None


class USD(Currency):
    symbol = "$"
Currency is required for creating a Money instance. Since classes are objects in Python, we
can just pass the desired Currency subclass:
class Money:
    def __init__(
        self,
        currency: Type[Currency],
        amount: str,
    ) -> None:
        ...
25 SOLID https://en.wikipedia.org/wiki/SOLID
The first property of Value Objects is immutability. One cannot really guarantee it in
Python due to its dynamic nature. It is usually sufficient to prepend instance variable
names with a single underscore, so the linter and IDE can warn us. Exposing these fields as
read-only properties may still be a good idea, though:
class Money:
    def __init__(
        self,
        currency: Type[Currency],
        amount: str,
    ) -> None:
        self._currency = currency
        self._amount = Decimal(amount)

    @property
    def currency(self) -> Type[Currency]:
        return self._currency

    @property
    def amount(self) -> Decimal:
        return self._amount
The second feature of Value Objects is that it is impossible to instantiate one with an invalid
value. Hence, Value Object carries out validation during initialization:
class Money:
    def __init__(
        self,
        currency: Type[Currency],
        amount: str,
    ) -> None:
        if not inspect.isclass(
            currency
        ) or not issubclass(currency, Currency):
            raise ValueError(
                f"{currency} is not a subclass of Currency!"
            )
        try:
            decimal_amount = Decimal(amount)
        except decimal.DecimalException:
            raise ValueError(
                f'"{amount}" is not a valid amount!'
            )

        d_tuple = decimal_amount.as_tuple()
        if d_tuple.sign:
            raise ValueError(
                f"amount {amount} must not be negative!"
            )
        elif (
            -d_tuple.exponent
            > currency.decimal_precision
        ):
            raise ValueError(
                f"given amount has invalid precision! It should have "
                f"no more than {currency.decimal_precision} decimal places!"
            )

        self._currency = currency
        self._amount = decimal_amount
Value Objects have no concept of identity - two instances are meant to be indistinguishable if only
they were created using the same Currency and an equal amount:
class Money:
    ...
Value Objects are great for expressing the intricacies of reality. For instance, let’s assume that
someone has to support multiple currencies. It makes little sense to compare bare amounts
when one is in USD and another in EUR. Just like Python does not like it when one tries to
add an int to a str. Why not write code that will guard against that?
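A sketch of such a guard, assuming comparisons are only meaningful within a single currency:

class Money:
    ...  # fields as above

    def __gt__(self, other: "Money") -> bool:
        if not isinstance(other, Money) or self._currency != other._currency:
            raise TypeError("Cannot compare Money in different currencies!")
        return self._amount > other._amount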
CHAPTER SUMMARY
This chapter discussed the meaning of and motivation behind establishing a boundary between
the two inner, core layers - Domain encompassed by Application - and the rest of the world. All
code that makes decisions should be placed inside Domain or Application, where it can be unit
tested almost effortlessly.
The Dependency Rule remains in power, so nothing from the outer layers can cross the
boundary. One has to translate things from the outside into concepts known in the core
layers. A Value Object is a handy pattern that helps to achieve that. It relieves inner layers of
validation. Value Objects, thanks to their immutability, can be safely passed around.
END-TO-END EXAMPLE
WHERE TO START?
It is high time to use the knowledge of the Clean Architecture to build something. This chapter
will guide you step by step through the development process of an exemplary project - an
online auctions platform. A few of its User Stories:
• As an administrator I want to withdraw bids so that a malicious bidder does not win
an auction.
• As an administrator I want to create auctions so that bidders can place bids on them.
The majority of User Stories make perfect candidates for Use Cases. Since the former are usually all
we have, it makes perfect sense to start coding from a Use Case when one sits down with a new,
shiny User Story. It is not the only thing to worry about in a new project, though. Since one
has to ensure that the results of their work will be presentable from the very beginning, it is
equally important to bootstrap the project as usual. In other words, one can consider the first
iteration finished when at least a basic scenario of a User Story is coded AND it can be shown
to stakeholders or other interested parties.
That is exactly how I started my journey with the Clean Architecture. Right after creating a
basic structure for the project from a template, we sat with my colleague Dominika for a pair
programming session. For two hours, we were crafting the flow of the basic scenario that
was part of the most complicated process in the application. After the session that resulted
in Use Case, Entities and tests for them, Dominika continued to work on these on her own.
She stayed in touch with the client, resolving ambiguities and discovering edge cases. At the
same time, I was working on connecting Use Case to REST API layer and providing a simple,
in-memory implementation of Data Access. After less than a week, we demonstrated results
to the client using Postman. Frontend part was still in progress at the time, but this did not
stop us from delivering business value. Oh, and of course database was not yet connected at
this stage.
WALKING SKELETON
A certain amount of groundwork has to be done during the first iteration. This step is
inseparably connected with the technology one is using, so I will not be delving into too
many details here.
For instance, in Python, one would use Django’s startproject, then startapp commands to
create the basic structure and configuration. For different frameworks (and technologies as well)
one can resort to cookiecutter. The goal of this step is simple - to gain the ability to quickly
connect a Use Case to the desired delivery mechanism. In this case, I mean a REST API.
NAMING
The name of the Use Case has to reflect the business scenario one is modeling. In this case, I propose
to name it PlacingBid. As a class, it will have only one public method. It can either be
something generic, like execute, or more specific - place_bid.
class PlacingBid:
    def execute(self) -> None:
        pass
ARGUMENTS
If a Use Case requires arguments (it is not always the case), one defines an Input DTO:
@dataclass(frozen=True)
class PlacingBidInputDto:
    bidder_id: BidderId
    auction_id: AuctionId
    amount: Money


input_dto = PlacingBidInputDto(
    1, 2, Money(USD, "10.00")
)
Note that all fields of the Input DTO are Value Objects. Here we have dedicated types for
BidderId and AuctionId. A few words about them will be provided in the following section
about Entities.
The Input DTO itself is also a Value Object. An Input DTO does not have to copy the name of the
Use Case. It makes sense to model it as an inner class of the Use Case, hence allowing for a
shorter name of the Input DTO:
class PlacingBid:
    @dataclass(frozen=True)
    class InputDto:
        bidder_id: BidderId
        auction_id: AuctionId
        amount: Money

    def execute(
        self, input_dto: InputDto
    ) -> None:
        pass
OUTPUT
If a Use Case outputs some data (not always a case!), one defines an Output DTO:
@dataclass(frozen=True)
class PlacingBidOutputDto:
    is_winning: bool
    current_price: Money
The rule that applies to both Input and Output DTOs is that they have to be very strict about
the types of their fields and check whether the data passed in is of the expected type.
If one needs to output some data from a Use Case, one piece is still missing - an interface
(Output Boundary) that will eventually present PlacingBidOutputDto:
class PlacingBidOutputBoundary(abc.ABC):
    @abc.abstractmethod
    def present(
        self, output_dto: PlacingBidOutputDto
    ) -> None:
        pass


class PlacingBid:
    def __init__(
        self,
        output_boundary: PlacingBidOutputBoundary,
    ) -> None:
        self._output_boundary = output_boundary

    def execute(
        self, input_dto: PlacingBidInputDto
    ) -> None:
        pass
UNIT TESTING
Now that we have laid the groundwork, it is possible to start writing actual code. Since we
defined the input and output, we can write the most straightforward test for the Use Case. Let’s
try Test-Driven Development:
class PlacingBidTests(unittest.TestCase):
    def setUp(self) -> None:
        self.output_boundary_mock = Mock(
            spec_set=PlacingBidOutputBoundary
        )
        self.use_case = PlacingBid(
            self.output_boundary_mock
        )

    def test_presents_data_for_winning(self):
        price = Money(USD, "10.00")
        input_dto = PlacingBidInputDto(
            bidder_id=1,
            auction_id=2,
            amount=price,
        )

        self.use_case.execute(input_dto)

        expected_output_dto = PlacingBidOutputDto(
            is_winning=True, current_price=price
        )
        self.output_boundary_mock.present.assert_called_once_with(
            expected_output_dto
        )
This is a pretty simple and naive test. Not only is it based on many assumptions (happy path
- winning, the auction exists, the bidder exists, etc.), but it will also fail because PlacingBid.execute
has no code inside. TDD is about making tiny steps. We could make the test green with this
code:
class PlacingBid:
    def __init__(
        self,
        output_boundary: PlacingBidOutputBoundary,
    ) -> None:
        self._output_boundary = output_boundary

    def execute(
        self, input_dto: PlacingBidInputDto
    ) -> None:
        self._output_boundary.present(
            PlacingBidOutputDto(
                is_winning=True,
                current_price=input_dto.amount,
            )
        )
Making such small steps may be beneficial in the beginning, but when one feels more
confident, they can leave this test failing for now and start crafting missing parts. Apart from
that, there is really not much we can do about PlacingBid Use Case for now.
AUCTION AND BID ENTITIES
As soon as we identify a name for a concept in the domain that can protect Enterprise
business rules, we create an Entity.
NAMING
Entities are usually named in a singular form. They should be given an unambiguous name
that clearly indicates their role. If you struggle to find it, talk to other people, especially
domain experts and project managers.
VALUE OBJECTS FOR IDENTITY TYPES
You might have noticed enigmatic AuctionId or BidderId in the previous section about
Use Cases. Under the hood, these are just aliases for int, but thanks to their naming they are
much more meaningful than integers. It is also a form of encapsulation. Only very few
places should really care about the type of identity, so it makes sense to hide this
information. Definitions of AuctionId and BidderId are straightforward:
BidderId = int
AuctionId = int
IMPLEMENTATION
Ideally, Entities are classes written in pure Python. They do not inherit from an ORM’s base
classes etc. When we write them, we strive for as few dependencies as possible, though
helpers such as Java’s Lombok26 or Python’s dataclasses/attrs, which can generate repetitive
code for us, are welcome. Such a library can, for example, give us a default constructor
setting all fields defined in a class.
Going back to Python, this is how the class outline can look:
class Auction:
    def place_bid(
        self, bidder_id: BidderId, amount: Money
    ) -> None:
        pass

    @property
    def current_price(self) -> Money:
        pass

    @property
    def winners(self) -> List[BidderId]:
        pass
This piece of code defines only a method and two properties (a read-only field with an
implementation, for non-Pythonistas) that will be required for the PlacingBid Use Case.
Obviously, one needs to tell Auction to place a bid and finally ask for the list of winners and the
current price to ultimately build an instance of PlacingBidOutputDto.
Pause for a second and look again at the winners definition and its annotated return
value - List[BidderId]. It bears much more information than List[int], doesn’t it?
UNIT TESTING
Since Entities are mostly pure Python (or any other language) classes without external
dependencies, they are trivial to unit-test. There is a plethora of test cases one can think of
in terms of the Auction Entity. For starters, think about the current price. What should it be
when an Auction has just been created and no one has ever touched it? It is common knowledge
that auctions have some starting price to prevent items from being sold way below their
value. This is one of the auctioning domain intricacies we have just discovered by wondering
what the current price of an auction should be, provided no one has placed a bid yet.
class AuctionTests(unittest.TestCase):
    def test_untouched_auction_has_current_price_equal_to_starting(
        self,
    ) -> None:
        starting_price = Money(USD, "12.99")
        auction = Auction(
            starting_price=starting_price
        )

        assert (
            starting_price
            == auction.current_price
        )
This will fail for two reasons: Auction currently does not accept any parameters during
construction, and current_price property always returns None. Such an implementation will
make the test pass:
class Auction:
    def __init__(
        self, starting_price: Money
    ) -> None:
        self._starting_price = starting_price

    @property
    def current_price(self) -> Money:
        return self._starting_price
With ease, one can produce many unit tests that will check numerous scenarios. Such a test
suite is pretty extensive and takes very little time to execute. That is exactly what we want
to achieve by pulling decision-making down to the Domain layer, where it is ridiculously
cheap to code and test. Although it is tempting to extensively unit-test our Entities
separately, avoid that. Much greater flexibility can be achieved if one decides to do it via
tests of Use Cases. There is much more information on testing strategies in the Testing chapter.
Previously, also during public speeches, I advised unit-testing Entities heavily. While it
still may be useful to nail certain edge cases on this testing level, I no longer think this
should be the default strategy. The way I currently recommend is to test the implementation of
Entities via tests calling Use Cases. At least in the beginning.
IMPLEMENTATION CONTINUED
Another Entity should be introduced in the process - Bid. It represents a single offer made by
a bidder. Note we do not introduce an Entity for a bidder. There is no need to - we really
need just their id to identify winners. If the existence of a separate Entity is justified by
business requirements, then we would create Bidder as well. Bid Entity is very simple.
Newly created Bids have no ids, but once we persist them, they all will get an identity.
@dataclass
class Bid:
    id: Optional[BidId]
    bidder_id: BidderId
    amount: Money
The Auction Entity, in turn, gets the logic for placing bids and determining winners:
class Auction:
    def place_bid(
        self, bidder_id: BidderId, amount: Money
    ) -> None:
        if amount > self.current_price:
            new_bid = Bid(
                id=None,
                bidder_id=bidder_id,
                amount=amount,
            )
            self.bids.append(new_bid)

    @property
    def current_price(self) -> Money:
        if not self.bids:
            return self.starting_price
        else:
            return self._highest_bid.amount

    @property
    def winners(self) -> List[BidderId]:
        if not self.bids:
            return []
        return [self._highest_bid.bidder_id]

    @property
    def _highest_bid(self) -> Bid:
        return self.bids[-1]
This is what a textbook example of an Entity looks like. There are no dependencies; all
code is written just to enforce business rules. No messing with databases or any other
external stuff.
DATA ACCESS INTERFACE (ABSTRACT REPOSITORY)
The code responsible for the logic that places bids resides in the Auction Entity, but the Use Case still
has no way to fetch the Entity and persist it afterwards. It is time to write an
interface for this. The Data Access Interface will reside in the same layer as Use Cases - the
Application.
NAMING
IMPLEMENTATION
Persistence-oriented repository - in its basic (and sufficient for the example) form it will
look as follows:
class AuctionsRepository(abc.ABC):
    @abc.abstractmethod
    def get(
        self, auction_id: AuctionId
    ) -> Auction:
        pass

    @abc.abstractmethod
    def save(self, auction: Auction) -> None:
        pass
get method exists for retrieving Auction Entity using its AuctionId, while save method is
responsible for persisting Auction.
In this example, an implementation of Data Access Interface will finally be a concrete class that relies
on a relational database, e.g. PostgreSQL.
EVOLVING IN-MEMORY IMPLEMENTATION WITH TDD
There are only two methods in AuctionsRepository. A developer could cover them with a
single test provided they can assume that all Auctions will be created (saved for the first
time, to be precise) using it. Testing two methods at once may sound like an anti-pattern,
but for now it is exactly what we need to fully check the behavior of a class under test. This
is TDD, so we devote only minimal effort to push things forward:
class InMemoryAuctionsRepositoryTests(unittest.TestCase):
    def test_should_get_back_saved_auction(
        self,
    ) -> None:
        bids = [
            Bid(
                id=1,
                bidder_id=1,
                amount=Money(USD, "15.99"),
            )
        ]
        auction = Auction(
            id=1,
            title="Awesome book",
            starting_price=Money(USD, "9.99"),
            bids=bids,
        )
        repo = InMemoryAuctionsRepository()

        repo.save(auction)

        assert repo.get(1) == auction  # the auction was created with id=1
For this to work, Auction has to support the comparison operator (==). In languages that do
not support operator overloading, we would use whatever convention is present there (e.g. the
equals() method in Java). In Python, an implementation that makes it possible to compare
Entities can look like this:
class Auction:
    def __eq__(self, other: "Auction") -> bool:
        # we check type and fields being identical
        return isinstance(
            other, Auction
        ) and vars(self) == vars(other)
class InMemoryAuctionsRepository(
    AuctionsRepository
):
    def __init__(self) -> None:
        self._storage: Dict[
            AuctionId, Auction
        ] = {}

    def get(
        self, auction_id: AuctionId
    ) -> Auction:
        return copy.deepcopy(
            self._storage[auction_id]
        )

    def save(self, auction: Auction) -> None:
        # store a copy as well, so later changes to the passed-in Entity
        # do not leak into the repository (assuming Auction exposes its id)
        self._storage[auction.id] = copy.deepcopy(auction)
You may wonder why the class keeps and returns copies of objects. In Python, objects are
passed by reference. If one used such a reference-based repository to get an Auction, they
would see dirty changes even if these were never explicitly saved with the repository. That is
not acceptable.
With an in-memory implementation, we are now good to go back to PlacingBid Use Case.
DEPENDENCY INJECTION
class PlacingBid:
    def __init__(
        self,
        output_boundary: PlacingBidOutputBoundary,
        auctions_repo: AuctionsRepository,
    ) -> None:
        self._output_boundary = output_boundary
        self._auctions_repo = auctions_repo
MAKING FIRST REASONABLE TEST PASS
As a consequence, we need to alter our test a bit to accommodate this change. We have to
create an instance of InMemoryAuctionsRepository, save an auction and pass it to the Use
Case:
class PlacingBidTests(unittest.TestCase):
    FRESH_AUCTION_ID = 2

    def _create_repo_with_auction(
        self,
    ) -> AuctionsRepository:
        repo = InMemoryAuctionsRepository()
        fresh_auction = Auction(
            self.FRESH_AUCTION_ID,
            "socks",
            Money(USD, "1.99"),
            [],
        )
        repo.save(fresh_auction)
        return repo
Finally, here goes an implementation of the Use Case that passes the test:
class PlacingBid:
    def execute(
        self, input_dto: PlacingBidInputDto
    ) -> None:
        auction = self._auctions_repo.get(
            input_dto.auction_id
        )
        auction.place_bid(
            bidder_id=input_dto.bidder_id,
            amount=input_dto.amount,
        )
        self._auctions_repo.save(auction)

        output_dto = PlacingBidOutputDto(
            is_winning=input_dto.bidder_id
            in auction.winners,
            current_price=auction.current_price,
        )
        self._output_boundary.present(output_dto)
Finally, the original test for the Use Case passes. Now, one could write another failing test
and evolve the implementation of PlacingBid Use Case with underlying Entities.
REMOVE BOILERPLATE CODE WITH REFACTORING
For a TDD cycle to be complete, we should polish the code a bit with refactoring. Currently, our
code is pretty simple and there are not many opportunities for improving it. However, we
can get rid of the __init__ initializer methods if we leverage the excellent attrs29 library, a
3rd-party replacement for the built-in dataclasses with more features.
import attr


@attr.s(auto_attribs=True)
class PlacingBid:
    _output_boundary: PlacingBidOutputBoundary
    _auctions_repo: AuctionsRepository

    def execute(
        self, input_dto: PlacingBidInputDto
    ) -> None:
        ...
29 https://www.attrs.org/en/stable/
PACKAGING CODE
PACKAGING APPLICATION & DOMAIN CODE
The code shown so far has not yet been arranged into layers. Although there is no notion of
the web, real databases etc., it is already possible to treat Application with Domain as a separate
code artefact. In Python, it means this code can become a standalone package - like one of
those we get using pip install. This trick allows for enforcing the Dependency Rule, even
in such a liberal language as Python. In different programming languages, such as Java,
there are more appropriate building blocks for enforcing layer separation. The code will be
organized into four packages:
1. Application & Domain
2. Infrastructure
3. Main
4. Web
Application & Domain will be stripped of external dependencies whenever possible, while
Infrastructure will be dependent on the former. The Main package is introduced to decouple
assembling objects from any delivery mechanism. Obviously, it must be aware of the two former
packages. Web will know Main, since it will not be assembling stuff on its own and needs the
IoC container to get its hands on Use Cases. The suggested directory and file structure looks as
follows:
/
└── auctions
    ├── auctions  # 1
    │   ├── application  # 2
    │   │   ├── __init__.py
    │   │   ├── repositories
    │   │   │   ├── auctions.py
    │   │   │   └── __init__.py
    │   │   └── use_cases
    │   │       ├── __init__.py
    │   │       └── placing_bid.py
    │   ├── domain  # 3
    │   │   ├── entities
    │   │   │   ├── auction.py
    │   │   │   ├── bid.py
    │   │   │   └── __init__.py
    │   │   ├── exceptions.py
    │   │   ├── __init__.py
    │   │   └── value_objects.py
    │   └── __init__.py
    ├── tests  # 4
    │   ├── application  # 5
    │   │   ├── __init__.py
    │   │   └── test_placing_bid.py
    │   ├── factories.py
    │   └── __init__.py
    ├── requirements-dev.txt  # 6
    ├── requirements.txt  # 7
    └── setup.py  # 8
The packaged code for Application & Domain is kept in a separate directory named auctions.
Inside, there is another directory with the same name (1) with an __init__.py file.
In Python, this is required to be able to import the code. The application (2) and domain (3)
directories inside auctions (1) contain the actual production code.
The test suite (4) is kept inside the package. Note it does not mirror the structure of the
application and domain directories. There is just an application package (5) to show where to
look for tests of Use Cases, which are the package’s API after all. One should avoid mirroring the
structure in tests to avoid unnecessary coupling. Mirroring also encourages practices
like unit-testing every class/function separately, which may make it much harder to refactor
in the future.
The remaining files - 6, 7 and 8 - are other Python-package-specific files. Respectively, they
keep information about package dependencies and package metadata, like version or name.
A note for Pythonistas: it may look a bit odd that we create separate files for every entity,
repository etc. and do not keep them together in one file - entities.py or repositories.py.
With so little code shown so far it may look unnecessary, but in the long run,
keeping things separated is a much cleaner approach. To shorten import paths in other
modules, one may import all Entities from submodules into domain/entities/__init__.py.
This is a standalone, database- and framework-agnostic package that has its own test suite.
It will be mentioned in other packages’ requirements files - in other words, other packages
will depend on it.
PACKAGING INFRASTRUCTURE CODE
/
├── auctions
│   ├── …  # see previous directory structure
└── auctions_infrastructure
    ├── auctions_infrastructure
    │   ├── __init__.py
    │   ├── repositories
    │   │   ├── __init__.py
    │   │   └── in_memory_auctions_repository.py
    │   └── settings.py
    ├── tests
    │   └── repositories
    │       └── in_memory_auctions_repository.py
    ├── requirements-dev.txt
    ├── requirements.txt
    └── setup.py
There it is, a cohesive, self-contained package that depends on the previously presented
auctions package.
There is a concrete, although keeping data only in memory, implementation in one package
(auctions_infrastructure). In another package (auctions) lies its abstraction. It is high
time we wired those together - whenever an abstraction is requested, a concrete class should be
returned. In other words, we are about to configure the IoC container.
It is going to be placed in yet another package called just main. Every delivery mechanism
(web, CLI, background task queue, etc.) will have to use it in order to assemble the project.
/
├── auctions
│   ├── ...
├── auctions_infrastructure
│   ├── ...
└── main
    ├── main
    │   └── __init__.py
    ├── requirements-dev.txt
    ├── requirements.txt
    └── setup.py
The structure of main is straightforward and initially it will consist of just one file. As the
project grows, more files would appear, for example to handle various aspects of
configuration. For starters, main will be responsible just for building the IoC container.
There are many different implementations of IoC containers out there. Some are configured
with XML, others with YAML. Yet others simply use code. For the sake of the example,
assume injector has been chosen. It is designed after Java’s Guice, so it should look familiar
even to non-Pythonistas. Before we inject everything everywhere, stop for a moment and let
me remind you that the whole goal of packaging is to organize code by hiding information that
is irrelevant outside the package and exposing only the classes that are absolutely necessary. For
this to work with Injector, we make each of our packages define a class inheriting from
injector.Module:
from auctions import AuctionsRepository  # 3


class AuctionsInfrastructure(injector.Module):
    @injector.provider
    def auctions_repo(
        self,
    ) -> AuctionsRepository:  # 2
        example_auction = Auction(
            id=1,
            title="Exemplary auction",
            starting_price=Money(USD, "12.99"),
            bids=[],
        )
        repo = InMemoryAuctionsRepository()  # 1
        repo.save(example_auction)
        return repo
Infrastructure will provide a concrete implementation (1) of the abstract repository (2), so it
has to import the abstract class from auctions package (3). In the snippet above, we build an
instance of InMemoryAuctionsRepository with a predefined, example Auction Entity. That
helps to verify if everything works fine after attaching web interface. In production, one
would rather not be adding fake data like this, but imagine how powerful it might be to have
dedicated main files for different environments. For example, automated tests with in-
memory repositories or another main customized for frontend developers that mocks out
problematic dependencies. Looking from the perspective of our first package with code of
Application & Domain, it has to expose AuctionsRepository.
class Auctions(injector.Module):
    @injector.provider
    def placing_bid_uc(
        self,
        boundary: PlacingBidOutputBoundary,
        repo: AuctionsRepository,
    ) -> PlacingBidInputBoundary:
        return PlacingBid(boundary, repo)
The module defined in auctions package is providing Use Cases using their abstractions. If we
use this extra layer of indirection, auctions should also expose all Input Boundaries. If not, we
expose all Use Cases directly:
class Auctions(injector.Module):
    @injector.provider
    def placing_bid_uc(
        ...,
    ) -> PlacingBid:
        return PlacingBid(boundary, repo)
The word “expose” has been used several times, but what does it actually mean? It was
mentioned that a package should hide all information that is not relevant to the outside
world. For example, there is no reason to expose our Use Cases if we use Input Boundaries. We
only expose the latter. To visualize this, let’s use Java. By default, classes are visible only
within their package. To publish them, we use the public access modifier. It should not be
added recklessly to every class, though - we expose as little as possible. It is a little
harder to achieve a similar effect in Python, but one can hint to clients of a package what can
and what should not be imported from it. Each of the Python packages shown before has a
top-level __init__.py. Inside, we import everything we want to expose and append the names to
a special list - __all__:
"__all"__ = [
# module
"Auctions",
# repositories
"AuctionsRepository",
# types
"AuctionId",
# use cases
"PlacingBid", # no input boundaries
"PlacingBidInputDto",
"PlacingBidOutputBoundary",
"PlacingBidOutputDto",
]
If one tries to import from auctions something that is not inside __all__, they will be warned
by the linter and IDE.
Now that the Injector modules are defined, they can be assembled inside main:
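In its simplest form, the assembling boils down to a few lines. A sketch: the setup_dependency_injection name matches the call used later by the web package, while auto_bind=False is an extra assumption that forces every dependency to be provided explicitly by a module:

import injector

from auctions import Auctions
from auctions_infrastructure import AuctionsInfrastructure


def setup_dependency_injection() -> injector.Injector:
    # every package contributes its module; the container knows them all
    return injector.Injector(
        [Auctions(), AuctionsInfrastructure()],
        auto_bind=False,
    )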
The returned Injector instance can be used to build any configured object. For example, if one
requests PlacingBid, the IoC container would use the Auctions.placing_bid_uc method, see that it
also needs an AuctionsRepository, and call AuctionsInfrastructure.auctions_repo to
build it first. An IoC container makes it effortless to create object graphs.
ATTACHING WEB INTERFACE
At this point, we are ready to expose our functionality to the outer world via HTTP API. My
weapon of choice is Flask microframework because it is lightweight and integrates smoothly
with the injector. To start with, we need a separate package for the web:
/
├── auctions
│   ├── ...
├── auctions_infrastructure
│   ├── ...
├── main
│   ├── ...
└── web_app
    ├── requirements-dev.txt
    ├── requirements.txt
    ├── setup.py
    └── ...
The basic structure of the web_app package does not stand out, although most of it has been
omitted. The tree of directories would be typical for the chosen web framework; for Flask, we
would see blueprints, an app factory file etc. The goal of this book is not to teach you, dear
reader, how to use web frameworks, so please kindly turn a blind eye to this. This example
is to demonstrate how to call a Use Case from any delivery mechanism, not only the web.
web_app also defines its injector module in order to provide all Output Boundaries specific for
the web. First things first, implementation of PlacingBidOutputBoundary - a Presenter:
class PlacingBidWebPresenter(
    PlacingBidOutputBoundary
):
    response: Response

    def present(
        self, output_dto: PlacingBidOutputDto
    ) -> None:
        message = (
            "Hooray! You are a winner"
            if output_dto.is_winning
            else f"Your bid is too low. Current price is {output_dto.current_price}"
        )
        self.response = make_response(
            jsonify({"message": message})
        )
The Injector module:
class AuctionsWeb(injector.Module):
    @injector.provider
    @flask_injector.request  # request scope
    def placing_bid_output_boundary(
        self,
    ) -> PlacingBidOutputBoundary:
        return PlacingBidWebPresenter()


# inside the Flask app factory:
FlaskInjector(
    app,
    modules=[AuctionsWeb()],
    injector=main.setup_dependency_injection(),
)
return app
and a Flask-powered view:
@auctions_blueprint.route(
    "/<int:auction_id>/bids", methods=["POST"]
)
def place_bid(
    auction_id: AuctionId,
    placing_bid_uc: PlacingBid,
    presenter: PlacingBidOutputBoundary,
) -> Response:
    if not current_user.is_authenticated:
        abort(403)
    placing_bid_uc.execute(
        get_input_dto(  # 1
            PlacingBidSchema,
            context={
                "auction_id": auction_id,  # 2
                "bidder_id": current_user.id,  # 3
            },
        )
    )
    return presenter.response  # 4
Note the very same instance of Presenter has been injected to Controller and to Use Case. That
happened because we defined a request scope for the
AuctionsWeb.placing_bid_output_boundary provider method. Otherwise, injector would
create a new instance upon every injection, and that would not work as expected.
For more details concerning web layer, especially those regarding JSON deserialization and
nuances of constructing PlacingBidInputDto please refer to code on GitHub30. Most of it is
heavily Python-specific anyway, so there would be little value added in explaining it all here.
30 https://github.com/Enforcer/clean-architecture
On the other hand, there is no magic there - it is just a glue code that leverages popular
serialization/deserialization Pythonic library - marshmallow.
Auctions are typical Entities - long-living creatures that have an identity. A lifetime of an
auction starts when it is created. Various settings can be adjusted, for example, starting
price. An auction may be published immediately or become a draft to be published later.
When an auction becomes publicly available, bidders are encouraged to place their bids.
Such a state only lasts to a certain moment in time when we consider the auction to be
ended. Ending an auction occurs at a time specified when the auction was created.
What is so special about the EndingAuction Use Case that I decided to devote the next pages to it?
There are several reasons:
1. it requires charging the winner, which means integrating with an external payment provider,
2. it will not be invoked manually from any User Interface; it should rather happen
automatically once the auction ends or shortly after the ending time has come,
3. since there is no UI involved, there also will not be anyone waiting for a result of the
EndingAuction Use Case. Hence, there is no need for an Output DTO or an Output Boundary.
Regarding point 1, in the existing platforms I know, a winner pays in a separate step. The real-life
equivalent of our EndingAuction is limited to changing the status of the auction and notifying all
bidders whether they won or lost. After that, the winner has to go back to the platform, select how
they want the item delivered etc. However, to boost the educational value of the example, I
combined payment and ending the auction into one Use Case.
In the previous Use Case we implemented, the database was our only dependency. This time we
will require a payment provider. That is a bit different kind of dependency, since such an
integration involves communication over the internet with a 3rd party service not hosted
within our infrastructure (as was the case with the database). This part of the end-to-end
example is meant to teach you what the role of an Interface/Port and an Interface Adapter/Adapter
is in the Clean Architecture.
USE CASE OUTLINE WITH INPUT DTO
One can quickly start implementing the Use Case with an almost empty body:
@dataclass(frozen=True)
class EndingAuctionInputDto:
    auction_id: AuctionId


class EndingAuction:
    def __init__(
        self, auctions_repo: AuctionsRepository
    ) -> None:
        self._auctions_repo = auctions_repo

    def execute(
        self, input_dto: EndingAuctionInputDto
    ) -> None:
        auction = self._auctions_repo.get(
            input_dto.auction_id
        )
        auction.end()
        # ?? payment ??
        self._auctions_repo.save(auction)
Note that in this case there is absolutely no need for an Output DTO and an Output Boundary.
An administrator will be able to see the results of the Use Case using different means.
From the paragraph introducing the EndingAuction Use Case, you learned that bids cannot be placed
on an auction that has ended and that an auction cannot be ended twice. The Auction Entity should
make sure of it. There are a few steps needed: Auction should get a new field for the ending date,
and the creation time of new bids should be checked against it. Additionally, a boolean field will
be helpful to make sure an auction is ended exactly once.
Changes can be introduced gradually with TDD. First, a failing test for the existing Use Case:
class PlacingBidTests(unittest.TestCase):
    def test_bid_on_ended_auction_raises_exception(
        self,
    ) -> None:
        yesterday = datetime.now() - timedelta(
            days=1
        )
        self.create_auction(ends_at=yesterday)
        price = Money(USD, "10.00")
        input_dto = PlacingBidInputDto(
            bidder_id=1,
            auction_id=self.AUCTION_ID,
            amount=price,
        )

        with self.assertRaises(BidOnEndedAuction):
            self.use_case.execute(input_dto)
This test is doomed to fail for several reasons. Firstly, Auction does not accept an ends_at argument. Secondly, BidOnEndedAuction does not exist yet. Before fixing that - you might have noticed a little helper method, create_auction, for creating an Auction:
class PlacingBidTests(unittest.TestCase):
    def create_auction(
        self,
        ends_at: Optional[datetime] = None,
        ended: Optional[bool] = None,
    ) -> None:
        if ends_at is None:
            ends_at = datetime.now() + timedelta(
                days=7
            )
        auction = Auction(
            self.AUCTION_ID,
            "socks",
            Money(USD, "1.99"),
            [],
            ends_at,
        )
        self.repo.save(auction)
Now, let’s create a domain exception with a base class:
class DomainException(Exception):
    pass


class BidOnEndedAuction(DomainException):
    pass
I don’t use Error or Exception suffixes in class names because they are redundant. The name of the exception (especially one coming from the domain) should itself convey enough information to identify exactly what the problem was.
class Auction:
    def __init__(
        ...
        ends_at: datetime,
    ) -> None:
        ...
        self.ends_at = ends_at

    def place_bid(
        self, bidder_id: BidderId, amount: Money
    ) -> None:
        if datetime.now() > self.ends_at:
            raise BidOnEndedAuction
        ...
Please note I only show new code that was added to the Entity.
IF ENTITIES SHOULD NOT HAVE ANY DEPENDENCIES, CAN THEY USE TIME FUNCTIONS?
Generally speaking, Entities should be free from dependencies, including the system clock. Purists would surely scream in a fury before tearing this book apart. However, let’s be pragmatic. It’s not a big deal in Python, where we can control the clock more easily than Gaunter O’Dimm in Witcher 3. All we need is the freezegun31 library.
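For instance, a test can simply pin the clock to a moment after the auction has ended (a minimal sketch using freezegun; the helper method, Use Case and the other names come from the earlier examples in this chapter):

import unittest
from datetime import datetime

from freezegun import freeze_time


class PlacingBidTests(unittest.TestCase):
    def test_cannot_bid_when_auction_has_ended(
        self,
    ) -> None:
        # auction ends on the 10th of January 2020
        self.create_auction(ends_at=datetime(2020, 1, 10))
        input_dto = PlacingBidInputDto(
            bidder_id=1,
            auction_id=self.AUCTION_ID,
            amount=Money(USD, "10.00"),
        )
        # pretend "now" is 5 days after the auction's end
        with freeze_time("2020-01-15"):
            with self.assertRaises(BidOnEndedAuction):
                self.use_case.execute(input_dto)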
If for whatever reason we don’t want to or can’t use it, one can always pass the current date and time to the place_bid method, so Auction only has to compare timestamps. The current date and time can be obtained in a Use Case using a Port / Adapter for the system clock. Speaking of Ports & Adapters…
31 freezegun https://github.com/spulec/freezegun
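A minimal sketch of that second option could look as follows (the Clock Port, its SystemClock Adapter and the extra now parameter are illustrative and not taken from the example project; AuctionsRepository and PlacingBidInputDto are the ones shown earlier):

import abc
from datetime import datetime


class Clock(abc.ABC):  # Port - defined in the Application layer
    @abc.abstractmethod
    def now(self) -> datetime:
        pass


class SystemClock(Clock):  # Adapter - lives in an outer layer
    def now(self) -> datetime:
        return datetime.now()


class PlacingBid:
    def __init__(
        self, auctions_repo: AuctionsRepository, clock: Clock
    ) -> None:
        self._auctions_repo = auctions_repo
        self._clock = clock

    def execute(self, input_dto: PlacingBidInputDto) -> None:
        auction = self._auctions_repo.get(input_dto.auction_id)
        # the Entity only compares timestamps; it never touches the system clock
        auction.place_bid(
            input_dto.bidder_id, input_dto.amount, now=self._clock.now()
        )
        self._auctions_repo.save(auction)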
INTRODUCING PORT FOR PAYMENTS
A Port is to external services what the Data Access Interface is to Entities and their persistence. It abstracts away details, for example, the communication protocol. Is the payment provider talking JSON and REST? Or maybe it understands only XML sent over SOAP? It does not matter from the perspective of the EndingAuction Use Case. It only needs an interface to make a payment.
First, a bunch of assumptions and important notes about popular payment providers -
especially for those readers who do not work with such services on a daily basis. Let’s
assume the payment is made using a credit or debit card. A bidder has to enter their card
details before they can bid. We do not want to store this data because a) it is risky b) you
would become an attack target c) it involves serious legal obligations. What is important is
that the details of the payment card do not even flow through our backend. This data is sent from the frontend to a selected payment provider. In return, we get a token that we store.
Whenever we want to charge the payment card, we send an authorized request to the
payment provider with that token.
These deliberations serve one purpose - to discover where the boundary between the Use Case and the Port should be. What I propose is to pass a bidder’s id and the Money to be paid to the Port’s method. Then the Adapter (implementation of the Port) is responsible for finding the associated token. In the end, the EndingAuction Use Case will be simpler, and all things specific to a given payment provider nicely separated. Moreover, Bidder will not be polluted with anything that is not specific to auctions.
class PaymentProvider(abc.ABC):
    @abc.abstractmethod
    def pay_for_won_auction(
        self,
        auction_id: AuctionId,
        bidder_id: BidderId,
        charge: Money,
    ) -> None:
        pass
Notice the comfort we have here thanks to using the Money Value Object. We can be sure that the charge argument is valid and, after a little love (I mean, conversion), will be accepted by virtually any API.
Failures will happen. It would be naive to believe that our Adapters would always succeed. The actual exception class that will be thrown depends on the Adapter. There are cases when we would like to catch an exception inside a Use Case and do something with it. Naturally, we cannot refer to adapter-specific exception classes inside a Use Case because that would violate The Dependency Rule!
What we can do instead is to define a class (or a hierarchy) of exceptions alongside the Port:
class PaymentFailed(Exception):
    pass


class NotEnoughFunds(PaymentFailed):
    pass
Use Cases can use them since they reside in the same layer (Application). Now an Adapter can raise exceptions that either inherit from this class or raise it directly. I recommend having at least one generic exception class for every Port.
Beware of excessive exception handling inside Use Cases! If you are not able to recover or take an action justified by business requirements once an exception is thrown, do not even bother catching it. Let higher layers deal with it or just let the request fail.
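To make the guideline concrete, here is a sketch of a Use Case that catches only the exception it can genuinely react to and lets everything else bubble up (the winner_id attribute and the _flag_auction_for_manual_handling helper are hypothetical, used purely for illustration):

class EndingAuction:
    def __init__(
        self,
        auctions_repo: AuctionsRepository,
        payment_provider: PaymentProvider,
    ) -> None:
        self._auctions_repo = auctions_repo
        self._payment_provider = payment_provider

    def execute(self, input_dto: EndingAuctionInputDto) -> None:
        auction = self._auctions_repo.get(input_dto.auction_id)
        auction.end()
        try:
            self._payment_provider.pay_for_won_auction(
                auction.id, auction.winner_id, auction.current_price
            )
        except NotEnoughFunds:
            # a business-justified reaction we can actually perform here
            self._flag_auction_for_manual_handling(auction)
        # any other PaymentFailed is NOT caught - higher layers will deal with it
        self._auctions_repo.save(auction)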
IMPLEMENTING ADAPTER
The Adapter for PaymentProvider is going to be something very, very concrete. Let’s assume that we have a token stored in the database somewhere and we are able to get it using the bidder_id passed to the pay_for_won_auction method. Actual charging involves an HTTP request to the payment provider’s API - the /api/v1/charge endpoint with basic auth and application/json Content-Type and Accept headers. Data passed in the request’s body is supposed to be JSON.
Example:
{
    "card_token": "123456",
    "currency": "USD",
    "amount": "14.99"
}
If everything goes fine, we expect to get a 200 OK along with the following JSON:
{
    "charge_uuid": "67f0e9db-015d-407f-a58f-9c76551dc771",
    "success": true
}
Without further ado, let’s see the code that does what we need:
class CaPaymentsPaymentProvider(PaymentProvider):
    BASE_URL = "http://ca-payments.com/api/v1/"
    CHARGE_SUFFIX = "charge"

    def __init__(
        self, login: str, password: str
    ) -> None:  # 1
        self.auth = (login, password)

    def pay_for_won_auction(
        self,
        auction_id: AuctionId,
        bidder_id: BidderId,
        charge: Money,
    ) -> None:
        card_details_model = BidderCardDetails.objects.filter(  # 2
            bidder_id=bidder_id
        ).first()
        response = requests.post(  # 3
            self.BASE_URL + self.CHARGE_SUFFIX,
            auth=self.auth,
            json={
                "card_token": card_details_model.card_token,
                "currency": charge.currency.iso_code,
                "amount": str(charge.amount),
            },
        )
        if not response.ok:
            raise PaymentFailed  # 4
        else:
            # record payment charge_uuid in DB, etc
            ...
Interesting lines:
1. __init__ is where the authentication object is prepared in a format convenient for Python’s requests library. Both login and password are secrets that should not be hardcoded and will vary depending on the environment. main.py should take care of passing proper values there
2. To retrieve the card_token associated with a given bidder_id, one needs to reach the database. You may wonder why there is no Entity nor Repository for BidderCardDetails? Because there is no need for them. The Adapter is meant to be concrete. We cannot test it anyway without mocking HTTP calls, and such tests would have hardly any value. Storage options, as well as the structure of HTTP requests and responses, are tightly coupled to this class anyway.
In case anything goes wrong, we raise an exception defined alongside the PaymentProvider Port (provided it makes sense to handle it instead of just letting the request fail).
You probably noticed that pay_for_won_auction is not the best function in the world. It is quite long and does several things. Depending on what is actually abstracted away by a Port, a corresponding Adapter may grow over time to eventually become one of those monster classes that have over a thousand lines of code and that no one wants to touch. In fact, there is no rule saying that everything must be kept inside the Adapter class! The latter is meant to be thin. Initially, when an Adapter implements one or two methods, a developer might be reluctant to refactor it, so as not to fall into the trap of premature optimization. Once an Adapter grows, it will be wise to resort to the Facade design pattern.
See this refactoring in practice:
class CaPaymentsPaymentProvider(PaymentProvider):
    ...

    def pay_for_won_auction(
        self,
        auction_id: AuctionId,
        bidder_id: BidderId,
        charge: Money,
    ) -> None:
        request = ChargeRequest(  # 1
            card_token=dao.get_bidders_card_token(
                self._session,
                bidder_id
            ),  # 2
            currency=charge.currency.iso_code,
            amount=str(charge.amount),
        )
        response = self._execute_request(
            request, ChargeResponse
        )  # 3
        dao.record_successful_payment(
            self._session,
            auction_id,
            bidder_id,
            charge,
            charge_uuid=response.charge_uuid,
        )

    def _execute_request(
        self,
        request: Request,
        response_cls: Type[ResponseCls],
    ) -> ResponseCls:
        response = requests.post(
            request.url,
            auth=self.auth,
            json=asdict(request),
        )
        if not response.ok:
            raise PaymentFailed
        else:
            return response_cls(**response.json())
We start with the final look of the Adapter. Now it looks much more generic.
Interesting lines:
1. We introduced data classes for HTTP requests and responses. They enforce the presence of parameters. We will see the implementation in just a moment
2. The second refactoring amounted to pulling all DB-interacting code into a separate module - dao
3. The thing that was left here is the logic responsible for making HTTP calls. It is generic and easily extendable with custom Request and Response dataclasses. We could have also pulled it out to a separate function/module/class.
Now let’s see how these requests and responses look:
@dataclass(frozen=True)
class Request:
    url = "http://ca-payments.com/api/v1/"
    method = "GET"


@dataclass(frozen=True)
class ChargeRequest(Request):
    card_token: str
    currency: str
    amount: str

    url = Request.url + "charge"
    method = "POST"
Such a structure allows for declaratively extending the list of handled endpoints with little effort. Responses look very similar, except there is no base class.
Finally, dao is not very interesting, but I show it for the sake of completeness:
def get_bidders_card_token(
    session: Session, bidder_id: BidderId,
) -> str:
    card_details_model = (
        session.query(BidderCardDetails)
        .filter_by(bidder_id=bidder_id)
        .one()
    )
    return card_details_model.card_token


def record_successful_payment(
    session: Session,
    auction_id: AuctionId,
    bidder_id: BidderId,
    charge: Money,
    charge_uuid: str,
) -> None:
    entry = PaymentHistoryEntry(
        auction_id=auction_id,
        bidder_id=bidder_id,
        amount=charge.amount,
        currency=charge.currency.iso_code,
        charge_uuid=charge_uuid,
    )
    session.add(entry)
After such a refactoring session, the Adapter becomes merely a thin Facade over quite a complex subsystem.
READ-ONLY OPERATIONS
It is high time we covered a touchy subject - implementing read-only operations with the Clean Architecture. It is something we could definitely implement right now using the building blocks we have seen so far - Use Case, Input- and Output DTO and Repository. Let’s assume we are to allow potential bidders to look at auction details - its title, starting and current prices. To make the whole example more interesting, we can add a list of the 3 top bids with anonymized bidders’ usernames. Anonymization, in this case, means that we will display the first letter of each username followed by an ellipsis.
We shall not be prejudiced. Let’s see how the first idea works out. GettingAuctionDetails
Use Case should, for sure, accept an auction_id in its Input DTO. As for Output DTO, we
already know what is expected.
@dataclass(frozen=True)
class GettingAuctionDetailsInputDto:
    auction_id: AuctionId


@dataclass(frozen=True)
class TopBidder:
    anonymized_name: str
    bid_amount: Money


@dataclass(frozen=True)
class GettingAuctionDetailsOutputDto:
    auction_id: AuctionId
    title: str
    current_price: Money
    starting_price: Money
    top_bidders: List[TopBidder]
The job of GettingAuctionDetails is to pull the required data from repositories and repack it into an Output DTO:
class GettingAuctionDetails:
    def __init__(
        self,
        output_boundary: GettingAuctionDetailsOutputBoundary,
        auctions_repo: AuctionsRepository,
        bidders_repo: BiddersRepository,
    ) -> None:
        self._output_boundary = output_boundary
        self._auctions_repo = auctions_repo
        self._bidders_repo = bidders_repo

    def execute(
        self,
        input_dto: GettingAuctionDetailsInputDto,
    ) -> None:
        auction = self._auctions_repo.get(
            input_dto.auction_id
        )
        top_bids = auction.bids[-3:]
        top_bidders = []
        for bid in top_bids:
            bidder = self._bidders_repo.get(
                bid.bidder_id
            )
            anonymized_name = (
                f"{bidder.username[0]}..."
            )
            top_bidders.append(
                TopBidder(
                    anonymized_name, bid.amount
                )
            )
        output_dto = GettingAuctionDetailsOutputDto(
            auction_id=auction.id,
            title=auction.title,
            current_price=auction.current_price,
            starting_price=auction.starting_price,
            top_bidders=top_bidders,
        )
        self._output_boundary.present(output_dto)
I believe you did NOT like this solution. I also hope you do not want to give up on the Clean Architecture yet. My point here is that although the Use Case approach works great for scenarios that involve mutating data, it becomes a burden when all one wants is to retrieve some information in a possibly efficient way. The code above is not only verbose but also inefficient. It suffers from the classic ORM flaw of n + 1 queries. Luckily, we still have some rabbits left in our hat.
The CQRS pattern introduced in one of the previous chapters demonstrated how beneficial separating code responsible for writes from code responsible for reads might be. It is the right moment to leverage that knowledge and talk a bit about how to implement the read side in the project. A quick reminder: three variants of implementing the read side of CQRS were described in this book. For this example, the last one (Read Model Facade) will be used. To make things simpler, assume there is a single database for both writes and reads.
The Read Model Facade interface will be a bunch of methods, one for each model/entity. Every one of them will return a prepared query object that the client can customize by adding filters or (depending on the implementation) joining more models. The actual implementation will naturally be something very specific to the underlying database. Here is how it could look for Django and the aforementioned example with a list of the 3 top bidders:
class AuctionsReadFacade:
    def auctions(self) -> models.Manager:
        return Auction.objects

    def bids(self) -> models.Manager:
        return Bid.objects
Usage example:
def details(
    request: HttpRequest, auction_id: int
) -> HttpResponse:
    try:
        auction = (
            AuctionsReadFacade()
            .auctions()
            .get(pk=auction_id)
        )  # 1
    except ObjectDoesNotExist:
        raise Http404(
            f"Auction #{auction_id} does not exist!"
        )
    bids = (
        AuctionsReadFacade()
        .bids()
        .filter(auction_id=auction_id)  # 2
        .select_related("bidder")
        .order_by("-amount")[:3]
    )
    ctx = Context(
        {"auction": auction, "bids": bids}  # 3
    )
    tpl = Template(  # 4
        """{% load app_filters %}"""
        """Auction: {{ auction.title }}<br>"""
        """Price changed from {{ auction.starting_price|dollars }} """
        """to {{ auction.current_price|dollars }}<br>"""
        """Top bids:<br>"""
        """{% for bid in bids %}"""
        """{{ bid.amount|dollars }} by {{ bid.bidder.username|anonymize }}<br>"""
        """{% endfor %}"""
    )
    return HttpResponse(tpl.render(ctx))
AuctionsReadFacade is very thin in this case. It may make little sense with Django (just like the Clean Architecture doesn’t play well with this framework), but we only want to grasp the idea here. Interesting lines:
1. We fetch the auction first to be able to quickly discover if it is not there and react with HTTP 404
2. Notice how heavily customized this Queryset is. We filter it, join the Bidder model, order and limit the result set
3. Unaltered data obtained from the Read Model Facade is passed on for further processing
An alternative approach that relieves the view (or controller, in other frameworks) from having to customize raw query objects of the underlying persistence mechanism is to use Query classes. The latter contain the details and hide them behind some nice, descriptive name.
In the above example, it could look like this:
class GetAuctionDetails:
    @dataclass(frozen=True)
    class Dto:  # 1
        auction: Auction
        bids: List[Bid]

    def query(
        self, auction_id: AuctionId
    ) -> "Dto":
        auction = Auction.objects.get(
            pk=auction_id
        )  # 2
        bids = (
            Bid.objects.filter(
                auction_id=auction_id
            )
            .select_related("bidder")
            .order_by("-amount")[:3]
        )
        return self.Dto(auction, bids)
Interesting lines:
1. The result Dto is defined right next to the Query class that produces it
2. The Query reaches directly for Django models - on the read side that is perfectly fine
Usage in a view looks as follows:
def details(
    request: HttpRequest, auction_id: int
) -> HttpResponse:
    try:
        dto = GetAuctionDetails().query(
            auction_id
        )
    except ObjectDoesNotExist:
        raise Http404(
            f"Auction #{auction_id} does not exist!"
        )
    ctx = Context(asdict(dto))
    tpl = Template(...)
    return HttpResponse(tpl.render(ctx))
Bear in mind that Query and Read Model Facade classes are part of our application’s interface, next to Use Cases. It means that if one finds it valuable to have an abstract class (or interface) for every Use Case - an Input Boundary - it may also help to abstract away a Query. Doing this with a Read Model Facade would actually be much, much harder - one would have to abstract the whole underlying ORM, which makes absolutely no sense.
class GetAuctionDetails(abc.ABC):
    @dataclass(frozen=True)
    class Dto:  # 1
        auction: Auction
        bids: List[Bid]

    @abstractmethod
    def query(
        self, auction_id: AuctionId
    ) -> "Dto":  # 2
        pass
Interesting lines:
1. Again, the result Dto lives next to the abstract Query, so callers depend only on the abstraction
2. The query method is abstract; a concrete, ORM-specific subclass provides the implementation
Use Cases shown so far controlled the flow directly. Simply put, a Use Case always behaved as if it were the most strict and observant conductor - nothing could happen without its knowledge.
This approach is a desirable one if you need to coordinate a dance of Repositories, Entities and Ports in which everything has to happen in a specific order. However, in real life Use Cases will have side effects that may be crucial from the perspective of stakeholders, yet they simply do not fit into our current set of building blocks. For example, sending e-mails to bidders that have just been overbid. To actually send an e-mail, there must be some network communication with an SMTP server. If you are itching to write a Port/Adapter pair, hold your horses for a moment. There are a few tricky questions to answer. Where does the content of the e-mail, its look and feel, template etc. belong? To the Application layer? No, these are too many concrete details. If not there, should the Port be something more generic, like a CommunicationGateway that could as well send push notifications or SMS messages to mobile devices? It becomes obvious that the Port/Adapter approach would be clumsy. Another idea is needed.
Sending e-mails has to be decoupled from Use Cases. Neither network communication nor the contents of e-mails belong to the Application layer. The inversion of control technique used so far (Port/Adapter + Dependency Injection) does not fit here. If only there were a way to let all interested parties know about some significant event that happened within an auction… Well, there is. And it is simply called an Event. Before we delve into implementation, let’s consider what an Event actually is. To put it simply, Events represent facts - their sole occurrence means that something happened. They cannot be denied or rejected. Events are meant to decouple a sender from any other object that is interested in state changes of the sender. The sender is not even aware of whether anyone is listening for Events. This is the main difference between Port/Adapter and Event/Listener. In the picture below, you can see there is an additional party involved - the Event Bus, serving as a Mediator. Its goal is to decouple the Event sender from the Listener.
EVENT IMPLEMENTATION
Figure 6.3 Emitting Events via EventBus decouples sender from a listener. The sender is not aware of how many listeners
are subscribed or even if there is one
In the context of the example, with sending e-mails after a bidder has been overbid, we could express that situation with a BidderHasBeenOverbid Event. Notice the past tense - it emphasizes what an Event is: a piece of information that something has happened. More auction-related examples we could think of are BidPlaced, AuctionStarted or AuctionEnded. In terms of implementation, an Event is simply a Data Transfer Object:
@dataclass(frozen=True)
class BidderHasBeenOverbid:
    auction_id: AuctionId
    bidder_id: BidderId
    new_price: Money
We now know some example Events; the question is which building block is actually responsible for sending them. In the Port/Adapter approach, it was the Use Case which was responsible for coordination. Use Cases may simply not know enough when it comes to emitting Events, though. The knowledge needed for creating Events is available in Entities. Use Cases would need extra calls to the Entity’s methods and potentially complex ways of rediscovering what just happened to build an Event. By doing so, we would risk lowering the quality of the Entities’ encapsulation. Hence, Events belong to the innermost circle of the Clean Architecture - they are part of the Domain - and they can be (and often are) simply called Domain Events.
Events as a decoupling technique have enormous potential. Since they are simple data
structures, they can be serialized and then sent over the network to a different application.
Let’s not jump into this swamp at this moment. For now, assume that both sending and
listening objects live within one operating system process. We will get back to more
complex scenarios, I promise.
How about the Event Bus? At a minimum, we expect it to give us the ability to subscribe to a given Event type (a single Event may have 0 or more subscribers) and to emit Events:
class EventBus:
    def emit(self, event: Event) -> None:
        ...

    def subscribe(
        self,
        event_cls: Type[Event],
        listener: Callable,
    ) -> None:
        ...
Normally, the Event Bus should be a dependency of the project - 3rd party solutions are more than enough. If we wrote one ourselves, we would have to resort to a certain workaround: EventBus and the base Event class would be placed in another package, called, for example, foundation. Beware - foundation is not a place for so-called utility classes (an incoherent bunch of functions where one is responsible for stripping off non-ASCII characters and another one for checking if a given date belongs to a given range)! This “trick” is to make the Event Bus part of our standard library, so to speak.
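For illustration only, a naive in-memory implementation of the EventBus interface shown above could look like this (a sketch - in practice a battle-tested 3rd party solution is preferable):

from collections import defaultdict
from typing import Callable, DefaultDict, List, Type


class InMemoryEventBus(EventBus):
    def __init__(self) -> None:
        self._listeners: DefaultDict[
            Type[Event], List[Callable]
        ] = defaultdict(list)

    def subscribe(
        self, event_cls: Type[Event], listener: Callable
    ) -> None:
        self._listeners[event_cls].append(listener)

    def emit(self, event: Event) -> None:
        # an Event type may have 0 or more subscribers
        for listener in self._listeners[type(event)]:
            listener(event)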
/root
├── auctions
│   ├── auctions
│   │   ├── application
│   │   │   └── ...
│   │   └── domain
│   │       ├── events
│   │       │   ├── bidder_has_been_overbid.py
│   │       │   └── ...
│   │       └── ...
│   └── tests
│       └── ...
└── foundation
    ├── foundation
    │   ├── __init__.py
    │   ├── event.py
    │   └── event_bus.py
    ├── requirements.txt
    └── setup.py
It should not be placed within the directory structure of the Clean Architecture because - spoiler alert - it will be reused among different project modules.
Events themselves are part of the Domain, no doubt. The Event Bus resembles a Port, hence we would expect it to be part of the Application. However, that would lead to a paradoxical situation. The almighty Dependency Rule states clearly that an Entity MUST NOT use the Event Bus, since the latter resides above the Domain layer - even though the Entity has to emit Events that eventually should be passed to EventBus.emit.
Figure 6.4 Repository emits events obtained from Entity via EventBus
In this approach the Entity creates Event instances, but since it is not permitted to use the Event Bus (or anything else outside the Domain), it hoards them in a private field. Later, when the Repository saves the Entity, it collects all pending events and passes them to the Event Bus.
class Auction:
    def __init__(...) -> None:
        ...
        self._pending_domain_events: List[
            Event
        ] = []  # 1

    def _record_event(
        self, event: Event
    ) -> None:  # 2
        self._pending_domain_events.append(event)

    @property
    def domain_events(self) -> List[Event]:  # 3
        return self._pending_domain_events[:]

    def place_bid(
        self, bidder_id: BidderId, amount: Money
    ) -> None:
        ...
        self._record_event(  # 4
            BidderHasBeenOverbid(
                self.id,
                old_winner,
                amount,
                self.title,
            )
        )
Interesting places:
1. Pending Domain Events are collected in a private field, initialized to an empty list
2. A small helper method appends a new Event to the pending list
3. The repository will use this property to get pending events. It returns a copy of the list
4. Whenever a bidder is overbid, place_bid records an appropriate Event
class SqlAlchemyAuctionsRepo(AuctionsRepository):
    def __init__(
        self,
        connection: Connection,
        event_bus: EventBus,
    ) -> None:  # 1
        self._conn = connection
        self._event_bus = event_bus
Interesting places:
1. The Event Bus becomes a dependency of the Repository; it is injected through __init__ and used when an Entity is saved
This design is pretty clean, though it requires putting emitting logic in a concrete repository. The same logic is expected to appear in all repositories, which can be considered a major downside of this approach. Naturally, we could refactor it and polish the code even more. However, the goal of these snippets is to present the idea - pending Domain Events are passed to the Event Bus while an Entity is being saved.
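A sketch of what that could look like in the save method (the _persist helper stands for the elided INSERT/UPDATE logic and is purely illustrative):

class SqlAlchemyAuctionsRepo(AuctionsRepository):
    ...

    def save(self, auction: Auction) -> None:
        self._persist(auction)  # actual SQL statements, elided here
        for event in auction.domain_events:
            # pending Domain Events are handed over to the Event Bus
            self._event_bus.emit(event)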
Let’s recall what CQS (Command-Query Separation) class design is. Simply put, methods can be categorized into one of two categories - commands or queries. Commands change the state of the object and return nothing, while queries may return anything but are forbidden to mutate the state of the instance. This is the design that was deliberately chosen for our Auction Entity. The second approach comes down to returning Events from methods that are commands. Although it is a trade-off, it shifts the responsibility of collaborating with the Event Bus from the Repository to the Use Case. I bet you will admit it actually suits a Use Case pretty well.
Implementation-wise, our Entity has to return Events at the end of command execution:
class Auction:
    ...

    def place_bid(
        self, bidder_id: BidderId, amount: Money
    ) -> List[Event]:
        events = []  # a list for events created
        if self._should_end:
            raise BidOnEndedAuction
        ...


class PlacingBid:
    ...
Since this is at least partially a Python book, we can also make the whole thing a bit easier by getting rid of the events list in the method. How? First, we turn our command method into a generator:
class Auction:
    ...

    def place_bid(
        self, bidder_id: BidderId, amount: Money
    ) -> Generator[Event, None, None]:
        ...
        if amount > self.current_price:
            ...
            yield WinningBidPlaced(
                self.id, bidder_id, amount, self.title
            )
            if old_winner and old_winner != bidder_id:
                yield BidderHasBeenOverbid(
                    self.id, old_winner, amount, self.title
                )
We cannot stop here yet - the yield keyword suspends the execution of a method. In combination with the snippet above from the Use Case or Repository, it means that the first Event could be dispatched to listeners even though we have not finished processing the command yet! What is more, if we find ourselves in an undesired situation and want to raise an exception between Events, we may end up with part of the Events already emitted. This is probably not what we want to happen.
To compensate, we can always flatten the generated Events using list() or write a decorator that will do it for us:
def command_returning_events(
    method: Callable[..., Generator[Event, None, None]]
) -> Callable[..., List[Event]]:
    @functools.wraps(method)
    def wrapped(*args: Any, **kwargs: Any) -> List[Event]:
        return list(method(*args, **kwargs))

    return wrapped


class Auction:
    ...

    @command_returning_events
    def place_bid(
        self, bidder_id: BidderId, amount: Money
    ) -> Generator[Event, None, None]:
        ...
SUBSCRIBING TO EVENTS
Without subscribers, the Event would simply get lost in the void. For now, the best place for subscribers to sign up for future Events is already known to you - main. The next chapter shows another method, but for now let’s reuse main:
def setup_dependency_injection(
    event_bus: EventBus,
) -> None:
    def di_config(binder: inject.Binder) -> None:
        binder.bind(EventBus, event_bus)  # 2
        ...

    inject.configure(di_config)


def setup_event_subscriptions(
    event_bus: EventBus,
) -> None:
    event_bus.subscribe(  # 3
        BidderHasBeenOverbid,
        lambda event: send_email.delay(
            event.auction_id,
            event.bidder_id,
            event.new_price.amount,
        ),
    )
Interesting lines:
2. The EventBus instance is bound in the IoC container so that it can be injected wherever it is needed
3. A listener - here a lambda that schedules a background send_email task - is subscribed to the BidderHasBeenOverbid Event
Events are an enormously powerful decoupling technique. When direct control with Ports/Adapters does not feel right or leads to awkward-looking Port interfaces, Events are your best bet.
Regardless of the implementation we choose, the lovely thing about making Entities produce Events is that the former stay easily testable. By nature, Events are Data Transfer Objects and can be seen as Value Objects - indistinguishable as long as their fields have equal values.
If we were to test an Entity that keeps Events inside, then all one needs to do is compare the value of the domain_events property with a list of expected Events:
auction.place_bid(
    bidder_id=1, amount=winning_amount
)
EVENTS VS TRANSACTIONS VS SIDE EFFECTS
As long as everything happens within one database transaction and communication with external services over the network is not involved, one can sleep well. Regrettably, this is almost never the case. Even the aforementioned, simplified examples can fail in a number of interesting ways. Though “interesting” is probably not a word you would use when woken up in the middle of the night during your on-call shift. Reliability is not something one can take lightly.
Let’s consider a scenario in which someone overbids another bidder on a given auction:
1. If we react to the event right after the Entity/Repository emits it, we will be sending the e-mail before the transaction is committed, meaning that the overbid may still fail. The Auction would then look as if the e-mail receiver had not been overbid at all
2. If the background job is triggered before the original transaction is committed AND it reaches to the database for data, it may simply not find it, because the first transaction is still in progress and the changes it made are invisible to other connections. A classic race condition.
Although these problems seem very hard to solve at first glance, their cause is trivial - misuse of Events! In the context of a transactional RDBMS, nothing has actually happened as long as the transaction is still in progress. It means that any side effects caused by Events must not be triggered until the DB transaction is committed. Before that, emitted Events are unfounded. The situation quickly gets tricky if there are more databases or message brokers involved. Welcome to the world of distributed systems.
Let’s not fall into paranoia, though. Without considerable scale, problems like the first one are unlikely to happen. This is not the case with the second situation, though. Such a race condition can bite us unexpectedly, even if there is only one user.
Is there an easy way out for both problems? There is. Many database access libraries implement callback functionality that calls user-defined logic after the transaction is committed. If we combine it with task queues (every event subscriber schedules a task to be executed after the transaction commits), we will get the desired behavior. A more formal approach is to use the Unit of Work pattern. However, the solution is not 100% reliable, although it is good enough in certain cases, especially at a lower scale.
Unit of Work is an abstraction over a so-called business transaction. Please do not confuse it with a database transaction, which has a narrower scope. There are a lot of similarities, though - a Unit of Work also groups a few operations so that they either succeed or fail as one. It is just not restricted to database queries.
RELIABLE MESSAGE-SENDING: THE OUTBOX PATTERN
Distributed systems that communicate using messages sent over the network are tricky beasts. Even if we disregard the fallible nature of networks, things still may not work as we want.
Now, in simpler cases that involve just sending a message out or scheduling a background job AFTER the transaction is committed, The Outbox Pattern comes to the rescue. The idea is very simple - we save messages to the same database within the transaction that caused them. We end up with persisted changes and temporarily saved messages. Then, another designated thread, coroutine or process opens another transaction, picks up pending messages and sends them. Next, the messages are removed from the database and the second transaction is committed.
Originally, Unit Of Work33 was only meant to track changes made on models to minimize the number of issued queries. In fact, advanced ORMs (like SQLAlchemy) implement such a mechanism. Our Unit of Work will be a bit more specialized, and it will expose four handy methods - begin, commit, rollback and register_callback_after_commit. Under the hood, it will call appropriate methods of a database transaction and maintain a list of scheduled callbacks:
33 Martin Fowler, Patterns of Enterprise Application Architecture, Chapter 11, Unit Of Work
Figure 6.5 Simplified diagram showing the flow of control. send_email is a callback that has been scheduled due to an event emitted during the lifetime of the current Unit Of Work
class UnitOfWork(abc.ABC):
    @abc.abstractmethod
    def begin(self) -> None:
        pass

    @abc.abstractmethod
    def rollback(self) -> None:
        pass

    @abc.abstractmethod
    def commit(self) -> None:
        pass

    @abc.abstractmethod
    def register_callback_after_commit(
        self, callback: typing.Callable[[], None]
    ) -> None:
        pass
An example usage could look as follows:
def setup_event_subscriptions(
    event_bus: EventBus,
) -> None:
    event_bus.subscribe(
        BidderHasBeenOverbid,
        lambda event: (  # 1
            inject.instance(  # 2
                UnitOfWork
            ).register_callback_after_commit(  # 3
                lambda: send_email.delay(  # 4
                    event.auction_id,
                    event.bidder_id,
                    event.new_price.amount,
                )
            )
        ),
    )
Interesting places:
1. The subscriber is an anonymous function (lambda) that receives the Event
2. The current Unit Of Work instance is obtained from the IoC container…
3. …and told to run another anonymous function right after the current transaction is committed
4. The callback does not accept any arguments, but we can create a closure using another lambda, so we still have the data we need.
Depending on the underlying database access library, a Unit Of Work can be merely a thin
wrapper around it or provide more advanced features, like increased reliability and
guarantees about executing after-commit callbacks.
Putting it all together - the lifetime of a Unit Of Work is only slightly longer than the lifetime of a database transaction in the scenario explained under Events vs transactions vs side effects. In the context of the web, a Unit Of Work is created once we receive a new request and committed upon completion of request handling. If an unhandled exception is thrown in the process, Unit Of Work’s rollback method will be called. Its role is to discard any changes made. In most cases, that means just rolling back the database transaction and getting rid of pending Events.
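In other words, request handling could be wrapped roughly like this (an illustrative sketch; handle_request and the view callable are not part of the example project):

from typing import Callable, TypeVar

Response = TypeVar("Response")


def handle_request(
    uow: UnitOfWork, view: Callable[[], Response]
) -> Response:
    uow.begin()  # Unit of Work starts when a new request comes in
    try:
        response = view()
    except Exception:
        uow.rollback()  # discard changes and pending after-commit callbacks
        raise
    else:
        uow.commit()  # registered callbacks run only after a successful commit
        return response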
The Event Bus has to be aware of the Unit of Work’s existence. Most notably, it must operate within the context of a certain instance of the Unit of Work. While the mapping between Events and their subscribers is considered a part of the configuration and typically is not changed once the application has started, the Event Bus scheduling callbacks after the transaction is committed indicates a need for statefulness. The Unit of Work provides it.
The tricky part is to make sure that the Event Bus will always get the right Unit of Work instance. The IoC Container should be able to provide it - the capability we are looking for is called scopes. If we are talking about handling HTTP requests, we want a so-called request scope that exists only as long as a single HTTP request is being handled. Of course, each request gets its own scope to be fully isolated from others. When we are processing tasks in a worker, we usually want to have a scope per task.
Usually, there should be no problem using mature IoC Containers in the aforementioned way. There should be either a plugin for popular frameworks or the possibility of writing a minimal amount of glue code. Documentation of an exemplary DI container, Autofac, is very elaborate about integrating it.
A lot of space was devoted to discussing transactions handling. You might have thought this had little to do with the Clean Architecture, but it had to be mentioned since transactions are one of the so-called cross-cutting concerns. Simply put, a cross-cutting concern is something that is present everywhere in the application and affects a lot of things; by neglecting transactions handling, one risks data inconsistency. In certain projects, this may not be a significant problem, and it would be more feasible just to call the customer and apologize. In other businesses, data inconsistency is a serious problem. It might require halting the entire system, fixing the inconsistency and reversing all side effects that happened since it occurred. Of course, a bug that caused the disaster still needs to be fixed. It is like stopping a scorching train pulled by a steam locomotive, filled up with expensive goods that have a short expiration date, and having to fix the tracks the train is already on. Oh, I forgot about the upset passengers and stakeholders asking every few minutes how long the delay will last. You get the picture.
Not all cross-cutting concerns can bite you that badly, but neglecting them may result in bad code that will later be difficult to maintain. Since this is a Clean Architecture book, let’s pay them some attention.
CONFIGURATION
Various settings, like access tokens to external services or credentials to the database, have to be loaded once an application starts. There is also a second class of settings - those used to parametrize various parts of the application, like the frequency of periodic tasks or pool sizes. The third kind of settings are so-called feature toggles. Basically, these are boolean flags responsible for turning on/off the features they concern. For example, they may be used for disabling features that are not yet complete or that make sense only in certain environments, like development or production. One example is using a dummy, test payment gateway instead of a real one.
As we can see, there are plenty of occasions for reaching for configuration throughout the whole application. A quick and dirty solution would be to create a class that represents the configuration, like:
class Config(dict):
    pass
Then we plug it into dependency injection by initializing the instance with settings. Whenever we need some configuration, we inject Config and use it as if it were a raw dictionary:
def setup_dependency_injection(
    settings: dict,
) -> None:
    def di_config(binder: inject.Binder) -> None:
        binder.bind(Config, Config(settings))

    inject.configure(di_config)


# somewhere in code
class CaPaymentsPaymentProvider(PaymentProvider):
    @inject.autoparams()
    def __init__(self, config: Config) -> None:
        self.auth = (
            config["payments.login"],
            config["payments.password"],
        )
On the bright side, this solution is very fast to implement. On the downside, it is not especially clean. Suddenly, dozens of classes and functions start to depend on Config. It makes Config practically unchangeable code, since changing it would break too much. Making a lot of classes depend on Config is not a good idea. Besides that, there are no restrictions on what data is available to which class. To sum up - such a design is suboptimal. Naturally, there is something we can do about it - unpack the settings in main and pass only the values a given component actually needs:
def setup_dependency_injection(
    settings: dict,
) -> None:
    def di_config(binder: inject.Binder) -> None:
        binder.bind(
            PaymentProvider,
            CaPaymentsPaymentProvider(
                settings["payments.login"],
                settings["payments.password"],
            ),
        )

    inject.configure(di_config)
In the next chapter you will also see a more object-oriented approach to configuration
passing when we dive deeper into modularity.
VALIDATION
Validation is one of the trickiest parts of the Clean Architecture. On the one hand, there are plenty of libraries and web framework components that do just that - validation. They play smoothly with CRUD applications. On the other hand, we have Input DTOs being passed to Use Cases. An Input DTO may not be the right place to carry out validation; however, once it gets to a Use Case, we expect it to be correct and safe to use.
To sum up, here is what we expect from Input DTOs:
• In a Use Case, I should be able to rely on semantic correctness (e.g. type) of attributes
◦ I am given a bidder_id equal to 3. That is fine; I know that IDs of users are positive integers. Input DTO instances must not exist with semantically incorrect attributes, e.g. a bidder_id equal to "incorrect".
• In a Use Case, I usually do not know yet if a semantically correct Input DTO will suffice to perform a business operation
◦ Typically, one will not try to check if a Bidder with a given bidder_id even exists before calling the Use Case
To guarantee semantic correctness, the ultimate solution would be to leverage Value Objects and existing validation solutions. In Python, there is a plethora of excellent libraries that allow for deserializing dictionaries into objects, for example marshmallow34 or Pydantic35, just to name a few. For a complete example, please refer to the web_app package36 of the exemplary project.
In an ideal world, inner layers (Domain & Application) would rely on Value Objects that by design are immutable and guard their own correctness.
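As a simple illustration of that idea (not code from the example project), a Value Object can reject semantically incorrect data at construction time, so an Input DTO built from such objects cannot exist in an invalid state:

from dataclasses import dataclass


@dataclass(frozen=True)
class PositiveId:
    value: int

    def __post_init__(self) -> None:
        # semantic correctness is guaranteed the moment the object is created
        if not isinstance(self.value, int) or self.value <= 0:
            raise ValueError(f"{self.value!r} is not a valid id")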
SYNCHRONIZATION
If you have ever taken part in any online auction, you probably know how it feels when someone overbids you in the last second. Perhaps the winner was not the only one who tried their luck, hoping to offer a better price at the very last moment. This task is, however, better suited for bots. I imagine that countless times bots have competed against each other in the last second of an auction. Such a competition provokes the most ordinary race conditions - unless one secures against them.
One way to do so is called optimistic locking. In short, we assign a version to every Entity. When we fetch one from the datastore, we also remember what version it had. When the time comes to persist the Entity, we do it conditionally. The condition checks if the version in the datastore equals what we got when we fetched the Entity in the beginning. If it does, we update the Entity and bump the version up by one. In case the version has changed (has already been bumped up by someone else in the meantime), we either fail or retry the whole operation. The caveat is that the datastore we use has to support such a conditional update.
Implementation details will vary depending on the database one uses. I will show how to do it using plain SQL:
BEGIN; # 1
UPDATE # 3
    auction
SET
    current_price = {new current price},
    version = {original version} + 1
WHERE
    id = {id}
    AND version = {original version};
COMMIT; # 4
Interesting lines:
1. We explicitly open a database transaction
3. This statement persists our changes provided that no one bumped up the version in the meantime
4. Committing the transaction makes the change visible to others
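Expressed with SQLAlchemy, which the example project uses for data access, the conditional update and the conflict detection could look roughly like this (the function and the RecordChanged exception are illustrative):

from decimal import Decimal

from sqlalchemy import text
from sqlalchemy.engine import Connection


class RecordChanged(Exception):
    pass


def update_current_price(
    connection: Connection,
    auction_id: int,
    new_price: Decimal,
    original_version: int,
) -> None:
    result = connection.execute(
        text(
            "UPDATE auction "
            "SET current_price = :price, version = :version + 1 "
            "WHERE id = :id AND version = :version"
        ),
        {"price": new_price, "version": original_version, "id": auction_id},
    )
    if result.rowcount != 1:
        # someone bumped the version in the meantime - fail or retry
        raise RecordChanged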
In many cases, though, locking is undesirable since it impacts performance. If two bidders try their luck at the same time, one of them will have to retry and will block server resources for at least twice as long. An alternative is to rework the design so it does not need locking at all. If we think about auctions, we could simply insert every new bid into a table in the database. However, it would then no longer be feasible to keep the current price on the auction. It would become a virtual property that could be calculated only by getting the highest bid. It would also be quite hard to detect who was overbid and when. Choosing a synchronization strategy is a game of trade-offs. Moreover, it is heavily dependent on the underlying data storage.
CHAPTER SUMMARY
This chapter has been a wild ride with lots of new information. Implementation of two Use Cases (PlacingBid and EndingAuction) has been presented. The Auction Entity has undergone a gradual evolution to meet more and more requirements. All written business code was covered with tests. The concept of Data Access in the form of an abstract, Persistence-Oriented Repository was presented. A Port for making payments and its concrete counterpart, an Adapter, have been crafted. Various approaches to implementing the Read Side were discussed. Finally, inverting control with Events was demonstrated. In the very end, a few tips on dealing with cross-cutting concerns were given.
It takes time and writing some code to fully wrap one’s head around all of these building
blocks. It also helps to look at them from a broader perspective of an entire application:
The crucial thing to discover is where the boundaries of the application lie. The state of the system is mutated using Use Cases which accept Input DTOs. During mutation, Events may be emitted to let the outer world know what happened. There is also a Read Facade that allows querying the system about its state.
Figure 6.6 Dependencies between the majority of building blocks demonstrated in this chapter
37 SQLAlchemy https://www.sqlalchemy.org/
38 Hibernate https://hibernate.org/
39 Doctrine https://www.doctrine-project.org/
MODULARITY
"Software development is a learning process; working code
is a side effect." - Alberto Brandolini
Let’s say we successfully released our application to production. Even though the dust has
not settled yet, stakeholders are already full of new ideas. Also, first customers gave
feedback. It is inevitable - new feature requests are coming. Brace yourself, because this is going to be the ultimate trial for the design you produced. It is interesting to observe how much of the carefully crafted code will still be there once we start serious development under pressure.
Where do business requirements come from? The answer varies depending on the team structure. If you are developing a pet project, you are your own boss and the only source of the project’s vision. A single Scrum team with a proxy product owner is in a slightly more complicated situation - the latter takes care of distilling requirements and presenting a clear vision to the developers. Even then, a product owner often has to deal with not-always-compatible feature or change requests coming from different people or, on a greater scale, departments. Not to mention users demanding new features that are crucial from their point of view. This should already ring a bell - this is the situation that the Single Responsibility Principle warns about: the same code might have more than one reason to change. Luckily, we already know the remedy - refactor the code so that individual parts do not have to change for different reasons.
Humans (in particular software developers) are bad when it comes to naming things. Even within one organization, the same term may mean a completely different thing to various people. Consider an item in the context of an auctioning platform that also manages stock. When an auction is configured, an item is nothing more than a starting price, a bunch of images, a name, maybe some attributes. From the standpoint of placing bids, the item is irrelevant. However, the item is everything for the bidder - they participate in the auction just to get it. When they do win an auction and the item is about to be shipped, what matters most is its size, weight or location in the warehouse. It would be next to impossible to reconcile all these different meanings and sets of features if we tried to create a single class. It would be bloated with attributes and methods which are relevant only in specific contexts.
If it is so hard and impractical to satisfy everyone’s needs at once, what else can we do? Recall the good ol’ divide et impera rule - meet needs separately when it makes sense, so that instead of one giant problem we have several much smaller ones.
Imagine stakeholders requested a new functionality: each user can opt in for some areas of interest, e.g. technology, fantasy books, healthy food or parenting. Users can opt in and opt out anytime.
/root
├── auctions
│ ├── …
├── auctions_infrastructure
│ ├── …
└── web_app
Where should we put the new code? Areas of interest have nothing in common with auctions, after all. We want our core module to stay cohesive, meaning no one wonders: “What the heck is this thing doing here? The package is called auctions, but that areas-of-interest thing is definitely not about auctioning.” We should avoid sticking everything in one package unless we want to end up with a scary Big Ball of Mud40.
One of the Clean Architecture’s chapters, The missing chapter, is precisely about this issue - packaging code. Keeping our modules cohesive quickly pays off, as they are much easier to understand and maintain. Hence, it is logical to create a new package for this feature - let’s call it preferences. Such a packaging approach would be a materialization of the idea from The missing chapter - package by feature.
What should the preferences package look like? Writing Use Cases, Entities, a Data Access Interface and its implementation for such a simple thing is a bit of an overkill. The actual feature is not too complicated - it is nothing more than saving a bunch of boolean values coming from a single form with several checkboxes.
Hence, nothing prevents us from implementing certain parts of the application using
different, simpler approaches. Even without any abstractions along the way to the database.
All these features of the modular approach mentioned above may resemble microservices.
They have, in fact, a lot in common. Many qualities of a good microservice can also be used
to describe a well-written module. For example, a microservice is considered to be properly
decoupled from the rest of the system if introducing changes there does not require
rebuilding half of the application. Being able to develop a microservice without worrying
about breaking other moving parts in the system indicates a fine dose of decoupling and
encapsulation.
Forming boundaries of microservices is a big thing. It is virtually impossible if the team does
not have a strong understanding of the project and its domain. A potential mistake can cost
a lot. In the case of sub-packages living in a single codebase, fixing a wrong design is more affordable - it usually amounts to moving code from one package to another. It is way harder
with microservices, especially if they were developed in total separation for several months
or longer.
The most obvious difference is that there is no network communication overhead when one module talks to another - unlike with microservices. Microservices also require significantly more effort from (Dev)Ops to keep them up and running than a single, monolithic application. Not to mention the increased cost of deployment and maintenance.
Good modular design does not obstruct splitting into microservices. What is more, it allows for deferring this far-reaching decision until we find it necessary. For example, a given module responsible for communication with the Payment Provider might be moved to a microservice in order to implement extra security measures. Putting it on a dedicated machine, limiting the number of people with access to literally a few, and keeping credentials to the Payment Provider only there would definitely reduce the number of possible attack vectors. Another example might be team growth and management challenges - why not organize teams around features and allow each of them to own their codebase separately?
To sum up, there is absolutely no need to migrate to microservices if all one needs is a nicely organized, maintainable codebase. It is perfectly fine to stay with a monolith, provided it is modular. Also, there is a fancy word for a modular monolith - a modulith. A modular monolith is a reasonable middle ground between one disorganized codebase and microservices.
You might have noticed that before this chapter there was not a single occurrence of the term user. I omitted it on purpose, since this is one of the vaguest and most abused names in software projects. It definitely does not help to organize code into modules. In the context of auctions, Bidder is the right name to use, at least when we talk about a person that places bids. It is quite easy to come up with names for each module. There will be Subject (authentication), Recipient (mailing) or Reporter (support). However, in the end, one will have to associate a Subject with the corresponding Entity from another module. The easiest way is to share a common ID, while the rest of the Entity’s data depends on the module it belongs to. Recall how PlacingBidInputDto was created in the web package - bidder_id is a required argument which can be obtained from the authentication mechanism in use in the web framework.
All this talk about modules might (and I hope it does!) ring a bell for readers familiar with strategic Domain-Driven Design. For those unfamiliar with DDD: there is a modeling building block called Bounded Context. It delimits the area of applicability of a given model, guaranteeing its cohesion. It means that we should rather produce more specialized models. There is neither need for nor benefit from putting everything in one User class. We also do not have to worry if our cohesive, specialized Bidder is not usable in another Bounded Context. It is better when it is not. The relation between a Bounded Context and a module is that a Bounded Context can span one or more modules, but there should never be multiple Bounded Contexts within a single module. Strategic DDD is outside the scope of this book; I highly recommend at least reading about it.
STRATEGIC DOMAIN-DRIVEN DESIGN
If the strategies mentioned earlier sound like guesswork to you and you would rather use something more formalized, try to apply the strategic patterns of Domain-Driven Design. In particular, read about a technique called Context Mapping. The resulting artefact is called a Context Map; it shows the discovered Bounded Contexts and the relations between them. A practical approach to getting it is to conduct Event Storming session(s). One requirement, though - stakeholders have to be included in the workshop. Software developers will not be able to come up with an accurate vision on their own, period. One last piece of advice - do not treat the Context Map as something set in stone. Requirements and the knowledge of all involved people will evolve over time, and like every written document, it may simply become obsolete at some point.
MODULES IMPLEMENTATION
To sum up, a module is a cohesive package of code with its own vocabulary (in particular
having a specific name for a user). It groups related functionalities and exposes them using a
module API. A well-designed module resembles a microservice, though there is no network
communication needed to use the module.
Modules that follow the Clean Architecture expose a subset of standard building blocks for
others to use:
• Use Cases
• Queries
Modules that do not implement the Clean Architecture are free to use whatever structure is the most convenient. Since a module is expected to have an API, one can leverage the Facade design pattern. A Facade is used in the same manner as Use Cases or Queries - a synchronous call from the outside. A module’s Facade exposes methods for mutating and querying the state of the system within the module. Let’s take a module which contains code responsible for customer relationship management - more precisely, it sends informative e-mails.
class CustomerRelationshipFacade:
    def create_customer(
        self, customer_id: int, email: str
    ) -> None:
        ...

    def update_customer(
        self, customer_id: int, email: str
    ) -> None:
        ...

    def send_email_about_overbid(
        self,
        customer_id: int,
        new_price: Money,
        auction_title: str,
    ) -> None:
        ...

    def send_email_about_winning(
        self,
        customer_id: int,
        bid_amount: Money,
        auction_title: str,
    ) -> None:
        ...
The trickiest part about modules is that they need to cooperate. It is a rare situation that a module can exist in complete isolation without being aware of any other modules. However, dividing a codebase into modules and figuring out how they depend on each other is an orders-of-magnitude more maintainable solution than adding more and more code to a single Big Ball of Mud. Grouping code quickly pays off.
Provided there are two modules (respectively A and B), there are three possible relations between them:
1. A and B do not depend on each other
2. A depends on B
3. B depends on A
SEPARATE WAYS
Situation #1 does not require any additional explanations. A and B exist unaware of each
other.
Assuming both A and B implement the Clean Architecture, how can A call B’s code? When it comes to synchronous, direct control, we would usually want to call a Use Case of one module from another. However, we do not want to call B’s Use Case directly - it would couple A with B and make it next to impossible to test one without the other. Luckily, in the Clean Architecture there is a building block that solves the problem - the Input Boundary, being just an abstraction over a Use Case.
Figure 7.1 One module’s Use Case calls Use Case of another module via InputBoundary
This solution is good enough if it is acceptable for the caller module to embrace the naming of the called module - an Input Dto belonging to B has to be assembled in A, after all. The decisive factor here is which module is more significant from the business perspective. If the called module (B) is, then one can simply reconcile with the fact that A has to know something about B’s language. In the opposite case, when a leak of B’s (inferior) vocabulary into A (superior) is undesirable, there is still something one can do: write a Port in A while B provides an Adapter. The Port belongs to A, where it will be used, so there is no risk of leaking knowledge from B to A. Method names of the Port/Adapter will belong to A. Still, the implementation will repack the data appropriately to eventually call B’s Use Case. Therefore, the Port/Adapter pair will not be complex - it just separates two worlds.
Figure 7.2 Use Case calls Port within the same module. The Port abstracts away Adapter from another module
However, bear in mind that direct dependency is a very tight form of coupling and
should be used sparingly. It makes perfect sense when both modules are part of the same
Bounded Context, and they share a vocabulary.
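To make this more tangible, below is a minimal, self-contained sketch of such a Port/Adapter pair.
All names in it (NotificationsPort, SendingNotification and so on) are hypothetical and serve
illustration only - they are not the project's actual classes - and Money is replaced with a plain
Decimal just to keep the snippet runnable.

# A sketch of the Port/Adapter pair described above - hypothetical names.
import abc
from dataclasses import dataclass
from decimal import Decimal


# --- module A (the caller) ---
class NotificationsPort(abc.ABC):
    """Port defined in module A, named with A's vocabulary."""

    @abc.abstractmethod
    def notify_about_overbid(self, bidder_id: int, new_price: Decimal) -> None:
        ...


# --- module B (the callee) ---
@dataclass
class SendingNotificationInputDto:
    recipient_id: int
    amount: Decimal


class SendingNotification:
    """B's Use Case, heavily simplified."""

    def execute(self, dto: SendingNotificationInputDto) -> None:
        print(f"Notifying {dto.recipient_id} about {dto.amount}")


class NotificationsAdapter(NotificationsPort):
    """Adapter provided by B - repacks data and calls B's Use Case."""

    def __init__(self, use_case: SendingNotification) -> None:
        self._use_case = use_case

    def notify_about_overbid(self, bidder_id: int, new_price: Decimal) -> None:
        # translate A's vocabulary into B's Input DTO
        self._use_case.execute(
            SendingNotificationInputDto(recipient_id=bidder_id, amount=new_price)
        )

The Adapter can then be bound to the Port in the Main module, so A never imports anything from B.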
An indirect dependency decouples the subscribing module from the one that emits events. The
subscribing module may still be coupled to another one if it imports any code (e.g. an event
class to configure a subscription). To fully decouple two modules, we need to configure
subscriptions in the main module, just like it was shown in the previous example. Most of
the time we won’t need that, though.
Figure 7.3 One module’s Handler subscribes to events from another module. When it gets notified, it calls an
appropriate Use Case
In case of a direct dependency, when a CRUD module has to use another one that implements
the Clean Architecture, we call the Use Case (optionally via the corresponding Input Boundary).
When we deal with the opposite situation, we add a Port to the Clean Architecture module
and implement an Adapter in the CRUD module.
Again, direct use introduces strong coupling between modules. Use it with caution. In case
of an indirect dependency, one resorts to events.
FLAVORS OF EVENTS-BASED INTEGRATION
One of the paragraphs above mentions one module subscribing to events emitted by
another one. It suggests that the subscribing module still needs to rely on code from the
second module directly. Naturally, this is a form of coupling. It is not always a problem, but
when it is, one has a way out. In a yet another module, one puts handlers that subscribe to
events and call appropriate logic from other modules. These handlers do all the translation
on their own, effectively making modules unaware of each other. For a multi-stage process
with cascades of events, patterns called Saga or Process Manager can help. Process Manager
will be discussed and presented later in the book.
Don’t worry if the above examples seem a bit abstract right now. Given that we have quite a
few options, we may hesitate when choosing the one to go with. There is no single good
approach - each one of them has unique features which may or may not suit our modules.
We will learn it soon. Everything will be illustrated with examples on the next few pages.
CASE STUDY - AUCTIONING PLATFORM
So far, code examples concentrated solely on auctioning itself. However, it is not enough to
successfully run an auctioning platform. After all, we need to receive payments, notify
bidders about various things or ship the won item to them. Usually, stakeholders are the
best source for this kind of information. Domain knowledge may be acquired during
meetings or workshops, like Event Storming.
If direct co-operation is limited or not possible, one can always try a mental exercise of
mapping different features onto different departments of the company. One
employee conducts auctions, while another one accepts payments and prepares financial
reports. Yet another employee cares about making sure that the won item gets to the winner.
This way of modeling works best when we know how the business works.
It may also occur that a new module will emerge organically - developers will discover it
themselves. In our case, we discovered a new module twice, when our Ports and Adapters
were growing and growing. In both cases, at first the Port and Adapter were pretty simple, with
no more than three methods. When we added the 8th method to a Port, and the Adapter by then
had become a Facade over quite a complicated subsystem, it became clear it was time to refactor
and find a new home for this code.
AUCTIONING PLATFORM MODULES
Foundation - a place for classes shared between different modules. Foundation serves as a
standard library extension. Also known as Shared Kernel in DDD. For example, the Money class
belongs here.
Auctions - everything related to auctioning itself. In particular, all nuances of auctioning are
modelled here.
Customer Relationship - hosts details about communication with the customer, like e-
mail contents.
Payments - here lies code responsible for processing payments and integration with an
external provider.
Processes - home of Sagas and Process Managers. It contains complex business processes that
span multiple modules.
Main - a special module that is solely responsible for assembling everything else.
Web Application - an HTTP interface to the platform. Heavily dependent on the chosen
web framework.
These are not all, of course. However, these are the main, language-agnostic modules. We are
going to need several auxiliary ones which are specific to the programming language and 3rd
party libraries we use. We could also add more, for example, inventory management, but
let’s keep things a bit simpler to maximize educational value.
You surely noticed that Customer is used in both the Customer Relationship and Payments modules.
Needless to say, Customer means an entirely different thing in each of these modules.
ANATOMY OF A MODULE - COMMON PART
Each module of the project exposes public classes to the outer world.
• Domain Events
• Input Boundaries (or Use Cases if the module does not use the former)
• Input DTOs
• Ports
__all__ = [
    # module
    "Auctions",
    # events
    "AuctionEnded",
    "WinningBidPlaced",
    "BidderHasBeenOverbid",
    # repositories
    "AuctionsRepository",
    # types
    "AuctionId",
    # use cases
    "PlacingBid",
    "PlacingBidOutputBoundary",
    "WithdrawingBids",
    # input dtos
    "BeginningAuctionInputDto",
    "EndingAuctionInputDto",
    "PlacingBidInputDto",
    "WithdrawingBidsInputDto",
    # queries
    "GetActiveAuctions",
    "GetSingleAuction",
]
Each module also defines a Dependency Injection module class with providers for its building blocks:
class Auctions(injector.Module):
    @injector.provider
    def placing_bid_uc(
        self,
        boundary: PlacingBidOutputBoundary,
        repo: AuctionsRepository,
    ) -> PlacingBid:
        return PlacingBid(boundary, repo)

    @injector.provider
    def withdrawing_bids_uc(
        self, repo: AuctionsRepository
    ) -> WithdrawingBids:
        return WithdrawingBids(repo)
A counterpart module with infrastructure-specific implementations, called
Auctions Infrastructure, looks as follows:
__all__ = [
    # module
    "AuctionsInfrastructure",
    # models, needed for SQLAlchemy ORM to discover them
    "auctions",
    "bids",
]
class AuctionsInfrastructure(injector.Module):
    @injector.provider
    def get_active_auctions(
        self, conn: Connection
    ) -> GetActiveAuctions:
        return SqlGetActiveAuctions(conn)

    @injector.provider
    def get_single_auction(
        self, conn: Connection
    ) -> GetSingleAuction:
        return SqlGetSingleAuction(conn)

    @injector.provider
    def auctions_repo(
        self,
        conn: Connection,
        event_bus: EventBus,
    ) -> AuctionsRepository:
        return SqlAlchemyAuctionsRepo(
            conn, event_bus
        )
Notice that we are exposing a lot of stuff outside. If we implemented a CQRS write stack, we could
expose much less. Instead of showing off Use Cases (or Input Boundaries) and Input DTOs, we would be
exposing only Command classes (roughly equivalent to Input DTOs).
The first Facade-based module, Payments, exposes just a Facade via its Dependency Injection module:
class Payments(injector.Module):
    @injector.provider
    def facade(
        self,
        config: PaymentsConfig,
        connection: Connection,
        event_bus: EventBus,
    ) -> PaymentsFacade:
        return PaymentsFacade(
            config, connection, event_bus
        )
The second Facade-based module, Customer Relationship, should not surprise us. It also
exposes just a Facade:
__all__ = [
    # module
    "CustomerRelationship",
    "CustomerRelationshipConfig",
    # facade
    "CustomerRelationshipFacade",
]
class CustomerRelationship(injector.Module):
    @injector.provider
    def facade(
        self,
        config: CustomerRelationshipConfig,
        connection: Connection,
    ) -> CustomerRelationshipFacade:
        return CustomerRelationshipFacade(
            config, connection
        )
Processes is one of the most interesting modules, though the majority of its code will be
skipped for now. It will be discussed in detail later in this chapter. It will also expose the
Dependency Injection module:
__all__ = [
    # module
    "Processes",
]

class Processes(injector.Module):
    ...
Main and Foundation are a bit different creatures - the role of the former is to assemble all
other modules into an application, while the latter is simply a place for commonly used classes
or functions. Hence, neither of them exposes a Dependency Injection module named Main or
Foundation. On the other hand, in the example project written for the book, Main defines
several helper module classes. Most of them will be technology-specific, but there is an
exception - a Configs module class that provides all configuration required by other modules.
Both Payments and CustomerRelationship require such config DTOs to work:
class Configs(injector.Module):
    def __init__(self, settings: dict) -> None:
        self._settings = settings

    @injector.singleton
    @injector.provider
    def customer_relationship_config(
        self,
    ) -> CustomerRelationshipConfig:
        return CustomerRelationshipConfig(
            self._settings["email.host"],
            int(self._settings["email.port"]),
            self._settings["email.username"],
            self._settings["email.password"],
            (
                self._settings["email.from.name"],
                self._settings[
                    "email.from.address"
                ],
            ),
        )

    @injector.singleton
    @injector.provider
    def payments_config(self) -> PaymentsConfig:
        return PaymentsConfig(
            self._settings["payments.login"],
            self._settings["payments.password"],
        )
Apart from that, Main will take Dependency Injection module classes and will use them to build
an Injector - our IoC Container:
# somewhere in Main
def setup_dependency_injection(
    settings: dict,
    connection_provider: ConnectionProvider,
) -> injector.Injector:
    return injector.Injector(
        [
            Db(connection_provider),
            RedisMod(),
            Rq(),
            EventBusMod(),
            Configs(settings),
            Auctions(),
            AuctionsInfrastructure(),
            CustomerRelationship(),
            Payments(),
            Processes(),
        ],
        auto_bind=False,
    )
The resultant IoC Container is capable of building for us a tree of objects to access our
application's functionality. For the time being, it remains ignorant of the delivery mechanism…
Last but not least, there is the Web Application. Its structure is ultimately dependent
on the web framework and chosen technology. The one important matter is to choose an IoC
Container and web framework that play nicely together, as is the case with Flask and
injector. The ultimate result we want to achieve is to get rid of the explicit use of the IoC
Container and leave it to tools. We use the Flask-Injector lib to provide integration:
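The wiring might look roughly like this - a sketch, not necessarily the book's exact code; create_app
is a hypothetical application factory, and setup_dependency_injection is the function shown above:

from flask import Flask
from flask_injector import FlaskInjector


def create_app(
    settings: dict, connection_provider: ConnectionProvider
) -> Flask:
    app = Flask(__name__)
    # ... register blueprints with views here ...
    container = setup_dependency_injection(settings, connection_provider)
    # From now on, views may simply type-annotate their dependencies
    # (e.g. a PaymentsFacade argument) and Flask-Injector will supply them,
    # so the IoC Container is never used explicitly in view code.
    FlaskInjector(app=app, injector=container)
    return app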
The rule should be visible with the naked eye. In essence, a module is not only a bunch of
code put under a uniquely named directory but also a first-class citizen that defines a
Dependency Injection module. Later, the IoC Container (the Injector class in code examples) will use
module classes to resolve requested classes and handlers. However, note that Injector
module classes are something very specific to Python and Injector. The Holy Grail is to make
code ignorant about DI as much as possible - especially in the Auctions module.
It may be worrying that different modules do not follow the same architectural pattern.
Although uniformity is highly desired, leaving it behind is a trade-off between complex
modules requiring the Clean Architecture and simpler ones that do not. In other words, this decision
means giving up on uniformity but gets rid of over-engineering in less complex modules.
It is relatively easy to accept that modules will have different internal structures. First of all,
they have to be specialized for the particular problem they model. Secondly, a module’s
internal structure is like the private attributes of a class - it is not meant to be used directly.
It is an implementation detail hidden behind the public API. That being said, if
unifying internal structure is not a good idea, how about the modules’ APIs? Is there any middle
ground between the approach derived from the Clean Architecture with Input Boundaries + Input
DTOs and simple Facades?
Actually, there is, and it was already mentioned in the book - it is a combination of
Command, Command Handler and Command Bus patterns from CQRS. Refactoring from the
Clean Architecture is fairly easy - Input DTOs become Commands, Input Boundaries are
removed, and Use Cases are registered as Command Handlers so that Command Bus can dispatch
Commands. For Facades it is also not complicated - we create Command classes and wire them
together with Facade’s methods or split the Facade into a bunch of Command Handlers.
From now on, to use the Auctions module’s services, one only has to know about
PlaceBidCommand. After instantiating a Command, one uses a Command Bus - a Mediator-style
pattern that will dispatch PlaceBidCommand to the appropriate Handler.
TCommand = TypeVar("TCommand")

class WithdrawBidHandler:
    def __call__(
        self, command: WithdrawBid
    ) -> None:
        # dummy handler just prints what it gets
        print(f"Handling {command}!")

class Auctions(Module):
    @provider  # registers the handler as a provider in the DI container
    def withdraw_bid_handler(
        self,
    ) -> Handler[WithdrawBid]:
        return WithdrawBidHandler()

# et voila!
command_bus.dispatch(WithdrawBid(bid_id="123"))
# prints "Handling WithdrawBid(bid_id='123')!"
Note a profound consequence - the module no longer has to expose Use Cases or their Input
Boundaries, as was the case with PlaceBidHandler. In other words, we encapsulate more,
minimizing coupling with other modules.
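For completeness, the Command Bus itself can be very small. Below is a minimal sketch built on top
of Injector - an illustration only, not necessarily the book's actual implementation:

from typing import Generic, TypeVar

import injector

TCommand = TypeVar("TCommand")


class Handler(Generic[TCommand]):
    """Generic used purely as a DI key, e.g. Handler[WithdrawBid]."""


class CommandBus:
    def __init__(self, container: injector.Injector) -> None:
        self._container = container

    def dispatch(self, command: TCommand) -> None:
        # resolve whatever was bound under Handler[type(command)]
        # and call it like a function
        handler = self._container.get(Handler[type(command)])
        handler(command)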
This refactoring is fully optional - it will not be presented further in the book. It was
mentioned for the sake of addressing doubts about the lack of architectural uniformity across
modules.
Thanks to the explicit nature of Dependency Injection, it becomes obvious how the modules of the
auctioning platform depend on each other.
Starting from the bottom, Foundation does not depend on any other module. Ideally, it
would only use the standard library of the programming language and certain 3rd party
libraries that do not tie it to a concrete database, framework etc.
Auctions, as well as Shipping, both being Clean Architecture-based modules, depend
on Foundation only.
Customer Relationship in our case relies on the Auctions module because it will react to certain
Auctions’ domain events. For example, when BidderHasBeenOverbid or WinningBidPlaced
occurs, the module sends appropriate e-mails. Customer Relationship also uses 3rd party
libraries to access the database.
Payments does not depend on any other module, but it is tied to 3rd party libraries to
access the database. It does not subscribe directly to any events. Therefore it does not rely
on Auctions or Shipping. Its Facade’s methods will be called in a slightly different way, which
will be described later in this chapter.
Processes module glues all other modules that take part in more complex business
scenarios. It will depend on Auctions, Shipping, Customer Relationship and Payments to
either react to their events or call Facade’s methods/Use Cases accordingly. Implementation
details will be explained thoroughly, just a few paragraphs further in this chapter.
Main will rely on every other module but Web Application. Although Main module knows
how to assemble the application, it remains ignorant of a delivery mechanism, i.e. web, CLI,
background task queue etc.
Web Application will depend on Main and any other module which will be connected to
the Web API. In this case, these are Auctions and Payments.
Kind reminder - even though a few modules have access to the database, they do not share
database tables! Models defined in one module stay private to this module.
Although emitting events was already discussed in the previous chapter, the most
important facts have to be recalled here because Events are the cornerstone of integration between
modules. Hence, we have to be much more serious about them.
To start with, in a Clean Architecture-based module, Events can be emitted from a Repository
(the Entity keeps events in a private field until the moment it is being saved) or from the Entity
itself (then we consciously violate the Dependency Rule). It should also not be a big deal to emit
events inside a Use Case if it does not make sense inside an Entity.
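As an illustration of the first option, here is a minimal, self-contained sketch of an Entity recording
events and a Repository posting them on save. Method names such as pop_events are hypothetical;
this is not the project's actual code:

from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BidderHasBeenOverbid:
    auction_id: int
    bidder_id: int


class Auction:
    def __init__(self, auction_id: int) -> None:
        self.id = auction_id
        self._pending_events: List[object] = []

    def _record_event(self, event: object) -> None:
        # the Entity only collects events; it knows nothing about the Event Bus
        self._pending_events.append(event)

    def pop_events(self) -> List[object]:
        events, self._pending_events = self._pending_events, []
        return events


class InMemoryAuctionsRepo:
    def __init__(self, event_bus) -> None:
        self._event_bus = event_bus
        self._storage: Dict[int, Auction] = {}

    def save(self, auction: Auction) -> None:
        self._storage[auction.id] = auction
        # events recorded by the Entity are posted only once it is saved
        for event in auction.pop_events():
            self._event_bus.post(event)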
In modules that are implemented using the Facade-based approach, we emit events inside Facade
methods:
class PaymentsFacade:
    def __init__(
        self,
        config: PaymentsConfig,
        connection: Connection,
        event_bus: EventBus,
    ) -> None:
        ...

    def capture(
        self, payment_uuid: UUID, customer_id: int
    ) -> None:
        ...
        self._event_bus.post(
            PaymentCaptured(
                payment_uuid, customer_id
            )
        )
From Events vs transactions vs side effects we already know that until the transaction is
committed, Events are unfounded. A straightforward way to circumvent the problem is to
run event handling logic in a background job that will be triggered only after the transaction
is committed. This is a quite useful design, but a slightly generalized version is going to be used
- from now on, such behavior shall be known as asynchronous event handling.
• A class that emits Events stays utterly ignorant about the way they are handled
With Python Injector, it can be achieved by using a combination of multibind and generics.
Firstly, we need a couple of generics for synchronous and asynchronous event handling:
class Handler(Generic[T]):
    """Simple generic used to associate handlers with events using DI.

    e.g. Handler[AuctionEnded].
    """

class AsyncHandler(Generic[T]):
    """An async counterpart of Handler[Event]."""
Whenever we want to bind a new handler for a given Event, we can do this in an elegant
way inside Dependency Injection module class:
binder.multibind(
    # a synchronous handler for AuctionEnded
    Handler[AuctionEnded],
    to=SYNCHRONOUS_HANDLER,
)
binder.multibind(
    # an asynchronous handler for AuctionEnded
    AsyncHandler[AuctionEnded],
    to=ASYNCHRONOUS_HANDLER,
)
class SomeEventHandler:
    def __init__(self, facade) -> None:
        self._facade = facade

    def __call__(
        self, event: AuctionEnded
    ) -> None:
        self._facade.do_something(…)

binder.multibind(
    Handler[AuctionEnded],
    to=EventHandlerProvider[SomeEventHandler],
)
binder.multibind(
    AsyncHandler[AuctionEnded],
    to=AsyncEventHandlerProvider[
        SomeEventHandler
    ],
)
The way binding is done is fully transparent to events and handlers. On the bright side, this
design is very flexible and allows us to model everything the way we need.
As always, implementation details are specific to how this particular Event Bus
was written using Injector:
class InjectorEventBus(EventBus):
    """A simple Event Bus that leverages injector."""

    def __init__(
        self,
        injector: Injector,
        run_async_handler: RunAsyncHandler,  # 1
    ) -> None:
        self._injector = injector
        self._run_async_handler = run_async_handler

    def post(self, event: Event) -> None:
        # synchronous handlers are resolved and called here (code elided)
        try:
            async_handlers = self._injector.get(
                AsyncHandler[type(event)]
            )
        except UnsatisfiedRequirement:
            pass
        else:
            assert isinstance(
                async_handlers, list
            )
            for async_handler in async_handlers:
                self._run_async_handler(
                    async_handler, event
                )  # 5
Interesting lines:
3. It may happen that no one is subscribed to a given event, so we silence the exception here
4. Synchronous handlers are called in place as if they were functions. Notice that by then
handler classes are instantiated with injected dependencies. A handler class has to
define the special method __call__ so it can be called like a function.
In a limited range of cases, it might be helpful to handle events emitted within the same
module. Take the Payments module for example - once a customer’s payment card is charged
(funds are reserved), we need to capture it (confirm the payment), which will eventually result
in funds being transferred to our bank account.
There is a possibility to do both steps at once, but depending on the payment gateway it may
not always be reliable. Moreover, there are scenarios (not in the auctioning platform, though)
when we specifically want to split the payment into two phases. For example, we reserve
funds, then we take care of merchandise using the API of some external provider and only if
the latter succeeds, we capture the funds.
Nonetheless, for the sake of the example, let’s assume that once a payment card is charged, one
wants to capture the funds in the background. The charge should happen online if possible
because, in case of incorrect card details, a customer is able to see immediately that something
is wrong. Capture is unlikely to fail, so it can be safely executed in the background.
The Facade of Payments does the charge, then emits an event:
class PaymentsFacade:
    def charge(
        self,
        payment_uuid: UUID,
        customer_id: int,
        token: str,
    ) -> None:
        payment = self._dao.get_payment(
            payment_uuid, customer_id
        )
        try:
            charge_id = self._api_consumer.charge(
                payment.amount, token
            )
        except PaymentFailedError:
            self._event_bus.post(
                PaymentFailed(
                    payment_uuid, customer_id
                )
            )
        else:
            ...  # code skipped
            self._event_bus.post(
                PaymentCharged(
                    payment_uuid, customer_id
                )
            )
The PaymentCharged event is then handled by a class that calls capture:
class PaymentChargedHandler:
    @injector.inject
    def __init__(
        self, facade: PaymentsFacade
    ) -> None:
        self._facade = facade

    def __call__(
        self, event: PaymentCharged
    ) -> None:
        self._facade.capture(
            event.payment_uuid, event.customer_id
        )
The Dependency Injection module class deals with subscribing to the event:
class Payments(injector.Module):
    def configure(
        self, binder: injector.Binder
    ) -> None:
        binder.multibind(
            AsyncHandler[PaymentCharged],
            to=AsyncEventHandlerProvider(
                PaymentChargedHandler
            ),
        )
Et voilà! Note there is nothing unusual in either the handler or the Facade that would indicate there
is any asynchronous process in place. This becomes relevant only when a developer writes the
binding.
Handling events coming from different modules is not really different from handling them
within the same module. A simple example is sending an e-mail once a bidder has been
overbid. Bidding logic lives in the Auctions module, specifically inside the Auction Entity. That’s where
the event is emitted from:
class Auction:
    def place_bid(
        self, bidder_id: BidderId, amount: Money
    ) -> None:
        old_winner = (
            self.winners[0] if self.bids else None
        )
        if amount > self.current_price:
            ...
            if old_winner:
                self._record_event(
                    BidderHasBeenOverbid(
                        self.id,
                        old_winner,
                        amount,
                        self.title,
                    )
                )
All communication with the customer belongs to Customer Relationship module where an
appropriate handler is defined:
class BidderHasBeenOverbidHandler:
    @injector.inject
    def __init__(
        self, facade: CustomerRelationshipFacade
    ) -> None:
        self._facade = facade

    def __call__(
        self, event: BidderHasBeenOverbid
    ) -> None:
        self._facade.send_email_about_overbid(
            event.bidder_id,
            event.new_price,
            event.auction_title,
        )
All it does is call a Facade’s method. Handlers within modules are not going to be anything
more complex than simple glue code.
class CustomerRelationship(injector.Module):
    def configure(
        self, binder: injector.Binder
    ) -> None:
        binder.multibind(
            AsyncHandler[BidderHasBeenOverbid],
            to=AsyncEventHandlerProvider(
                BidderHasBeenOverbidHandler
            ),
        )
Although this way of integrating modules is straightforward, it is not a go-to solution for
every case. To subscribe to an event coming from the Auctions module, one has to import it
into Customer Relationship. This couples the two together. It is not too
problematic here, because this is merely an illustration of the nature of Customer
Relationship - it is meant to inform customers about many different things that happen
within the system. Inevitably, this module will have to know about many others.
Beware of modeling business flows spanning several modules this way, though. When
module A subscribes to an event from module B, and the latter subscribes to an event
from module C, imagine how pleasant it is to read such scattered code.
It is extremely difficult to understand such a process when a developer has to jump through
different modules and recreate the whole flow in their head. Luckily, there is an appropriate
pattern for this situation - Process Manager.
Splitting the codebase into modules increases the distance between code fragments responsible
for complex business scenarios. In isolation, none of the Use Cases or Facades’ methods are
difficult. The missing piece here is a pattern that makes processes spanning
multiple modules explicit and easy to comprehend.
First of all, both Saga and Process Manager subscribe to Events coming from different modules.
They take on coordination and remove the need for dependencies between modules.
The main difference between these patterns is that Saga is stateless whereas Process Manager
is stateful. Thus, Process Manager maintains internal state to know how to react. This feature
has much in common with the State41 design pattern. On the other hand, Saga does not keep any
data with it. It may require modules to provide additional querying capabilities. In the book,
we will see a case study of a Process Manager.
41 Bert Bates et al, Head First Design Patterns, Chapter 10. The State Pattern: The State of
Things
First, let’s see Events involved in the process:
• AuctionEnded
• PaymentCaptured
• PackageShipped
Note that each one of them is emitted from a different module. A code snippet that gives an
idea about implementation:
class PayingForWonItem:
    @method_dispatch
    def handle(
        self,
        event: Any,
        data: PayingForWonItemData,
    ) -> None:
        raise Exception(
            f"Unhandled event {event}"
        )

    @handle.register(AuctionEnded)  # 1
    def handle_auction_ended(
        self,
        event: AuctionEnded,
        data: PayingForWonItemData,
    ) -> None:
        assert data.state is None
        self._payments.start_new_payment(...)
        self._customer_relationship.send_email_about_winning(
            ...
        )

    @handle.register(PaymentCaptured)  # 2
    def handle_payment_captured(
        self,
        event: PaymentCaptured,
        data: PayingForWonItemData,
    ) -> None:
        assert (
            data.state
            == State.PAYMENT_STARTED
        )
        self._customer_relationship.send_email_after_successful_payment(
            ...
        )
        self._shipping.register_new_package(...)

    @handle.register(PackageShipped)  # 3
    def handle_package_shipped(
        self,
        event: PackageShipped,
        data: PayingForWonItemData,
    ) -> None:
        assert (
            data.state
            == State.SHIPPING_STARTED
        )
        self._customer_relationship.send_email_after_shipping(
            ...
        )
Interesting lines:
1. AuctionEnded starts the process
2. PaymentCaptured moves the process forward once the payment succeeds
3. PackageShipped is handled at the end of the process
Each handler inspects the internal state of Process Manager before executing any logic. In this
way, a Process Manager enforces correctness just like State Machine does. These checks also
bring idempotency - we are safe from reacting to the same events more than once. This
might not sound like a big deal in a modular, yet still monolithic application, but is a highly
desired feature in distributed systems.
Even though PayingForWonItem is a trivial example, its state is not a single variable (e.g.
State.PAYMENT_STARTED or State.SHIPPING_STARTED). A Process Manager has to keep just
enough information to autonomously make decisions using limited data enclosed with new
Events. In practice, Process Manager copies all data it needs from Events. In other words,
handlers change Process Manager’s state, which can be quite a complex data structure:
@dataclass
class PayingForWonItemData:
    process_uuid: uuid.UUID
    state: Optional[State] = None
    winning_bid: Optional[Money] = None
    auction_title: Optional[str] = None
    auction_id: Optional[int] = None
    winner_id: Optional[int] = None
For state mutation example, see the second half of the handler of AuctionEnded:
class PayingForWonItem:
    @handle.register(AuctionEnded)
    def handle_auction_ended(
        self,
        event: AuctionEnded,
        data: PayingForWonItemData,
    ) -> None:
        ...
        data.state = State.PAYMENT_STARTED
        data.auction_title = event.auction_title
        data.winning_bid = event.winning_bid
        data.auction_id = event.auction_id
        data.winner_id = event.winner_id
Later, we will pass this data to appropriate Facades or Use Cases, e.g:
class PayingForWonItem:
    @handle.register(PaymentCaptured)
    def handle_payment_captured(
        self,
        event: PaymentCaptured,
        data: PayingForWonItemData,
    ) -> None:
        ...
        self._customer_relationship.send_email_after_successful_payment(
            data.winner_id,
            data.winning_bid,
            data.auction_title,
        )
An alternative to keeping all data in the state is to use appropriate Queries to fetch extra
information. Then, the state can be kept to a minimum. Obviously, this is a no-go when our
system is eventually consistent, because Queries will not necessarily return the newest
information.
Process Managers’ statefulness makes it possible to implement more complex scenarios, like
retrying payments up to N times - one just has to record the required information in the internal
state. A second, very common application of a Process Manager is handling timeouts. Say a winner
has a day to pay. Otherwise, we will have to send them a reminder e-mail. The possibilities are
endless. From the implementation standpoint, one has to add a timeout_at field to the state
structure:
@dataclass
class PayingForWonItemData:
    ...
    timeout_at: Optional[datetime] = None
The next step is to implement a timeout method in the Process Manager itself:
class PayingForWonItem:
    def timeout(
        self, data: PayingForWonItemData
    ) -> None:
        self._customer_relationship.send_payment_reminder(
            data.winner_id,
            data.auction_title,
        )
        data.state = (
            State.PAYMENT_OVERDUE_A_DAY
        )
With this implementation, one still has to invoke the Process Manager somehow from the outside.
One approach would be to periodically query the database for Process Managers that ran out of
time and then call their timeout method. A fancier solution involves using some
external scheduling service that will invoke our code at a specific moment in the future.
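A minimal sketch of the periodic approach could look as follows. The find_timed_out repository
method is a hypothetical addition made up for this illustration; it is assumed to return data of all
Process Managers whose timeout_at has already passed:

from datetime import datetime


def handle_timeouts(
    repo: ProcessManagerDataRepo,
    process_manager: PayingForWonItem,
) -> None:
    # run periodically, e.g. from a cron job or a background worker
    for data in repo.find_timed_out(
        PayingForWonItemData, now=datetime.now()
    ):
        process_manager.timeout(data)
        repo.save(data.process_uuid, data)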
Naturally, a Process Manager does not live in the memory of the program all the time. Just like
an Entity, it is fetched from the database when needed and saved afterwards. Thus, it requires
a persistence mechanism. There are two problems to address:
• how to persist a Process Manager’s state?
• how to find data of the desired one if there are multiple Process Managers of the same type
in progress?
A good-enough solution for the first dilemma is to keep the state data structure as JSON. This
approach is lean, but has a trade-off - it will work only if the Process Manager’s data structure is
serializable and deserializable. This should be left to 3rd party libraries if possible. For
Java, there is the excellent Jackson ObjectMapper that is able to handle POJOs. Thanks to that, it is
possible to write a single, generic repository for all Process Managers. It will have to be able to
translate DTOs into JSON to be later stored in the database. For example, the following
object:
ExampleData(
    process_uuid=UUID(
        "9fc15305-2a0f-41ed-8c1c-eafa2416ee75"
    ),
    name="Example",
    counter=0,
    timeout_at=datetime.datetime(
        2019, 9, 29, 20, 20, 11, 426930
    ),
)
would be stored as the following JSON document:
{
    "name": "Example",
    "counter": 0,
    "timeout_at": "2019-09-29T20:20:11.426930"
}
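In Python, a generic repository could lean on dataclasses plus a small JSON encoder for non-native
types. Below is a sketch under the assumption that all Process Manager data structures are
dataclasses; it is an illustration, not the book's repository code:

import dataclasses
import datetime
import json
import uuid


class ProcessManagerDataEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime.datetime):
            return obj.isoformat()
        if isinstance(obj, uuid.UUID):
            return str(obj)
        return super().default(obj)


def serialize(data) -> str:
    as_dict = dataclasses.asdict(data)
    # process_uuid becomes the primary key column, not part of the JSON payload
    as_dict.pop("process_uuid", None)
    return json.dumps(as_dict, cls=ProcessManagerDataEncoder)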
Now, there are two possible approaches to keeping timeout_at. One can keep it inside the data,
but it will require a partial JSON index for acceptable performance, or create another column
just for the timeout. Then, using it should be much simpler. By the way, a Process Manager’s
data clearly belongs to the wide category of documents, which makes document databases
(e.g. MongoDB) really handy here. If it is the one you already have in your project, then
keeping Process Managers’ data inside shouldn’t give you much of a headache.
The second dilemma (how to find data of the desired one if there are multiple Process Managers
of the same type in progress?) is a bit more complex. What has not been mentioned yet
is that between Events and a Process Manager there is a Process Manager Handler. The latter is a
class responsible for creating the Process Manager's data when it handles the starting
Event, or fetching it using the repository otherwise.
class PayingForWonItemHandler:
    @injector.inject
    def __init__(
        self,
        process_manager: PayingForWonItem,
        repo: ProcessManagerDataRepo,
    ) -> None:
        self._process_manager = process_manager
        self._repo = repo

    @method_dispatch
    def __call__(self, event: Event) -> None:  # 1
        raise NotImplementedError

    @__call__.register(AuctionEnded)
    def handle_beginning(
        self, event: AuctionEnded
    ) -> None:  # 2
        data = PayingForWonItemData(
            process_uuid=uuid.uuid4()
        )
        self._run_process_manager(data, event)

    @__call__.register(PaymentCaptured)
    def handle_payment_captured(
        self, event: PaymentCaptured
    ) -> None:  # 3
        data = self._repo.get(
            event.payment_uuid,
            PayingForWonItemData,
        )
        self._run_process_manager(data, event)

    def _run_process_manager(
        self,
        data: PayingForWonItemData,
        event: Event,
    ) -> None:  # 4
        self._process_manager.handle(event, data)
        self._repo.save(data.process_uuid, data)
To make finding the right data possible, there are two options:
• guarantee that all events will come back with the same UUID, which one will just use
for the primary key,
• write handlers and the repo in such a way that one is able to use different UUIDs and query
process_manager_data by various nested fields.
The first solution is straightforward to implement and results in simpler code but will affect
involved modules. In practice:
# Process Manager
@handle.register(AuctionEnded)
def handle_auction_ended(
    self,
    event: AuctionEnded,
    data: PayingForWonItemData,
) -> None:
    self._payments.start_new_payment(
        data.process_uuid, ...
    )  # 2

# Process Manager Handler
@__call__.register(PaymentCaptured)
def handle_payment_captured(
    self, event: PaymentCaptured
) -> None:
    data = self._repo.get(
        event.payment_uuid,
        PayingForWonItemData,
    )  # 3
Interesting lines:
2. The Process Manager passes its UUID to be used as the UUID of the newly created payment
3. When PaymentCaptured is emitted, one can be sure it carries the same UUID that the Process
Manager has
A side effect of this approach is that one gets a correlation ID - the same UUID will be
passed and reused in different modules, making it easier to track what is actually going on.
On the downside, this solution is not always possible to implement, especially when
tracking multiple objects of the same type is required. It is not possible to reuse the same
UUID then.
The second approach does not have specific requirements but is a bit more complex.
However, this extra complexity is contained in Process Manager Handler:
class PayingForWonItemHandler:
    @__call__.register(AuctionEnded)
    def handle_beginning(
        self, event: AuctionEnded
    ) -> None:  # 1
        data = PayingForWonItemData(
            process_uuid=uuid.uuid4()
        )
        self._run_process_manager(data, event)

    @__call__.register(PaymentCaptured)
    def handle_payment_captured(
        self, event: PaymentCaptured
    ) -> None:
        data = self._repo.get_by_field(  # 2
            PayingForWonItemData,
            payment_uuid=event.payment_uuid,
        )
        self._run_process_manager(data, event)
Interesting lines:
1. The starting Event still creates a fresh data structure with a brand new UUID
2. Subsequent Events are matched with their Process Manager by querying a nested field
(here payment_uuid) instead of the primary key
Note these solutions work only if there is exactly one event that can start a Process Manager. If
there are more, one has to find another way to guarantee the continuity of a Process Manager.
Process Managers are stateful creatures. They have their state persisted between invocations.
Events they handle can be emitted at the same time. Think of a person paying at the last
moment, just as the datetime specified in timeout_at passes. In other words, Process Managers
are vulnerable to race conditions and one has to protect them.
In a previous chapter, a technique called optimistic locking was used. The method amounts
to checking if someone else has changed the Entity since the last time we fetched it. If so, then
one has to retry the entire operation. It works great with a side-effect-free Entity, but may
not be the best fit for Process Managers. Pessimistic locking will do better.
In principle, it is even simpler than the former solution. Before one calls any handling
logic of a Process Manager, a lock has to be explicitly acquired. When we finish with the
processing, the lock is released. If we fail to acquire it in the first place - we abort.
Sometimes we may also want to wait for some time until the lock is released or simply retry
later. That is the theory, but for the sake of the example, let’s assume that if the lock cannot be
acquired, we are OK with aborting. Consider the exaggerated example mentioned at the
beginning of the section - a winner paying at the very last moment, when the Process Manager
times out. Let’s assume that when the timeout occurs, we no longer want the winner's
money. They are late, end of story. In other words, when two Events that can end a Process
Manager are competing, it makes absolutely no sense to retry or wait. In other cases, like
counting someone’s achievements, it will make sense to retry or wait.
Implementation-wise, the Process Manager Handler can be burdened with the logic of acquiring the
lock:
class PayingForWonItemHandler:
    LOCK_TIMEOUT = 30

    @injector.inject
    def __init__(
        self,
        *omitted_args,
        lock_factory: LockFactory,
    ) -> None:  # 1
        ...
        self._lock_factory = lock_factory

    def _run_process_manager(
        self,
        lock_name: str,
        data: PayingForWonItemData,
        event: Event,
    ) -> None:
        lock = self._lock_factory(
            lock_name, self.LOCK_TIMEOUT
        )  # 2
        with lock:  # 3
            self._process_manager.handle(event, data)
            self._repo.save(data.process_uuid, data)
Interesting lines:
2. It is a good idea to always acquire locks with some timeout after which they will be
automatically released if something goes wrong. This improves the resilience of the
system
3. A Pythonic context manager abstracts away acquiring and releasing the lock
For the sake of completeness of the example, Redis will be used for locking. An example
implementation of a lock context manager may look as follows:
class RedisLock(Lock):
    LOCK_VALUE = "LOCKED"  # 1

    def __init__(
        self,
        redis: StrictRedis,
        name: str,
        timeout: int = 30,
    ) -> None:
        self._redis = redis
        self._lock_name = name
        self._timeout = timeout

    def __enter__(self) -> None:  # 2
        # acquisition - one common way is SET with NX (set only if not exists)
        # and an expiry equal to the lock's timeout
        if not self._redis.set(
            self._lock_name,
            self.LOCK_VALUE,
            nx=True,
            ex=self._timeout,
        ):
            raise AlreadyLocked

    def __exit__(  # 3
        self,
        exc_type: Optional[Type[BaseException]],
        exc_val: Optional[BaseException],
        exc_tb: Optional[TracebackType],
    ) -> bool:
        if exc_type != AlreadyLocked:
            self._redis.delete(self._lock_name)
        return False
Interesting lines:
1. To do locking on Redis using the String type, one has to put some value under the key. What
exactly is rather irrelevant; for debugging purposes, one might want to put a timestamp there
3. __exit__ runs after the block under with ends. Its goal is to release the lock only if we acquired it.
CHAPTER SUMMARY
This chapter explained a vital part of architecting a system - packaging code by feature, or
vertical slicing. We learned that not every module has to follow the Clean Architecture -
only those that would actually benefit from it. In simpler cases, we may resort to Facade-based
modules that interact with databases or external services in a more direct way, without additional
layers of abstraction.
Then, different types of relations between modules were discussed to show ways of
integrating them. Among the presented techniques, there were:
• calling another module’s Use Case (optionally via its Input Boundary) or a Facade’s method,
• defining a Port in one module and letting another module provide an Adapter,
• subscribing to another module’s Domain Events,
• coordinating multi-module processes with a Process Manager.
The preferred way to integrate modules should be using Domain Events unless a synchronous
call is required.
TESTING
TESTING STRATEGY AND FEATURE FLAVORS
At the beginning of any project, delivering business value without causing regressions is
trivial. However, as more and more features are added, complexity grows. So does the risk of
breaking something. Adherence to practices of good OOP design, like the Open-Closed
Principle or loose coupling, lessens the danger when adding new features, but it does not yet
give enough confidence. One needs more than a coincidence to deliver regularly. Luckily,
the remedy is already known - automated tests. Here’s what Adrian Sutton says about
relying on the test suite at LMAX42:
If our test suite is to provide a complete safety net, the architecture has to facilitate testing.
Otherwise, we find ourselves hacking around the untestable monster. The Clean
Architecture is definitely a powerful ally when it comes to writing testable code. Up to this
moment, the focus was on testing smaller pieces, like Entities or Use Cases. It is high time we
saw the bigger picture and wondered about testing whole modules as well as the entire
application.
THE TEST PYRAMID - A MYTH OR THE ONLY RIGHT THING TO DO?
Chances are you have heard about the concept of an allegedly perfect test distribution - the Test
Automation Pyramid. This concept is attributed to Mike Cohn and was first published in
written form in 200943:
The original Test Automation Pyramid strives to maximize return on investment by using
different kinds of tests. Fast and cheap unit tests are not enough to be confident that the
program works when a user interacts with it, so they are supplemented with a smaller number
of higher-level service tests and even fewer tests that run atop the user interface. Please note
that this model does not take into account manual testing - it is the Test Automation Pyramid
after all. However, manual testing is still necessary - especially exploratory testing. A Test
Pyramid model that takes it into account looks like this:
Mike Cohn’s Test Automation Pyramid consists of three types of tests - unit, service and
UI.
Unit tests and UI tests should look familiar, but what are the mysterious service tests? According
to the author44:
In the way I’m using it, a service is something the application does in response to some
input or set of inputs. Our example calculator involves two services: multiply and divide.
Service-level testing is about testing the services of an application separately from its user
interface. So instead of running a dozen or so multiplication test cases through the
calculator’s user interface, we instead perform those tests at the service level.
44 Mike Cohn, The Forgotten Layer of the Test Automation Pyramid, https://
www.mountaingoatsoftware.com/blog/the-forgotten-layer-of-the-test-automation-pyramid
If we were to apply this to the Clean Architecture, our Use Cases (and Queries) would be our
Services. The Test Automation Pyramid can definitely be praised for its consistency - all types of
tests involved refer to certain levels in the tested application, whether it is the user interface
(UI), Use Cases/Queries (Service) or classes/functions (Unit).
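Translated into code, a service-level test exercises a Use Case directly, below the delivery
mechanism. The sketch below is illustrative only - the test doubles (InMemoryAuctionsRepo,
PlacingBidOutputBoundaryFake), the execute method name and the DTO fields are assumptions, not
the project's exact fixtures:

def test_PlacingBid_FirstBid_IsWinning() -> None:
    # build the Use Case with in-memory collaborators instead of real infrastructure
    output_boundary = PlacingBidOutputBoundaryFake()
    repo = InMemoryAuctionsRepo(
        [AuctionFactory(starting_price=get_dollars("10.00"))]
    )
    place_bid = PlacingBid(output_boundary, repo)

    place_bid.execute(
        PlacingBidInputDto(
            auction_id=1, bidder_id=1, amount=get_dollars("12.00")
        )
    )

    assert output_boundary.dto.is_winner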
Other commonly used names for test types, depending on their level, are system tests, end-
to-end tests or integration tests. Although it is generally agreed upon what system tests
and end-to-end tests mean (just go through all the layers), integration tests are defined in
different, often contradictory ways. For example, the ISTQB glossary45 says it is A test level that
focuses on interactions between components or systems. Another definition assumes it’s a way of
testing a few units together as a bigger whole - a component. Why not name it component
testing, then? Oh, wait - there is such a term already. Anyway, the last known
interpretation is to write tests with a scope equal to unit tests, but without stubbing the database, and
still call them integration tests. Let’s assume that in this book, from now on, integration
tests stand for verifying the correctness of code responsible for communication with an
external system or provider. In terms of the Clean Architecture, think of a pair of Port and
Adapter for a payment provider. Tests for the Adapter that require communication with the
external system would be called integration tests. They are not meant to test the external
provider itself, though - only the Adapter.
I am going to make use of yet another type - API tests. They are placed slightly higher than
Service tests and exercise the web framework that is used. A characteristic feature of such tests
is calling a specific URL and making assertions about the response.
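For example, with a Flask test client it could look roughly like this (the URL, the payload and the
expected response are illustrative, not taken from the project):

def test_PlacingBid_Returns200(client) -> None:
    # `client` is the web framework's test client fixture
    response = client.post(
        "/auctions/1/bids", json={"amount": "15.00"}
    )

    assert response.status_code == 200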
There is yet another attempt to classify tests that uses their role rather than the way they are
supposed to run. These types are functional tests and acceptance tests. The difference
between functional and acceptance tests is subtle and some people openly treat them as if they
were identical. The ISTQB Glossary states that functional testing is to evaluate if a system
satisfies functional requirements, while acceptance testing aims to determine whether to
accept the system. From a certain perspective, these are indeed the same - both testing types
check if the system works from a business perspective. The only difference in meaning I was
able to find is that acceptance tests focus on real-life scenarios, while functional testing
can also exercise the system in a more thorough fashion.
From now on, I am going to stick to the name acceptance testing when referring to verifying
whether the system does what it is supposed to do from the perspective of stakeholders and users.
To better illustrate the issue, let’s look through the lens of a user interface at three
differently flavoured features:
1. database browser (CRUD) - translating every interaction into simple database operations,
2. proxy to other systems - making lots of external calls and doing very little with results
apart from storing or just presenting them,
3. essentially complex problem - rich in business rules, edge cases and alternative scenarios.
Each one of these examples is an extremity, but together they cover all possible feature flavours. A real-
life application will be a combination of all. However, the prevalent flavour will determine
the most appropriate testing strategy.
To conclude, the original Test Automation Pyramid is not a silver bullet. One may need to
take a different approach, best suited to their project. Please note that with a
modularized application, we can apply a test strategy tailored for each module. That remains
true if, eventually, we decide to split into microservices. There is no reason to impose the only
right approach throughout the whole project, especially given that no one-size-fits-all
solution exists.
Such systems are sometimes referred to as CRUDs. One interprets all business
requirements as database operations - Create (INSERT), Read (SELECT), Update
(UPDATE) and Delete (DELETE). Let’s consider an example.
We are going to build a system for our Pizza Fridays. Each week, all participants will order a specific
pizza by entering its name from the menu into a form. There is going to be a person called coordinator who
will be responsible for making a call to a restaurant or ordering online. Once the order is placed at the
restaurant, the coordinator will clear the internal orders list.
Participant adds order to the internal list - INSERT a row to the database with pizza name
Coordinator displays contents of the internal list - SELECT all rows from the database
Coordinator empties list after ordering - DELETE rows, alternatively use UPDATE for soft delete
Note that it is the simplest approach possible. For such an application to be used beyond a
single office, one would have to consider many more scenarios, like updating a participant’s
order, withdrawing from Pizza Friday etc. The problem is that CRUD would then cease to work
for this example.
“Pure” CRUD is an extreme case when there is exactly one possible path for a given
interaction. No alternative scenarios. Doing the same action repetitively has no (or limited)
consequences. If one applied the Clean Architecture to such a problem and it was
their first experience with it, they would think it is a complete waste of time and
overengineered mumbo-jumbo.
And what does an Entity’s method do? It sets some fields. No if-statements, no logical
branches, no special cases. Oh, and it is tempting to name a Use Case something like Updating[Entity Name]
since one thinks through the lens of database rows. Of course, one can write a unit test for an
Entity or Use Case in such a situation. The singular form was used on purpose - there would
be only one scenario to check after all. There is no added value in such tests - if we also test
on a higher level (e.g. via API), unit tests would be checking exactly the same paths of code
and would be giving us less confidence since they exercise less code at once. So when we
face such a problem, we can safely skip unit testing most of the time and our test
automation “pyramid” can look like this:
In such a project, unit tests prove themselves to be valuable only when we deal with some
calculator or a validator. All functionalities are checked with API tests with the assumption
that a minimal number of tested scenarios suffices. One test for a positive scenario, one for a
negative scenario and potentially one for checking security (whether we got authentication right)
should give us enough confidence. A few extra UI tests should complement the strategy.
Beware of an incorrectly discovered flavour. If an application would benefit from the Clean
Architecture, but one implements it as if it were a CRUD, then testing will be much less
effective since more and more scenarios would have to be checked via API tests. Bear in
mind the issue is not only about the speed of tests, but also their stability and
maintainability. If your only dependency is a database, the entire test suite finishes in a
relatively short time (e.g. less than 10 minutes), and the whole setup comes down to
putting a few objects in the database, it is still fine.
Disclaimer: This part is written with scarce integrations with 3rd party providers that have
public APIs in mind. It has little to no application to microservices. For the latter, there is a note at the end.
A project (or a part of it - an Adapter) might be just a proxy over another service. Even if one
provides another user interface, the majority of business logic is still being executed
somewhere else. The role of our system is to translate requests from our GUI and handle
responses. Another way to tell if we have this kind of problem is to ask what would happen
if the 3rd party service disappeared. If the answer is “our system is not going to work at all”,
then you have a proxy. A typical example is a cheap flights search that checks a few
providers. If it is not able to fetch available flights from elsewhere, it is useless and provides
no service. Of course, it could be caching results for some time to provide minimal service
for the most popular flight routes, but it is only a fraction of the original value.
In this case, once again, unit tests do not prove to be the most reliable source of truth about
whether our software works. That mostly depends on the external provider, not our
isolated units. Eventually, we have to resort to integration tests. An absolute must is to test
code using each endpoint (or method, if we speak about RPC or SOAP) at least once. There
are two more things we have to address - dependencies between methods and data
validation.
One cannot use a Refund endpoint if the Charge endpoint has not been used beforehand. Obviously,
we could test how refund behaves if we give it a non-existing id, but that brings little to no
value for an API client like us. In this case, one can just naively issue additional requests as
needed to check each API endpoint individually. A second solution would be to write tests
that check real-life scenarios by calling several methods at once, e.g. creating a
charge (1st call), then checking its details (2nd call) to finally refund it (3rd call). Everything in
one test.
@pytest.mark.super_cool_payment_provider
def test_charge_then_capture(
    api_consumer: ApiConsumer,
    source: str,
    api_key: str,
) -> None:
    # first, we call our code to charge & capture 15 dollars
    charge_id = api_consumer.charge(
        get_dollars("15.00"), source
    )
    api_consumer.capture(charge_id)
Data validation can mostly be dealt with by using Value Objects as the Facade’s arguments and
return values. Validation rules can then be easily enforced due to the nature of Value
Objects. The number of integration tests should be very limited and they should be written with
the Robustness Principle (AKA Postel’s Law) in mind46:
Simply put, we write tests (and code!) checking only the things we rely on. For example,
failing a test because there is an unexpected field in JSON violates the Robustness Principle. Such a
change in the API is considered to be backwards-compatible and hence should not break our
integration.
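In practice, it means asserting only on the fields our code actually uses. A sketch (the
get_charge_details call and the response fields are illustrative assumptions):

def test_charge_details_contain_what_we_rely_on(
    api_consumer: ApiConsumer, charge_id: str
) -> None:
    details = api_consumer.get_charge_details(charge_id)

    # do NOT compare the whole payload - extra fields added by the provider
    # in the future must not break this test
    assert details["status"] == "captured"
    assert details["amount"] == "15.00"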
Another thing to consider is how stable these tests will be. If they fail randomly
due to a fallible provider, one will neglect running them. There are few things more
irritating to a software developer than a test suite failing locally through no fault of their own. The
same goes for a CI pipeline. Solutions may vary from excluding these tests from local runs
to adding retries to the tests or implementing retries in the code under test itself.
The advice given above is not really applicable in a system consisting of microservices. The
very idea of integration testing is contradictory to the way done-right microservices
systems are developed and deployed. Testing services in isolation does not solve the problem
either - it is like relying on unit tests only. Luckily, a proper solution has been discovered (or rather
rediscovered) - Consumer-Driven Contract Testing47. One of the most popular tools for
facilitating this technique is Pact48. This topic is exciting, though it remains out of scope for
this book. An interested reader will surely be able to study Consumer-Driven Contract
Testing further on their own.
Once we realise we are dealing with an essentially complex problem, the Clean Architecture
or more sophisticated techniques (like DDD) come in handy. Complexity means many
potential scenarios, countless edge cases and lots of factors affecting the outcome of actions
undertaken by users.
First and foremost, only an appropriate code structure and architecture can enable us to use an
effective testing strategy. Effective does not mean straightforward - it has to combine at least
a few types of automated tests to earn its name. Enterprise-wide business logic (Entities)
should be unit tested. Application business rules (Use Cases) fall under service testing. Then
we have Repositories and Adapters, which will benefit most from integration testing, just
like it was described in the previous section - How to test a proxy to other systems?. Then there
is a web framework and potentially a User Interface, so we should also have a handful of API
& UI tests.
Then the test automation pyramid will closely resemble the exemplary one from the
beginning of this chapter.
Automated tests can also be classified by how much a test knows about the code
that is being tested. Using the same terminology as for manual testing, we can think about
black-box and white-box testing. Black-box testing means a tester has no idea about the
internals, while white-box is exactly the opposite - a tester knows and sees everything, including the
implementation. Black-box testing validates the requirements and specifications, whereas
white-box testing validates the code.
Unit tests can be implemented in both ways. One extreme is an orthodox kind of white-box testing,
when the unit under test has no secrets from the test. In the context of unit testing, this is
rarely a good idea. Such an approach duplicates the implementation in the tests, making it
impossible to evolve them in such a way that as the tests get more specific, the code gets more
generic. Moreover, tests are tightly coupled to the implementation, which makes them fail
each time the latter changes. Unit testing in such a way resembles pouring
concrete - making sure nobody can move a thing. In white-box testing of a class, we call its
public methods but then often verify the result by examining private fields or using mocks
of types used internally.
Example:
def test_Auction_Ending_ChangesEndedFlag(
    yesterday: datetime,
) -> None:
    auction = AuctionFactory(ends_at=yesterday)
    auction.end_auction()
    assert auction._ended  # 1

def test_EndedAuction_PlacingBid_RaisesException(
    yesterday: datetime,
) -> None:
    auction = AuctionFactory(ends_at=yesterday)
    auction._ended = True  # 2
    with pytest.raises(BidOnEndedAuction):
        auction.place_bid(
            bidder_id=1,
            amount=get_dollars("19.99"),
        )

def test_EndedAuction_Ending_RaisesException(
    yesterday: datetime,
) -> None:
    auction = AuctionFactory(ends_at=yesterday)
    auction._ended = True  # 2
    with pytest.raises(AuctionAlreadyEnded):
        auction.end_auction()
Note how much inappropriate intimacy49 is there. At a glance, the first test looks pretty fine…
until you notice its title and assertion. Not to mention accessing a pseudo-private field at (1).
The second and third tests look better in terms of their asserts, but putting an auction in the expected
state (2) by manipulating its private field is just plain wrong.
Now, imagine that for tracking purposes the _ended flag is replaced with _ended_at, which keeps
a datetime instance indicating the moment of calling end_auction or None if the method has not
been called. All three tests would break, although most probably everything would still
work in Service or higher-level tests (and in production). Unit tests are a useless burden,
aren’t they? Well, they are if written in a way that violates encapsulation.
49 https://refactoring.guru/smells/inappropriate-intimacy
A better approach is to respect the class’ privacy and test it in a black-box way. The rule is
simple - we are not allowed to ask the class for its secrets and can touch only its public
methods & properties.
def test_EndedAuction_PlacingBid_RaisesException(
    yesterday: datetime,
) -> None:
    auction = AuctionFactory(ends_at=yesterday)
    auction.end_auction()  # 1
    with pytest.raises(BidOnEndedAuction):
        auction.place_bid(
            bidder_id=1,
            amount=get_dollars("19.99"),
        )

def test_EndedAuction_Ending_RaisesException(
    yesterday: datetime,
) -> None:
    auction = AuctionFactory(ends_at=yesterday)
    auction.end_auction()  # 1
    with pytest.raises(AuctionAlreadyEnded):
        auction.end_auction()
The most noticeable change is that we are no longer manipulating a private field, but tell the
auction to end (1). Then we test the class’ behavior by checking if an ended auction prevents us
from placing new bids (test 1) and from ending it again (test 2). Notice that we reduced the number of
tests by one and still have the same coverage. Moreover, we kept the ability to rearrange
implementation details as long as the class’ behavior remains unchanged.
I write about white- and black-box testing, but there is always a grey area in between.
“All problems in computer science can be solved by another level of indirection” - David Wheeler
In the above examples, an AuctionFactory class was used as a helper to create Auction
instances. This is an example of a builder, a quite handy pattern to use in tests. In Python,
there is an excellent library, factory_boy, that facilitates writing builders in a
declarative way. To get a builder, we write a sort of specification with default values for an
object being built.
class AuctionFactory(factory.Factory):
class Meta:
model = Auction
id = factory.Sequence(lambda n: n)
bids = factory.List([])
title = factory.Faker("name")
starting_price = get_dollars("10.00")
ends_at = factory.Faker(
"future_datetime", end_date="+7d"
)
ended = False
In its basic form, all fields declared on AuctionFactory are used to populate
Auction.__init__ arguments. A rough equivalent, written in a more imperative way, looks
as follows:
def create_auction(
    id: Optional[AuctionId] = None,
    bids: Optional[List[Bid]] = None,
    title: Optional[str] = None,
    starting_price: Money = get_dollars("10.00"),
    ends_at: datetime = datetime.now()
    + timedelta(days=7),
    ended: bool = False,
) -> Auction:
    if id is None:
        # object we can iterate to get integers like 1, 2, 3...
        id = auction_sequence.next()
    if bids is None:
        bids = []
    if title is None:
        title = faker.name()
    return Auction(
        id,
        title,
        starting_price,
        bids,
        ends_at,
        ended,
    )
Now, if we were to change _ended field to _ended_at and still let our tests use it via
create_auction or AuctionFactory with ended parameter, we simply have to implement logic
in a builder:
# factory_boy-based builder
class AuctionFactory(factory.Factory):
    class Meta:
        model = Auction

    class Params:
        ended = False

    id = factory.Sequence(lambda n: n)
    ...
    ended_at = factory.LazyAttribute(
        lambda o: datetime.now()
        - timedelta(days=1)
        if o.ended
        else None
    )

# function-based builder
def create_auction(
    *omitted_args, ended: bool = False,
) -> Auction:
    ...
    if not ended:
        ended_at = None
    else:
        ended_at = datetime.now() - timedelta(
            days=1
        )
    return Auction(
        id,
        title,
        starting_price,
        bids,
        ends_at,
        ended_at,
    )
Relying on builders in tests is a pretty good idea since they can abstract many
implementation details and significantly limit the number of places where we need to make changes.
They also simplify object creation, providing default values for all fields we decide not to
specify ourselves. Of course, they will always have to follow changes in the Auction
implementation because they are tightly coupled to it.
All in all, relying on public methods during testing is generally a very good idea. Leaking
knowledge about implementation will cost you in the longer term. Remember the analogy of
pouring concrete. Implementation-coupled testing makes sure nobody can move a thing in
the future.
INTRODUCTION
Another angle from which tests can be looked at is whether they are more state- or interaction-oriented.
These two approaches look pretty similar while exercising the system under test (Act or When
step) but differ in the verification phase (Assert or Then step). State-based
testing inspects the state of the system under test, while interaction-based testing
checks whether the system under test called its mocked dependencies as expected.
# interaction-based testing example
@pytest.mark.usefixtures("transaction")
def test_AuctionsRepo_UponSavingAuction_ClearsPendingEvents(
    connection: Connection,
    auction_with_pending_event: Auction,
    event_bus_mock: Mock,
) -> None:
    repo = SqlAlchemyAuctionsRepo(
        connection, event_bus_mock
    )
    repo.save(auction_with_pending_event)
    event_bus_mock.post.assert_called_once_with(
        pending_event
    )
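For contrast, a state-based check exercises the object in the same way but verifies the outcome through its publicly visible state. The test below is only a sketch: it assumes Auction exposes a public current_price property, as used in later examples in this chapter.

# state-based testing example (a sketch; assumes Auction exposes
# a public current_price property)
def test_Auction_FirstBid_ChangesCurrentPrice() -> None:
    auction = AuctionFactory(starting_price=get_dollars("10.00"))

    auction.place_bid(bidder_id=1, amount=get_dollars("11.00"))

    # verification is done against publicly visible state,
    # not against how collaborators were called
    assert auction.current_price == get_dollars("11.00")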
There has been a dispute about one approach being better than the other, but personally, I find
such an argument beside the point. I think both state-based and interaction-based testing
approaches are useful in appropriate situations. However, it does not mean they can always
be used interchangeably. Regardless of which one you use, an overarching goal is to avoid
violating the encapsulation of the system under test. Sadly, neither of the two approaches
prevents that by design.
State-based testing seems to be a perfect choice when we implement the Act (or When) step
with a simple public method call, then verify its result using another public method.
However, it becomes dangerous when a test uses private fields or methods. When there is no obvious
way to read the state, one may feel tempted to add a special method that will be used only in
testing.
def test_Auction_Ending_ChangesEndedFlag(
    yesterday: datetime,
) -> None:
    auction = AuctionFactory(ends_at=yesterday)
    auction.end_auction()
    assert auction.is_ended  # a property exposed only for tests

3. The tests no longer describe how Auction should be used. is_ended is not meant to be
used in production code, but tests - which serve as living documentation - suggest
otherwise
4. Checking internal state does not contribute to conveying domain knowledge. Okay,
the Auction is ended, but so what? What are the consequences? What can or cannot be done
with an ended auction?
To conclude, state-based testing is a great choice as long as one can inspect the state without
breaking encapsulation of a system under test.
Interaction-based testing is convenient when we verify the usage of dependencies. This kind
of testing is used in combination with mocks, which are used to write assertions and
potentially fail tests. Mocks are our salvation when we cannot / do not want to use external
dependencies, like payment providers or other 3rd parties. In terms of the Clean
Architecture, one could mock a Port and then verify whether a Use Case called the Port as expected:
def test_EndingAuction_WonEndedAuction_CallsPaymentProviderWithAuctionCurrentPrice(
    ending_auction_uc: EndingAuction,
    auction: Auction,
    payment_provider: Mock,
) -> None:
    ending_auction_uc.execute(
        EndingAuctionInputDto(auction.id)
    )
    payment_provider.begin_payment.assert_called_once_with(
        auction.current_price
    )
To sum up, interaction-based testing becomes dangerous when mocks are overused. They should be
applied sparingly. Good use cases are checking how the system under test uses Ports or
other external dependencies that we do not control.
Mocking is not the only approach to replacing objects for testing purposes. While mocks are
flexible and enable us to replace virtually any object with dynamically specified behavior,
they have their limits and ideal use cases. Generally speaking, mocks are merely one category
of pretend objects used for testing purposes. The most general name for this type of object
is Test Doubles. Another type is a stub - a simple implementation that returns canned
responses. Stubs are meant to be used differently than mocks. When stubs are used in
a test, one would rather write assertions about the object that uses them than about the
stubs themselves.
Consider an example:
class PaymentProviderStub(PaymentProvider):
    def begin_payment(
        self, amount: Money
    ) -> bool:
        return True

def test_EndingAuction_WonEndedAuctionStubbedPaymentProvider_EmitsDomainEvent(
    auction: Auction,
    payment_provider: PaymentProviderStub,
    event_bus_mock: Mock,
) -> None:
    ending_auction_uc = EndingAuction(
        payment_provider, event_bus_mock
    )
    ending_auction_uc.execute(
        EndingAuctionInputDto(auction.id)
    )
    event_bus_mock.post.assert_called_once_with(
        ...
    )
Although the test is verifying EndingAuction and both PaymentProvider and EventBus
collaborators were replaced, assertions are made only about the mocked EventBus. This test
checks if EndingAuction posts a Domain Event via EventBus provided that
PaymentProvider.begin_payment succeeds. Note that PaymentProviderStub is a custom
implementation of a Port.
Usually, there are no shortcuts in writing stubs, though Python’s mocks could potentially be
abused to do so like:
payment_provider = Mock(
spec_set=PaymentProvider,
begin_payment=Mock(return_value=True),
)
There is another creative way of using stubs that allows turning virtually any interaction-
oriented test into a state-based one. In order to do so, we have to break two of the
aforementioned rules - first, a query method has to be added to the stub just for testing
purposes and, second, the assertion in the test will be written against the stub (ouch).
class PaymentProviderStub(PaymentProvider):
    def __init__(self) -> None:
        self._payments: List[Money] = []

    def begin_payment(
        self, amount: Money
    ) -> bool:
        # begin_payment not only returns a canned response, but also
        # records the call - this makes this class a hybrid of Stub
        # and Spy (another test double type)
        self._payments.append(amount)
        return True

    @property
    def payments(self) -> List[Money]:
        # the query method (name assumed) added just for tests
        return self._payments[:]

def test_EndingAuction_WonEndedAuction_CallsPaymentProviderWithAuctionCurrentPrice(
    ending_auction_uc: EndingAuction,
    auction: Auction,
    payment_provider: PaymentProviderStub,
) -> None:
    ending_auction_uc.execute(
        EndingAuctionInputDto(auction.id)
    )
    assert payment_provider.payments == [
        auction.current_price
    ]
A query method was exposed on the stub specifically to verify how its begin_payment method was
called. A few paragraphs earlier, I warned you against adding such special methods, but that
piece of advice applies to classes used in production, not to those written exclusively for
tests.
We already know we should avoid coupling tests to the implementation because this
impedes refactoring instead of speeding it up and makes it next to impossible to make any
changes in code without failing half of the test suite. A properly designed (or rather
discovered by trial and error) encapsulation is what lets us write tests against a stable API
and maintain the freedom to rearrange private details. We also know that unit tests are
cheap to write but are often underestimated because of how little code they exercise
at once. Or maybe we are wrong about the small scope…?
A common misconception is that a unit test is only about verifying a method of a single
class or a standalone function - the smallest unit one can imagine. The confusion comes
from the fact that neither unit nor unit-testing is strictly defined. There are only clues. We
expect tests of a unit to be stable, isolated from the external world and significantly faster
than other kinds of tests. Software developers should be able to write and run such tests on
their machines. Hence, time for a shift in perspective - let’s think about how one can test an entire
module as a single unit with a black-box approach. First, let’s recall what a module is and
where its contact points with the surroundings are.
Stub - a test double implemented before the test that returns canned (hardcoded)
responses. It should not fail the test. Its role is to replace a dependency of the system
under test. Then, we check whether the system under test behaves as expected, provided
with the stubbed dependency.
Fake - a simple yet functional implementation of a dependency that is not ready for
production use yet may prove valuable in tests
Dummy - usually a primitive value like None that is just passed around and not used.
For example, we may need an argument to a class’ constructor which is not used in the
particular test, yet is still required by the class
Mock - a test double that can be queried for how it was used. Used in assertions,
especially in interaction-oriented testing
When we unit-test a class, we are doing so by calling its methods and checking results or
fields of an instance. Hopefully, we are not using private methods or fields in tests, since we
would then break encapsulation and make it harder to refactor the class in the future. If we think
about a module, it also has its own public API - these are Use Cases (or Commands +
Command Handlers) and Queries. A typical contact point with the outside world is a Port or
Repository. A Repository represents the private storage of a module that needs to be
maintained between invocations of Use Cases. Things that come out of a module are Domain
Events (via Event Bus) and Output DTOs (if we use Output Boundaries / Presenters).
To write any test against an entire module, we need to decide how to achieve four things:
• putting the module into the desired state (Arrange / Given)
• exercising it, usually by calling one of its Use Cases (Act / When)
• making assertions about the outcome (Assert / Then)
• dealing with the module’s dependencies - Ports & Repositories
The way setting the stage is implemented depends on whether we treat Entities as a private
detail of a module or not. In the former case, one must not touch Entities in tests but is
allowed to call Use Cases of a given module in such a way that they put the system under
test into the state we desire. If our module is using a Facade instead of Use Cases, we are
allowed only to call the Facade’s public methods:
def test_Auction_OverbidFromOtherBidder_EmitsEvents(
    beginning_auction_uc: BeginningAuction,
    place_bid_uc: PlacingBid,
    *other_args
) -> None:
    auction_id = 1
    tomorrow = datetime.now(
        tz=pytz.UTC
    ) + timedelta(days=1)
    # First, auction begins
    beginning_auction_uc.execute(
        BeginningAuctionInputDto(
            auction_id,
            "Foo",
            get_dollars("1.00"),
            tomorrow,
        )
    )
    # Then, a new winning bid is placed
    place_bid_uc.execute(
        PlacingBidInputDto(
            1, auction_id, get_dollars("2.0")
        )
    )
    # Act / When
    ...
    # Assert / Then
    ...
On the bright side, that is very close to the way the module will be used in production code.
On the other hand, introducing a fatal bug to BeginningAuction would create a cascade of
failing tests since it is used in virtually all other tests. A situation like that can be easily
contained by running tests often and working in a TDD way. Then we can tell exactly which
change caused the test failures and revert or amend it.
In general, keeping Entities hidden makes for better encapsulation, hence allows for bolder code
changes. However, when putting a module into the desired state using only Use Cases becomes
too complex, one may want to sacrifice some encapsulation. Builders, like AuctionFactory,
are then the way to go. Even if one goes for total encapsulation, builders may still come in
handy, for example, to generate Input DTOs for our Use Cases.
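For instance, a factory_boy builder for an Input DTO could look like the sketch below. The field names are only assumptions derived from the positional PlacingBidInputDto usages in this chapter.

class PlacingBidInputDtoFactory(factory.Factory):
    class Meta:
        model = PlacingBidInputDto

    # defaults for all fields; a test overrides only what it cares about
    bidder_id = factory.Sequence(lambda n: n)
    auction_id = 1
    amount = get_dollars("10.00")

A test could then request PlacingBidInputDtoFactory(amount=get_dollars("2.00")) and rely on defaults for everything else.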
This step is the simplest of all. Implementation comes down to just calling a Use Case:
def test_Auction_OverbidFromOtherBidder_EmitsEvents(
    place_bid_uc: PlacingBid, *other_args
) -> None:
    # Arrange / Given
    ...
    # Act / When
    place_bid_uc.execute(
        PlacingBidInputDto(
            2, auction_id, get_dollars("3.0")
        )
    )
    # Assert / Then
    ...
While business logic indeed interacts with Entities, they are rarely the same as what users of
the system see (unless we are writing a CRUD). In simple cases, we present a subset of fields.
In more complex scenarios, the user sees a combination of data from different Entities, probably
nested & twisted. That is why CQRS is so helpful - it relieves Entities of that burden.
They no longer have to care for both executing business logic AND providing insights into
the state of the system for users. That also implies Queries are not unit-testable, since they
are tightly coupled to the underlying infrastructure, and hence have to be tested using higher-
level tests. If we were to use a fake repository implemented in memory, we would obviously
have to reimplement all Queries just for tests. That makes the whole effort questionable.
Even if we take Queries out of the equation, there are still obvious things to assert about
during unit-testing of an entire module:
• Output DTOs passed to the Output Boundary
• exceptions thrown
• which Domain Events have been emitted from within the module

# verifying the Output DTO passed to the Output Boundary
expected_dto = PlacingBidOutputDto(
    is_winner=True,
    current_price=get_dollars("100"),
)
assert output_boundary.dto == expected_dto

# verifying thrown exceptions
def test_PlacingBid_BiddingOnEndedAuction_RaisesException(
    beginning_auction_uc: BeginningAuction,
    place_bid_uc: PlacingBid,
) -> None:
yesterday = datetime.now(
tz=pytz.UTC
) - timedelta(days=1)
with freeze_time(yesterday):
beginning_auction_uc.execute(
BeginningAuctionInputDto(
1,
"Bar",
get_dollars("1.00"),
yesterday + timedelta(hours=1),
)
)
with pytest.raises(BidOnEndedAuction):
place_bid_uc.execute(
PlacingBidInputDto(
1, 1, get_dollars("2.00")
)
)
# verifying emitted Domain Events
def test_Auction_OverbidFromWinner_EmitsWinningBidEventOnly(
place_bid_uc: PlacingBid,
event_bus: Mock,
auction_id: AuctionId,
auction_title: str,
) "-> None:
place_bid_uc.execute(
PlacingBidInputDto(
3, auction_id, get_dollars("100")
)
)
event_bus.post.reset_mock()
place_bid_uc.execute(
PlacingBidInputDto(
3, auction_id, get_dollars("120")
)
)
event_bus.post.assert_called_once_with(
WinningBidPlaced(
auction_id,
3,
get_dollars("120"),
auction_title,
)
)
You may wonder if these approaches are sufficient since we are not validating Queries in any
way. The truth is that one should be able to check plenty of possible scenarios this way. Thanks
to Domain Events, not being able to inspect the state directly should not be a problem -
especially since we could implement read models generated solely from Domain
Events51.
Another quite obvious idea one may have is to make assertions about Entities being saved.
This might sound like a great idea, but it causes the same problems with encapsulation as
those mentioned before, in the part about putting a module into the desired state before the test.
51 Matthias Noback, Object Design Style Guide, Chapter 8. Build read models from domain
events
DEALING WITH DEPENDENCIES (PORTS & REPOSITORIES)
Although a single module can deal with the most complex business logic we can imagine, it
still requires Ports and Repositories to actually do something. What good would it be if it
could not communicate with the world?
Ports often abstract away external systems which we do not control. For example, we
cannot rely on their stability or easily put them into the desired state. We definitely must not
unit-test a module with a real Adapter. We either mock or stub this kind of dependency.
There is one seemingly special case when one module calls another - what do we do? The
solution is the same - mock or stub such a dependency.
In the case of Repositories, we want to replace them with another test double - a fake. The
latter is a simplified, yet functional implementation of a given interface (AuctionsRepository
in this case). A simple in-memory implementation is what we are looking for. It should
be aligned with our production-grade implementations. For example, if our repository uses an
event bus to emit events, the in-memory implementation should do the same.
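Such a fake can be as simple as a dictionary keyed by the auction id. The sketch below assumes the AuctionsRepository interface has get and save methods and that Entities expose their pending Domain Events to the Repository; imports are omitted, as in the other listings.

class InMemoryAuctionsRepo(AuctionsRepository):
    def __init__(self, event_bus: EventBus) -> None:
        self._storage: Dict[AuctionId, Auction] = {}
        self._event_bus = event_bus

    def get(self, auction_id: AuctionId) -> Auction:
        return self._storage[auction_id]

    def save(self, auction: Auction) -> None:
        # behave like the production repository - publish pending
        # Domain Events (assumed accessor) and then store the Entity
        for event in auction.domain_events:
            self._event_bus.post(event)
        self._storage[auction.id] = auction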
CHAPTER SUMMARY
Although automated tests cannot guarantee there will be no bugs in code, they are still a
worthy investment. There is no single best practice (I'm not too fond of that term by the way)
for building automated test suites that will maximize your return on investment. It is just
typical that lower-level tests, for example, unit tests, are faster, simpler and cheaper than
high-level API tests. That being said, LMAX employees have been boasting about their extensive
test suite that is strongly focused on higher-level testing. However, they are building a
blazing-fast trading facility, so unless one is building the exact same product, they should not
recklessly copy others' testing strategies. Each project needs a bit of experimentation and
failed trials to find an optimal approach to automated testing.
Let’s conclude this chapter with an example testing strategy:
• heavy unit-tests for the entire module, with faked Repositories and mocked Ports
• no unit tests for individual classes inside the Clean Architecture modules, unless they
are things like calculators or validators and it is very impractical to test them with the
entire module
• higher-level tests for Facade-based modules, without test doubles for the database
• always have tests to check external providers, at minimum mimicking the real usage,
e.g. charge a payment card, then capture the charge
• at least three API-level tests for each Use Case / Facade method, checking a positive
scenario, a negative scenario and whether authorization is working
Remember that a fine test suite should empower developers and encourage them to introduce
changes. It should be a kind of safety net that whispers in their ears “I got your back!”
even when they are about to touch the most critical, money-making parts of the system. In the long
run, a test suite helps maintain a high velocity of delivery. Bear in mind that your test suite
can only be as good as the code that is being tested. One has to pay attention to design and code
quality to guarantee high testability.
FINAL WORDS
from: [email protected]
to: the dearest reader
subject: Personal thanks
Dear reader,
Thank you for bearing with me throughout the reading of this book. I sincerely hope reading
it was at least as enlightening an experience for you as writing it was for me.
I must admit I thought it would be relatively easy to write about the Clean Architecture
after having implemented it twice and having taught a few dozen people how to do it.
Throughout writing the book, I tried to look at things from different angles. I was not
looking for absolute truth or best practices (again, I wouldn't say I like that term - best practices
are context-dependent). That is how it works in software development - it is a game of trade-
offs. Universal solutions, appropriate for everyone, do not exist. A developer chooses a
solution to avoid a particular problem they do not like. At the same time, they accept
several less significant issues they will have to learn to live with.
That is why I want you to evaluate every recipe shown in this book before using it. Try to
find out if it will really benefit you, your team, your company and your project.
Regardless of whether you will use 5% or 100% of the advice from this book, I hope it will help
you advance your career.
Yours truly,
Sebastian
APPENDIX A: MIGRATING FROM
LEGACY
SHOULD I EVEN MIGRATE?
What good are techniques if they are only applicable to green-field projects?
Working with legacy (also known as brown-field) projects is a much more probable scenario
for most of us. Hence, a migration strategy would be most welcome. Frankly, there is little
to no chance that you will be able to migrate your entire project to the Clean Architecture.
Not only because not all code would benefit from it, but also because some areas are not
actively developed. Rewriting them just to make them finally look good is art for art’s sake -
unless it is a non-critical part you decide to rewrite to gain an understanding of applying
the Clean Architecture.
You may want to hold off on refactoring towards the Clean Architecture if your project is undergoing major
changes. Unstable conditions will make it considerably harder (and more expensive) to
make progress.
HOW TO DO IT?
First things first - we have to have an idea about different modules of our project. If there is
no notion of modules in the code, we have to sketch it, using imagination and a whiteboard.
Do not jump to code if you are not sure where modules’ boundaries are. Talk to people.
Gather domain knowledge. Invite someone who has worked on the project longer than you to support your
efforts. Ideally, it would be best if you came up with a DDD Context Map, but rough
sketches would also do.
Now that you have identified components, try to mark their complexity/importance on a
scale from 1 to 3, where 1 is the most important and 3 is the least important. 1 is for
components that are either the company’s competitive advantage, the main
source of revenue, or are just complex. On the other hand, 3 is often glue code for a 3rd
party solution used by the business which is not changed very often, if at all.
Now that you understand how things work, the next step is to do some groundwork. Set up
a dependency injection library if you do not have one already.
Identify the module boundary and formalize it using Use Cases or a Facade. There is no need to
start a big-bang rewrite by identifying Entities, writing Repositories for them, etc. Hold your
horses, at least for now. At the moment, just wrap up the module and find out how it is used.
Then expose an API for it and start using it in other code areas.
Do not forget about testing. It would be ideal if you had a set of higher-level tests you can
run locally to make sure your refactorings are safe.
In the next step, identify code coupled with external providers and start moving it to a
newly created Adapter. Create a Port and configure a binding in an IoC container.
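A minimal sketch of that step could look as follows - assuming the injector library is used for IoC and using a hypothetical PaymentProvider Port:

class PaymentProvider(abc.ABC):  # the Port
    @abc.abstractmethod
    def begin_payment(self, amount: Money) -> bool:
        ...

class BankApiPaymentProvider(PaymentProvider):  # the Adapter
    def begin_payment(self, amount: Money) -> bool:
        ...  # provider-specific code moved here from the legacy module

class PaymentsModule(injector.Module):
    def configure(self, binder: injector.Binder) -> None:
        # the rest of the code depends only on the Port;
        # this binding decides which Adapter is actually used
        binder.bind(PaymentProvider, to=BankApiPaymentProvider)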
In the final step, start identifying Entities that will protect your business rules. Create
repositories for them. That is going to be the most challenging and long-lasting part of the
migration.
Frankly, it would be better not to even think about freezing new features. Refactoring a
codebase is a serious undertaking. Since it involves so many changes, it is risky. It would be
best to take an approach like Branch By Abstraction that lets you make changes alongside
usual development and deploy the changed code continuously.
Do not try to do a big-bang refactoring and definitely do not jump into code unless you have
come up with a plan. Instead, adopt a steady approach to evolve the codebase in the desired
direction. It is not an easy refactoring to do since you will probably be learning two things at
the same time - the Clean Architecture AND the arcana of your project. You do not have to
strive for perfection at the first attempt. Perfectionism is not an ally of a software developer. You
are not building a cathedral that is supposed to last ages. Know where you want to be and
slowly go there, step by step.
APPENDIX B: INTRODUCTION TO
EVENT SOURCING
WHAT IS EVENT SOURCING?
Let’s consider e-commerce Order. It might hold current status (new, confirmed, shipped,
etc.) and summaries – total price, shipping and taxes. Naturally, Order does not exist on its
own. We usually wire it with another entity, OrderLine that refers to a single product
ordered with quantity information. This structure could be represented in a relational
database in the following way:
orders
-[ RECORD 1 ]---------
id | 1
status | NEW
total_price | 169.9900
order_lines
-[ RECORD 1 ]---------
id | 1
order_id | 1
product_id | 512
quantity | 1
-[ RECORD 2 ]---------
id | 2
order_id | 1
product_id | 614
quantity | 3
By storing data this way, we can always cheaply get the CURRENT state of our Order. We
store a dump of the serialized object after the latest changes. Any data mutation, for
example switching status from new to shipped, causes a data overwrite. We irreversibly lose
the old state. What if we need to track all changes…?
Let’s see how that fits in another database table:
order_history
-[ RECORD 1 ]----------------------------------
id | 1
order_id | 1
event_name | Created
created_at | 2018-08-09 23:00:09.22674+02
data | {}
-[ RECORD 2 ]----------------------------------
id | 2
order_id | 1
event_name | LineAdded
created_at | 2018-08-09 23:01:03.47922+02
data | {"product_id": 512, "quantity": 1}
-[ RECORD 3 ]----------------------------------
id | 3
order_id | 1
event_name | LineAdded
created_at | 2018-08-09 23:37:06.93112+02
data | {"product_id": 614, "quantity": 3}
-[ RECORD 4 ]----------------------------------
id | 4
order_id | 1
event_name | Confirmed
created_at | 2020-08-09 23:52:03.08832+02
data | {"status": "confirmed"}
Such a representation enables us to firmly tell what was changed and when. However,
order_history plays second fiddle. It is merely an auxiliary record of orders, added just to
fulfill some business requirement. We still reach for the original orders table when we want to
know the exact state of any Order in all other scenarios. In particular, we rely on orders
whenever we make any changes (e.g. changing quantity) or read state in most cases (e.g.
telling what the total price of the order is).
However, note that order_history is as good as the orders table when we have to get the current
Order state. We just have to fetch all entries for a given Order and ‘replay’ them from the start.
In the end, we’ll get exactly the same information that is saved in the orders table. So
should we still treat the orders table as our source of truth? Event Sourcing takes a different
approach. We can safely get rid of that table, or at least no longer rely on it in any situation
that would actually change an Order.
To sum up, Event Sourcing comes down to:
• Keeping your business objects (from now on called Aggregates) as a series of replayable
events. A collection of these events is called an event stream
• Never deleting any events from a system, only appending new ones
• Using events as the only reliable way of telling in what state a given Aggregate is
• If you need to query data or present them in a table-like format, keep a copy of them in
a denormalized format. This is called a projection
• Designing your Aggregates to protect certain vital business invariants, such as an Order
encapsulating its costs summary. A good rule of thumb is to keep Aggregates as small as
possible
In exchange, Event Sourcing gives us:
• A complete history of what was changed, when and by whom (if you enclose such
information in an event)
• Time-travel debugging, allowing us to recreate the state of the system at any given
moment
• The possibility of creating specialized read models of your data for high performance
ORDER AS ENTITY
class Order:
    def __init__(
        self,
        uuid: UUID,
        customer_id: CustomerId,
        lines: Optional[List[OrderLine]] = None,
        status: OrderStatus = OrderStatus.NEW,
    ) -> None:
        if lines is None:
            lines = []
        self._uuid = uuid
        self._customer_id = customer_id
        self._status = status
        self._lines = lines
To create an instance, we just need a customer_id and uuid. There are default arguments
provided for lines and status. Order guards a very simple domain invariant, namely it makes
sure that only new orders can be confirmed.
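The confirm method itself is not shown here; a minimal sketch of how that invariant might be enforced - assuming the IllegalStatusChange exception that also appears in the tests later in this appendix, and an OrderStatus.CONFIRMED member - could look like this:

class Order:
    ...

    def confirm(self) -> None:
        # the invariant: only a NEW order can be confirmed
        if self._status != OrderStatus.NEW:
            raise IllegalStatusChange
        self._status = OrderStatus.CONFIRMED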
INTRODUCING EVENTS
Let’s rewrite Order class using Event Sourcing. First, we need events that will represent any
state mutations:
@dataclass(frozen=True)
class Event:
_subclasses: ClassVar[Dict[str, Type]] = {}
created_at: datetime
version: int
@dataclass(frozen=True)
class OrderDrafted(Event):
customer_id: CustomerId
@dataclass(frozen=True)
class OrderConfirmed(Event):
pass
THESE ARE NOT DOMAIN EVENTS WE SAW BEFORE!
A resemblance between Event Sourcing events and Domain Events used mostly for
integration purposes in previous chapters is visible to the naked eye. Both stand for a
significant fact within a domain.
However, they are not the same and should not be used interchangeably! Event
Sourcing events should never cross the boundary of the module containing the Aggregate
emitting them. Domain Events may be used outside, published and used for
integration. Event Sourcing events absolutely must not. The latter are more like persistence
details and are not meant to be used outside of the module.
For more details, read Why Event Sourcing is a microservice communication anti-pattern by
Olivier Libutzki https://dev.to/olibutzki/why-event-sourcing-is-a-microservice-anti-
pattern-3mcj
In such a simple example, there are only two events. Note that their naming is as specific for
orders as possible and familiar to a domain expert. One should avoid creating general event
classes to spare keystrokes like StatusChanged with a status field. The first event,
OrderDrafted, is a standard way of starting any event stream for Order. The second event,
OrderConfirmed, represents an act of confirming the Order. Such event classes only carry as
much information as is required for rebuilding the state of an Aggregate. This is a
significant difference between Event Sourcing events and Domain Events used for
integration. An Order that has been drafted and then confirmed would be represented by the
following event stream:
event_stream = [
    OrderDrafted(customer_id=1, created_at=...),
    OrderConfirmed(created_at=...),
]
ORDER AS AGGREGATE
In the end, we should be able to load the Order state using these events. Here goes a rewritten
version of the Order class that is a full-fledged ES Aggregate. It will accept an instance of
EventStream as an argument:
@dataclass(frozen=True)
class EventStream:
uuid: UUID
events: List[Event]
version: int
This is a simple data structure with UUID of the Aggregate, a list of past events to rebuild
the state from and a version of Aggregate that will come in handy a bit later.
class Order:
    def __init__(
        self, event_stream: EventStream
    ) -> None:  # 1
        self._uuid = event_stream.uuid
        self._version = event_stream.version

        self._customer_id = 0  # 2
        self._status = OrderStatus.NEW
        self._lines: Dict[ProductId, int] = {}

        for event in event_stream.events:  # 3
            self._apply(event)

        self._new_events = []  # 4

    @property
    def uuid(self):
        return self._uuid

    @property
    def _next_version(self):
        """Useful for events creation"""
        return self._version + 1

    @property
    def changes(self) -> AggregateChanges:  # 5
        return AggregateChanges(
            self._uuid,
            self._new_events[:],
            self._version,
        )

    def _apply(self, event: Event) -> None:  # 6
        if isinstance(event, OrderDrafted):
            self._customer_id = event.customer_id
        elif isinstance(event, OrderConfirmed):
            self._status = OrderStatus.CONFIRMED
        else:
            raise ValueError(f"Unknown event: {type(event)}")

    def confirm(self) -> None:  # 7
        if self._status != OrderStatus.NEW:
            raise IllegalStatusChange
        event = OrderConfirmed(
            datetime.now(tz=pytz.UTC),
            self._next_version,
        )  # 8
        self._apply(event)
        self._new_events.append(event)  # 9
1. To instantiate Order one has to give it a list of events it should be initialized with
2. Before initializing the class, we sometimes may need to set default values for fields,
knowing they will be overridden
3. Events from the stream are applied one after another to rebuild the current state of
the Aggregate
4. The Aggregate will keep new events created after initialization due to method calls...
5. … to be obtained later and saved. Here, we use a @property + list copy trick to make
sure no one will tamper with recorded changes from the outside
6. The heart of every Event Sourcing Aggregate is the _apply method. Inside it we mutate state.
A real-life implementation should not use if-elif + isinstance calls. This would look
much better in a language that supports polymorphic method overloading.
7. Finally, a public method which is responsible for the business logic of confirming
orders
Testing Event Sourcing Aggregates is trivial. The Arrange (Given) step comes down to
instantiating an Aggregate with given events. The Act (When) step involves calling a public
method on the Aggregate, while Assert (Then) checks either for expected events or an
exception:
@freeze_time("2019-01-14")
def test_order_newly_created_confirmation_changes_status(self):
now = datetime.now(tz=pytz.UTC)
order = Order(
EventStream(
uuid=uuid4(),
events=[OrderDrafted(customer_id=1, created_at=now)],
version=1
)
)
order.confirm()
@freeze_time("2019-01-14")
def test_order_newly_created_cannot_be_confirmed_twice(self):
now = datetime.now(tz=pytz.UTC)
order = Order(
EventStream(
uuid=uuid4(),
events=[
OrderDrafted(customer_id=1, created_at=now),
OrderConfirmed(created_at=now)],
version=1
)
)
with self.assertRaises(IllegalStatusChange):
order.confirm()
PERSISTENCE IN EVENT SOURCING
APPEND-ONLY EVENT STREAMS
Since events are first-class citizens required to rebuild the state of an Aggregate, we need to
be able to retrieve them using aggregate id and append new ones to the existing event
stream.
These functionalities could be provided by a class inheriting from such an abstract class:
class EventStore(abc.ABC):
    @abc.abstractmethod
    def load_stream(
        self, aggregate_uuid: UUID
    ) -> EventStream:
        pass

    @abc.abstractmethod
    def append_to_stream(
        self, changes: AggregateChanges
    ) -> None:
        pass
The load_stream method returns an EventStream instance - a simple data structure with a list of
events needed for the Aggregate’s initialization, its UUID, and its current version:
@dataclass(frozen=True)
class EventStream:
uuid: UUID
events: List[Event]
version: int
We will use version to protect against concurrent updates using optimistic locking.
append_to_stream method accepts AggregateChanges - another simple data structure:
@dataclass(frozen=True)
class AggregateChanges:
aggregate_uuid: UUID
events: List[Event]
expected_version: int
It consists of the Aggregate’s UUID, the expected version (the same value we obtained from
load_stream) and a list of events our Aggregate produced. Please note that we do not have to
store old events, only new ones. This is possible because we are not allowed to delete any
events ever, so this is an append-only structure.
The aforementioned expected_version parameter serves as protection against concurrent
updates. If such a situation occurs, this method should raise an exception:
class ConcurrentStreamWriteError(RuntimeError):
pass
Another critical question: what database should be used? Some people state that almost any
will do. I find that hardly a valid answer. Is a transactional database like MySQL a good choice?
What should we pay attention to? To precisely answer this question, one has to consider the nature
of event streams and how they are used to reconstruct Aggregates.
RETRIEVING STRATEGY
To rebuild an Aggregate, we need all events that were ever emitted by it. Our Aggregates will
usually have a unique ID, in particular a UUID. In other words, we should be able to query
our event store by an Aggregate’s ID. This requirement can be easily met by many substantially
varying database engines.
Redis
This is a popular data-structure DB that can store data in a few handy structures, such as:
• strings
• lists
• sets
• sorted sets
Assuming Redis is our choice, we would store events using lists and use the Aggregate’s UUID as
a part of the key name. To retrieve all events for a given UUID, we would use the following
query:
LRANGE event_stream_f42d9a33-81da-45ba-a066-32de5e747067 0 -1
WARNING: Redis’ implementation of lists causes the performance of such queries to degrade
linearly as the number of events for a given Aggregate grows52. This means Redis is
not an optimal choice.
A natural way of modeling an event stream in an RDBMS is to create one table for all events.
Although we are going to store many types of events with different fields, it is not feasible
to create a separate table for each one. Firstly, the performance of querying will suffer.
Secondly, it will make queries more complicated to implement and maintain. Thirdly, events
may evolve over time to include additional data. Of course, we cannot change events from
the past, but somehow we would have to store new ones with an altered structure alongside
old ones. So staying with the one-table-for-all-events design is the right thing to do.
events
-------------------------------------------------------
| uuid | aggregate_uuid | name | data | <sort_column> |
-------------------------------------------------------
On the other hand, we may consider creating a separate table for each Aggregate type to get
logical shards of data. Due to the nature of Aggregates (they are separated from each other),
one has also a possibility to shard by aggregate_uuid.
Each event also has its own uuid. aggregate_uuid is a column allowing us to easily query
by it; we should put an index on it. name is self-explanatory (e.g. OrderConfirmed). Then we
have a flexible part – data – where we will store JSON-encoded details. Depending on the
chosen database engine, we would use a dedicated data type (PostgreSQL supports JSON
columns) or simply TEXT. Finally, we need a column to sort by. Depending on the chosen
design, it may be either a created_at timestamp or a version. At least one of them has to be
assigned to events when they are created in code.
Querying is trivial:
# MongoDB
db.events.find(
{aggregate_uuid: 'f42d9a33-81da-45ba-a066-32de5e747067'}
)
# RethinkDB
r.table('events').get_all(
'f42d9a33-81da-45ba-a066-32de5e747067', index='aggregate_uuid'
).run()
STORING STRATEGY
As I have already mentioned, we never delete events. Event Sourcing does not limit the
number of events per Aggregate, so we should be prepared that our events table/collection
will grow indefinitely. Therefore, we need a database that is able to scale and maintain
roughly constant read/write times regardless of the number of events (up to some extent, of
course).
Events are our source of truth, so we cannot afford any data loss. Thus, we need a database
with strong consistency guarantees.
Event Sourcing assumes that only one Aggregate should be saved within one business
operation. Saving a single Aggregate has to be atomic. We do not need a full-fledged
all-or-nothing guarantee spanning an entire HTTP request, as relational SQL databases
give us. We just need to make sure that once we attempt to save changes to an
Aggregate and bump up its version, it is an atomic operation.
Protection against concurrent updates is not something I can pass over in silence. Frankly, I
find it bizarre that most articles about Event Sourcing implementation do not say a word
about these issues and possible solutions. A commonly used READ - MODIFY - WRITE
approach is an invitation for race conditions. It would be great if a database engine provided
means to implement optimistic locking to prevent bad things from happening. Of
course, we could work around this problem using pessimistic locking and Redis, yet it
makes the implementation harder.
REQUIREMENTS WRAP-UP
Table design
Not much changed for the events table design. We have both created_at and version columns,
though version will be used for sorting:
events
--------------------------------------------------------------
| uuid | aggregate_uuid | name | data | created_at | version |
--------------------------------------------------------------
However, to get simple protection against concurrent updates, we are going to use one extra
table:
aggregates
------------------
| uuid | version |
------------------
A version will be bumped up by one every time we have some events to save. Using an
additional condition in the UPDATE query (the version must still equal the expected one) and the
returned number of affected rows, we can easily tell if we won the race or not - the exact query is
shown later in the _perform_update method of PostgreSQLEventStore.
If the UPDATE affects exactly 1 row – we are good to go. Otherwise, it means
someone changed history in the meantime and we should raise
ConcurrentStreamWriteError.
Note there are no foreign key constraints present. They were omitted on purpose as a form
of optimization.
There are numerous ways of getting data from these tables to Python. One of them is using
an ORM. Mapping in SQLAlchemy can look like this:
class AggregateModel(Base):
    __tablename__ = "aggregates"

    uuid = Column(
        postgresql.UUID, primary_key=True
    )
    version = Column(BigInteger, default=1)

class EventModel(Base, EventMixin):
    __tablename__ = "events"
    __table_args__ = (
        Index(
            "ix_events_aggregate_version",
            "aggregate_uuid",
            "version",
        ),
    )

    uuid = Column(
        postgresql.UUID, primary_key=True
    )
    aggregate_uuid = Column(
        postgresql.UUID,
        ForeignKey("aggregates.uuid"),
    )
    name = Column(VARCHAR(50))
    data = Column(postgresql.JSON, nullable=True)
    created_at = Column(DateTime(timezone=True))
    version = Column(BigInteger)

    aggregate = relationship(
        AggregateModel,
        uselist=False,
        backref="events",
    )
class PostgreSQLEventStore(EventStore):
    def __init__(self, session: Session) -> None:
        # we rely on SQLAlchemy, so we need a Session to be passed in
        self._session = session

    def load_stream(
        self, aggregate_uuid: UUID
    ) -> EventStream:
        events: List[
            EventModel
        ] = self._session.query(
            EventModel
        ).filter(
            EventModel.aggregate_uuid
            == str(aggregate_uuid)
        ).order_by(
            EventModel.version
        ).all()

        if not events:
            raise NotFound

        aggregate_uuid = events[0].aggregate_uuid
        aggregate_version = events[-1].version
        events_objects = [
            self._to_event_object(model)
            for model in events
        ]
        return EventStream(
            aggregate_uuid,
            events_objects,
            aggregate_version,
        )

    def _to_event_object(
        self, event_model: EventModel
    ) -> Event:
        event_cls = Event.subclass_for_name(
            event_model.name
        )
        return event_cls(
            created_at=event_model.created_at,
            version=event_model.version,
            **event_model.data
        )
The only magic part is the _to_event_object method. It uses a class method of Event that
keeps all known subclasses in an internal dictionary. After getting the right class by name,
we may easily reconstruct the event using unpacking with a double asterisk.
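The registration of subclasses is not printed in this appendix; a minimal sketch consistent with the _subclasses dictionary declared on the Event base class could rely on __init_subclass__ (the subclass_for_name name is taken from the code above):

@dataclass(frozen=True)
class Event:
    _subclasses: ClassVar[Dict[str, Type["Event"]]] = {}

    created_at: datetime
    version: int

    def __init_subclass__(cls, **kwargs) -> None:
        # every subclass registers itself under its own name
        super().__init_subclass__(**kwargs)
        Event._subclasses[cls.__name__] = cls

    @classmethod
    def subclass_for_name(cls, name: str) -> Type["Event"]:
        return cls._subclasses[name]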
One should consider abandoning rich ORM capabilities in favor of lighter SQLAlchemy core
or even raw queries to gain some performance boost. ORM does not add much value here
and is definitely less efficient than other methods.
class PostgreSQLEventStore(EventStore):
    ...

    def append_to_stream(
        self,
        changes: AggregateChanges,
    ) -> None:
        if not changes.events:
            raise NoEventsToAppend
        if changes.expected_version:
            self._perform_update(changes)
        else:
            self._perform_create(changes)
        self._insert_events(changes)

    def _perform_update(
        self, changes: AggregateChanges
    ) -> None:
        stmt = (
            AggregateModel.__table__.update()
            .values(
                version=changes.expected_version
                + 1
            )
            .where(
                (
                    AggregateModel.version
                    == changes.expected_version
                )
                & (
                    AggregateModel.uuid
                    == changes.aggregate_uuid
                )
            )
        )
        connection = self._session.connection()
        result = connection.execute(stmt)
        if result.rowcount != 1:
            # optimistic lock failed
            raise ConcurrentStreamWriteError

    def _perform_create(
        self, changes: AggregateChanges
    ) -> None:
        stmt = AggregateModel.__table__.insert().values(
            uuid=str(changes.aggregate_uuid),
            version=1,
        )
        self._session.connection().execute(stmt)

    def _insert_events(
        self, changes: AggregateChanges
    ) -> None:
        connection = self._session.connection()
        for event in changes.events:
            connection.execute(
                EventModel.__table__.insert().values(
                    uuid=str(uuid4()),
                    aggregate_uuid=str(
                        changes.aggregate_uuid
                    ),
                    name=event.__class__.__name__,
                    data=event.as_dict(),
                    created_at=event.created_at,
                    version=event.version,
                )
            )
The method works in one of two modes. If it is the first save of an Aggregate, we insert a row
into the aggregates table. Otherwise, we increment the version by one using a conditional
UPDATE. One can tell from the number of updated rows whether someone else updated the
Aggregate in the meantime and, if so, raise ConcurrentStreamWriteError.
Whenever you want to load an Aggregate from the Event Store, you need its UUID:
event_store = PostgreSQLEventStore(session)
event_stream = event_store.load_stream(
    UUID("36b89a56-cbf1-45f2-9f94-b7958481e3d1")
)
order = Order(event_stream)
order.confirm()
event_store.append_to_stream(order.changes)
When ConcurrentStreamWriteError is raised, often the simplest remedy is to retry the whole operation:
@retry(
    retry_on_exception=lambda exc: isinstance(
        exc, ConcurrentStreamWriteError
    )
)
def cancel_order(
    event_store: EventStore, order_uuid: UUID
) -> None:
    event_stream = event_store.load_stream(
        order_uuid
    )
    order = Order(event_stream)
    order.cancel()
    event_store.append_to_stream(order.changes)
Assume there is a race condition between setting two statuses, cancelled and confirmed. If the
latter wins, the code above will raise ConcurrentStreamWriteError during execution of
event_store.append_to_stream. The @retry decorator will take care of rerunning the whole thing
and reloading the entire Aggregate in its most recent version. Provided there are no more
concurrent updates, we are finally able to cancel our newly confirmed order OR raise another
exception if our business rules do not allow cancelling a confirmed order. It is
crucial to have a limited number of retries and not retry indefinitely - in highly
concurrent systems this may lead to unpleasant, long-running retry races.
Although the Event Store itself is an abstraction, we can hide it further by using the familiar
Repository pattern and treat the Order Aggregate as an Entity, without its clients knowing it uses Event
Sourcing:
class OrdersRepository:
    def __init__(
        self, event_store: EventStore
    ) -> None:
        self._event_store = event_store
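The rest of the class is not shown; under this design it could plausibly expose get and save methods along these lines (a sketch):

class OrdersRepository:
    def __init__(self, event_store: EventStore) -> None:
        self._event_store = event_store

    def get(self, order_uuid: UUID) -> Order:
        # rebuilding the Aggregate is hidden behind the Repository
        event_stream = self._event_store.load_stream(order_uuid)
        return Order(event_stream)

    def save(self, order: Order) -> None:
        # only new events (order.changes) are appended
        self._event_store.append_to_stream(order.changes)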
SNAPSHOTS
The whole idea of rebuilding the state of an Aggregate from a potentially very long event
stream may sound risky from a performance point of view. In the case of the Order Aggregate, it is
rather unlikely that this is going to be a real problem since orders are relatively short-living
(minutes, hours, maybe days), so their event stream should not exceed several events in
length. For longer-living Aggregates, though, we can periodically take a snapshot - a special
event that carries the whole state of the Aggregate at a given version:
@dataclass(frozen=True)
class OrderSnapshot(Event):
customer_id: CustomerId
status: OrderStatus
lines: Dict[ProductId, int]
class Order:
    ...

    def take_snapshot(self) -> OrderSnapshot:
        # method name and version handling assumed; per the note later in
        # this appendix, a snapshot carries the version of the newest event
        version = self._version
        return OrderSnapshot(
            datetime.now(tz=pytz.UTC),
            version,
            self._customer_id,
            self._status,
            self._lines.copy(),
        )
Restoring state from a snapshot is as simple as applying events. One extends the _apply
method of the Aggregate:
class Order:
    ...
The next step is to extend the logic of fetching the event stream for a given Aggregate’s UUID in
the Event Store. The algorithm is pretty simple - first, one looks for the newest snapshot. If there
is none, one fetches all existing events. Otherwise, we fetch the snapshot itself plus only those
events that are newer than the snapshot.
class PostgreSQLEventStore(EventStore):
    ...

    def load_stream(
        self, aggregate_uuid: UUID
    ) -> EventStream:
        events_query = self._session.query(
            EventModel
        ).filter(
            EventModel.aggregate_uuid
            == str(aggregate_uuid)
        ).order_by(
            EventModel.version
        )
        try:
            latest_snapshot = (
                self._session.query(SnapshotModel)
                .filter(
                    SnapshotModel.aggregate_uuid
                    == str(aggregate_uuid)
                )
                .order_by(
                    SnapshotModel.version.desc()
                )
                .limit(1)
                .one()
            )
        except exc.NoResultFound:
            events = events_query.all()
        else:
            # for this to work, snapshot has to
            # be created AFTER the last command
            newer_events = events_query.filter(
                EventModel.version
                > latest_snapshot.version
            )
            events = [
                latest_snapshot
            ] + newer_events.all()

        if not events:
            raise NotFound

        aggregate_uuid = events[0].aggregate_uuid
        aggregate_version = events[-1].version
        events_objects = [
            self._to_event_object(model)
            for model in events
        ]
        return EventStream(
            aggregate_uuid,
            events_objects,
            aggregate_version,
        )
In this simple example, snapshots were kept in the same table as events. In contrast to
events, snapshots can be deleted. Actually, they are fully disposable since one can always
take another one. Hence, it would make sense to keep them in a separate table and keep just
one, the latest snapshot of any Aggregate.
The only remaining question is when and how to create snapshots. There are several
options. One might generate them asynchronously by looking for Aggregates that have more
than N events newer than their latest snapshot. Another option is to put the snapshotting
logic in a Repository or in the Event Store. In the latter case, we would have to change the
signature of the append_to_stream method so that it accepts an entire Aggregate.
For example, we might take a snapshot of an Aggregate every time its version is a multiple
of 100:
class OrdersRepository:
    ...

class EventStore(abc.ABC):
    ...

    @abc.abstractmethod
    def save_snapshot(
        self,
        aggregate_uuid: UUID,
        snapshot: Event,
    ) -> None:
        pass
The corresponding concrete implementation is fairly simple; we just save our special Event
subclass to a designated table:
class PostgreSQLEventStore(EventStore):
    ...

    def save_snapshot(
        self,
        aggregate_uuid: UUID,
        snapshot: Event,
    ) -> None:
        self._session.connection().execute(
            SnapshotModel.__table__.insert().values(
                uuid=str(uuid4()),
                aggregate_uuid=str(
                    aggregate_uuid
                ),
                name=snapshot.__class__.__name__,
                data=snapshot.as_dict(),
                created_at=snapshot.created_at,
                version=snapshot.version,
            )
        )
Be aware that this implementation of creating and storing snapshots makes certain
assumptions. For example, the version of the latest snapshot is always equal to the version of
the newest event in AggregateChanges. This is also reflected in the code of the load_stream method,
which has to have a way to order the snapshot and events correctly to successfully restore the
Aggregate’s state.
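Putting these pieces together, the elided OrdersRepository body from above might apply the every-100-versions rule roughly like this (a sketch; SNAPSHOT_EVERY and take_snapshot are assumed names):

class OrdersRepository:
    SNAPSHOT_EVERY = 100

    def __init__(self, event_store: EventStore) -> None:
        self._event_store = event_store

    def save(self, order: Order) -> None:
        changes = order.changes
        self._event_store.append_to_stream(changes)
        # assuming each new event bumps the version by one
        last_version = changes.expected_version + len(changes.events)
        if last_version % self.SNAPSHOT_EVERY == 0:
            self._event_store.save_snapshot(
                order.uuid, order.take_snapshot()
            )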
PROJECTIONS
Even though having a complete history may be invaluable, querying data or displaying
anything to an end-user would be a nightmare if all we had were bare events. Since the
latter are the source of truth, we may use them to generate tailored read models - so-called
projections. A projection is a super simple concept - it can be as simple as a function that
takes an event stream and produces a document meant to be displayed.
class OrderProjection(Base):
"__tablename"__ = "projection_orders"
uuid = Column(
postgresql.UUID, primary_key=True
)
version = Column(BigInteger)
customer_id = Column(Integer)
status = Column(VARCHAR(64))
total_quantity = Column(Integer)
lines = Column(postgresql.JSON)
updated_at = Column(DateTime(timezone=True))
Then, we need an actual projection function which will iterate over Events from the stream and
appropriately create/update the flattened view of our Aggregate:
def project_order(
    session: Session,
    aggregate_uuid: UUID,
    events: List[Event],
) -> None:
    def _update_version_and_ts(
        projection: OrderProjection, event: Event
    ) -> None:
        projection.version = event.version
        projection.updated_at = event.created_at

    @singledispatch
    def project(event: Event) -> None:
        raise ValueError(
            f"Unknown event: {type(event)}"
        )

    @project.register
    def _(event: OrderDrafted) -> None:
        projection = OrderProjection(
            uuid=str(aggregate_uuid),
            customer_id=event.customer_id,
            status="NEW",
            total_quantity=0,
            lines={},
        )
        _update_version_and_ts(projection, event)
        session.add(projection)
        session.flush()

    @project.register
    def _(event: OrderConfirmed) -> None:
        projection = session.query(
            OrderProjection
        ).get(str(aggregate_uuid))
        projection.status = "CONFIRMED"
        _update_version_and_ts(projection, event)
        session.flush()

    @project.register
    def _(event: NewProductAdded) -> None:
        projection = session.query(
            OrderProjection
        ).get(str(aggregate_uuid))
        projection.lines = {
            **projection.lines,
            **{
                str(
                    event.product_id
                ): event.quantity
            },
        }
        projection.total_quantity += (
            event.quantity
        )
        _update_version_and_ts(projection, event)
        session.flush()

    @project.register
    def _(
        event: ProductQuantityIncreased,
    ) -> None:
        projection = session.query(
            OrderProjection
        ).get(str(aggregate_uuid))
        lines_idx = str(event.product_id)
        projection.lines = {
            **projection.lines,
            **{
                lines_idx: projection.lines[
                    lines_idx
                ]
                + event.quantity
            },
        }
        projection.total_quantity += (
            event.quantity
        )
        _update_version_and_ts(projection, event)
        session.flush()

    # finally, dispatch every event from the stream to the right handler
    for event in events:
        project(event)
This implementation is the simplest example, tightly coupled with SQLAlchemy. One
might also imagine splitting this into two stages. The first would be technology-agnostic and
would produce simple data structures, like dictionaries.
The second stage would be infrastructure-specific and would save the calculated projections into
designated places:
class AccountBalance(TypedDict):
account_uuid: UUID
balance: Decimal
def project_account_balance(
    events: List[Event],
) -> AccountBalance:
result = {
"account_uuid": events[0].account_uuid,
"balance": Decimal("0.00"),
}
for event in events:
if isinstance(event, CashDeposited):
result["balance"] += event.amount
elif isinstance(event, CashWithdrawn):
result["balance"] -= event.amount
return result
Then, the infrastructure-specific counterpart of the projection has to persist it. For
example, if one were to use PostgreSQL, they could use an UPSERT to update the read model
conveniently.
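A sketch of that second stage could look as follows - assuming an account_balances table with account_uuid as its primary key:

from sqlalchemy.dialects.postgresql import insert

def persist_account_balance(
    connection: Connection,
    read_model: AccountBalance,
) -> None:
    stmt = insert(account_balances_table).values(
        account_uuid=str(read_model["account_uuid"]),
        balance=read_model["balance"],
    )
    # PostgreSQL UPSERT - insert a new row or update the existing one
    stmt = stmt.on_conflict_do_update(
        index_elements=["account_uuid"],
        set_={"balance": read_model["balance"]},
    )
    connection.execute(stmt)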
The advantage of such a solution is that it increases the unit-testability of the module it lives in.
Assuming that the first stage (transforming events into simple data structures) is an
application-specific thing, one could call Use Cases and test whether projections change as expected.
Projections can be generated within the same transaction/process, but nothing stands
against processing them in the background. That is even better in terms of scalability,
though it also means making friends with eventual consistency - the read model data is not always
consistent with the write model.
Last but not least, projections are disposable, just like snapshots. We should design them in
such a way that it is possible to regenerate them without extra fuss.
EVENT SOURCING IS A PRIVATE DETAIL OF A MODULE (INCLUDING TESTING)
Knowledge about using Event Sourcing should not leak outside of the module that uses it. No
code outside the module should know about Event Sourcing events. Additionally, one should
be able to unit-test such a module as a whole without inspecting what ES events were
generated. For testing, we have virtually the same possibilities as for a non-ES module - we
call its Use Cases and check the behavior of the module. The biggest advantage of an ES module
is that we can treat projections as part of the module’s API, provided we split them into the
two-stage process described a few paragraphs earlier.
We may still check for events when we write tests at the level of Aggregates, though.
Domain Events may be used in combination with Event Sourcing modules. If there is an event
that should trigger an action in other modules, we may use Domain Events in our Aggregates as
an addition to ES events. The implementation may be the same as for Entities - the Aggregate
keeps Domain Events in its field (separately from ES events!). Later, after saving an Aggregate,
the Repository picks up the Domain Events and publishes them via the Event Bus.
BIBLIOGRAPHY
Andrea Saltarello, Dino Esposito, Microsoft .NET: Architecting Applications for the
Enterprise, Second Edition
Andrew Hunt, David Thomas, The Pragmatic Programmer: your journey to mastery, 20th
Anniversary Edition, 2nd Edition
Bert Bates, Eric Freeman, Elisabeth Robson, Kathy Sierra, Head First Design Patterns
Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides, Design Patterns: Elements of
Reusable Object-Oriented Software
Gerard Meszaros, xUnit Test Patterns: Refactoring Test Code 1st Edition
Kirk Knoernschild, Java Application Architecture: Modularity Patterns with Examples Using
OSGi
Martin Fowler, Refactoring: Improving the Design of Existing Code 2nd edition
Mary Poppendieck, Tom Poppendieck, Leading Lean Software Development: Results Are Not
the Point
Nick Chamberlain, Applying Domain-Driven Design with CQRS and Event Sourcing
https://buildplease.com/products/fpc/