Azure Architecture Center - Microsoft
Reference Architectures
A set of recommended architectures for Azure. Each architecture includes best practices, prescriptive steps, and a deployable
solution.
Microservices have become a popular architectural style for building cloud applications that are resilient, highly
scalable, and able to evolve quickly. To be more than just a buzzword, however, microservices require a different
approach to designing and building applications.
In this set of articles, we explore how to build and run a microservices architecture on Azure. Topics include:
Using Domain Driven Design (DDD) to design a microservices architecture.
Choosing the right Azure technologies for compute, storage, messaging, and other elements of the design.
Understanding microservices design patterns.
Designing for resiliency, scalability, and performance.
Building a CI/CD pipeline.
Throughout, we focus on an end-to-end scenario: A drone delivery service that lets customers schedule packages to
be picked up and delivered via drone. You can find the code for our reference implementation on GitHub:
Reference implementation
But first, let's start with fundamentals. What are microservices, and what are the advantages of adopting a
microservices architecture?
No free lunch
These benefits don't come for free. This series of articles is designed to address some of the challenges of building
microservices that are resilient, scalable, and manageable.
Service boundaries. When you build microservices, you need to think carefully about where to draw the
boundaries between services. Once services are built and deployed in production, it can be hard to refactor
across those boundaries. Choosing the right service boundaries is one of the biggest challenges when
designing a microservices architecture. How big should each service be? When should functionality be
factored across several services, and when should it be kept inside the same service? In this guide, we
describe an approach that uses domain-driven design to find service boundaries. It starts with Domain
analysis to find the bounded contexts, then applies a set of tactical DDD patterns based on functional and
non-functional requirements.
Data consistency and integrity. A basic principle of microservices is that each service manages its own
data. This keeps services decoupled, but can lead to challenges with data integrity or redundancy. We explore
some of these issues in the Data considerations chapter.
Network congestion and latency. The use of many small, granular services can result in more interservice
communication and longer end-to-end latency. The chapter Interservice communication describes
considerations for messaging between services. Both synchronous and asynchronous communication have a
place in microservices architectures. Good API design is important so that services remain loosely coupled,
and can be independently deployed and updated.
Complexity. A microservices application has more moving parts. Each service may be simple, but the
services have to work together as a whole. A single user operation may involve multiple services. In the
chapter Ingestion and workflow, we examine some of the issues around ingesting requests at high
throughput, coordinating a workflow, and handling failures.
Communication between clients and the application. When you decompose an application into many
small services, how should clients communicate with those services? Should a client call each individual
service directly, or route requests through an API Gateway?
Monitoring. Monitoring a distributed application can be a lot harder than monitoring a monolithic
application, because you must correlate telemetry from multiple services. The chapter Logging and
monitoring addresses these concerns.
Continuous integration and delivery (CI/CD). One of the main goals of microservices is agility. To
achieve this, you must have automated and robust CI/CD, so that you can quickly and reliably deploy
individual services into test and production environments.
NOTE
For help in choosing between a microservices architecture and other architectural styles, see the Azure Application
Architecture Guide.
Our reference implementation uses Kubernetes with Azure Container Service (ACS). However, many of the high-
level architectural decisions and challenges will apply to any container orchestrator, including Azure Service Fabric.
Domain analysis
Designing microservices: Domain analysis
One of the biggest challenges of microservices is to define the boundaries of individual services. The general rule
is that a service should do "one thing" — but putting that rule into practice requires careful thought. There is no
mechanical process that will produce the "right" design. You have to think deeply about your business domain,
requirements, and goals. Otherwise, you can end up with a haphazard design that exhibits some undesirable
characteristics, such as hidden dependencies between services, tight coupling, or poorly designed interfaces. In this
chapter, we take a domain-driven approach to designing microservices.
Microservices should be designed around business capabilities, not horizontal layers such as data access or
messaging. In addition, they should have loose coupling and high functional cohesion. Microservices are loosely
coupled if you can change one service without requiring other services to be updated at the same time. A
microservice is cohesive if it has a single, well-defined purpose, such as managing user accounts or tracking
delivery history. A service should encapsulate domain knowledge and abstract that knowledge from clients. For
example, a client should be able to schedule a drone without knowing the details of the scheduling algorithm or
how the drone fleet is managed.
Domain-driven design (DDD) provides a framework that can get you most of the way to a set of well-designed
microservices. DDD has two distinct phases, strategic and tactical. In strategic DDD, you are defining the large-scale
structure of the system. Strategic DDD helps to ensure that your architecture remains focused on business
capabilities. Tactical DDD provides a set of design patterns that you can use to create the domain model. These
patterns include entities, aggregates, and domain services. These tactical patterns will help you to design
microservices that are both loosely coupled and cohesive.
In this chapter and the next, we'll walk through the following steps, applying them to the Drone Delivery
application:
1. Start by analyzing the business domain to understand the application's functional requirements. The output
of this step is an informal description of the domain, which can be refined into a more formal set of domain
models.
2. Next, define the bounded contexts of the domain. Each bounded context contains a domain model that
represents a particular subdomain of the larger application.
3. Within a bounded context, apply tactical DDD patterns to define entities, aggregates, and domain services.
4. Use the results from the previous step to identify the microservices in your application.
In this chapter, we cover the first three steps, which are primarily concerned with DDD. In the next chapter, we will
identify the microservices. However, it's important to remember that DDD is an iterative, ongoing process. Service
boundaries aren't fixed in stone. As an application evolves, you may decide to break apart a service into several
smaller services.
NOTE
This chapter is not meant to show a complete and comprehensive domain analysis. We deliberately kept the example brief, in
order to illustrate the main points. For more background on DDD, we recommend Eric Evans' Domain-Driven Design, the
book that first introduced the term. Another good reference is Implementing Domain-Driven Design by Vaughn Vernon.
Shipping is placed in the center of the diagram, because it's core to the business. Everything else in the
diagram exists to enable this functionality.
Drone management is also core to the business. Functionality that is closely related to drone management
includes drone repair and using predictive analysis to predict when drones need servicing and maintenance.
ETA analysis provides time estimates for pickup and delivery.
Third-party transportation will enable the application to schedule alternative transportation methods if a
package cannot be shipped entirely by drone.
Drone sharing is a possible extension of the core business. The company may have excess drone capacity
during certain hours, and could rent out drones that would otherwise be idle. This feature will not be in the
initial release.
Video surveillance is another area that the company might expand into later.
User accounts, Invoicing, and Call center are subdomains that support the core business.
Notice that at this point in the process, we haven't made any decisions about implementation or technologies.
Some of the subsystems may involve external software systems or third-party services. Even so, the application
needs to interact with these systems and services, so it's important to include them in the domain model.
NOTE
When an application depends on an external system, there is a risk that the external system's data schema or API will leak
into your application, ultimately compromising the architectural design. This is particularly true with legacy systems that may
not follow modern best practices, and may use convoluted data schemas or obsolete APIs. In that case, it's important to
have a well-defined boundary between these external systems and the application. Consider using the Strangler Pattern or
the Anti-Corruption Layer Pattern for this purpose.
(Diagram: bounded contexts for the Drone Delivery application, including Shipping, Drone management, Drone sharing,
Video surveillance, Third-party transportation, Accounts, and Call center; some of these are external systems.)
Bounded contexts are not necessarily isolated from one another. In this diagram, the solid lines connecting the
bounded contexts represent places where two bounded contexts interact. For example, Shipping depends on User
Accounts to get information about customers, and on Drone Management to schedule drones from the fleet.
In the book Domain-Driven Design, Eric Evans describes several patterns for maintaining the integrity of a domain
model when it interacts with another bounded context. One of the main principles of microservices is that services
communicate through well-defined APIs. This approach corresponds to two patterns that Evans calls Open Host
Service and Published Language. The idea of Open Host Service is that a subsystem defines a formal protocol (API)
for other subsystems to communicate with it. Published Language extends this idea by publishing the API in a
form that other teams can use to write clients. In the chapter on API Design, we discuss using OpenAPI
Specification (formerly known as Swagger) to define language-agnostic interface descriptions for REST APIs,
expressed in JSON or YAML format.
For the rest of this journey, we will focus on the Shipping bounded context.
Tactical DDD
During the strategic phase of DDD, you are mapping out the business domain and defining bounded contexts for
your domain models. Tactical DDD is when you define your domain models with more precision. The tactical
patterns are applied within a single bounded context. In a microservices architecture, we are particularly interested
in the entity and aggregate patterns. Applying these patterns will help us to identify natural boundaries for the
services in our application (see next chapter). As a general principle, a microservice should be no smaller than an
aggregate, and no larger than a bounded context. First, we'll review the tactical patterns. Then we'll apply them to
the Shipping bounded context in the Drone Delivery application.
Overview of the tactical patterns
This section provides a brief summary of the tactical DDD patterns, so if you are already familiar with DDD, you
can probably skip this section. The patterns are described in more detail in chapters 5 – 6 of Eric Evans' book, and
in Implementing Domain-Driven Design by Vaughn Vernon.
Entities. An entity is an object with a unique identity that persists over time. For example, in a banking application,
customers and accounts would be entities.
An entity has a unique identifier in the system, which can be used to look up or retrieve the entity. That doesn't
mean the identifier is always exposed directly to users. It could be a GUID or a primary key in a database.
An identity may span multiple bounded contexts, and may endure beyond the lifetime of the application. For
example, bank account numbers or government-issued IDs are not tied to the lifetime of a particular
application.
The attributes of an entity may change over time. For example, a person's name or address might change, but
they are still the same person.
An entity can hold references to other entities.
Value objects. A value object has no identity. It is defined only by the values of its attributes. Value objects are
also immutable. To update a value object, you always create a new instance to replace the old one. Value objects
can have methods that encapsulate domain logic, but those methods should have no side-effects on the object's
state. Typical examples of value objects include colors, dates and times, and currency values.
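For illustration, here is a minimal sketch of a Location value object in C#. The class name and attributes are
assumptions for this example rather than the reference implementation's actual types; the point is that the object
is immutable and is compared by its attribute values rather than by identity.

using System;

public sealed class Location
{
    public double Latitude { get; }
    public double Longitude { get; }
    public double Altitude { get; }

    public Location(double latitude, double longitude, double altitude)
    {
        Latitude = latitude;
        Longitude = longitude;
        Altitude = altitude;
    }

    // Value objects compare by attribute values, not by identity.
    public override bool Equals(object obj) =>
        obj is Location other &&
        Latitude == other.Latitude &&
        Longitude == other.Longitude &&
        Altitude == other.Altitude;

    public override int GetHashCode() =>
        Tuple.Create(Latitude, Longitude, Altitude).GetHashCode();
}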
Aggregates. An aggregate defines a consistency boundary around one or more entities. Exactly one entity in an
aggregate is the root. Lookup is done using the root entity's identifier. Any other entities in the aggregate are
children of the root, and are referenced by following pointers from the root.
The purpose of an aggregate is to model transactional invariants. Things in the real world have complex webs of
relationships. Customers create orders, orders contain products, products have suppliers, and so on. If the
application modifies several related objects, how does it guarantee consistency? How do we keep track of
invariants and enforce them?
Traditional applications have often used database transactions to enforce consistency. In a distributed application,
however, that's often not feasible. A single business transaction may span multiple data stores, or may be long
running, or may involve third-party services. Ultimately it's up to the application, not the data layer, to enforce the
invariants required for the domain. That's what aggregates are meant to model.
NOTE
An aggregate might consist of a single entity, without child entities. What makes it an aggregate is the transactional
boundary.
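To make the idea concrete, here is an illustrative sketch of a Delivery aggregate root, using the Location value
object sketched earlier. The names, and the specific invariant (a delivery must be picked up before it can be
completed), are assumptions for this example; the reference implementation's domain model may differ. The key
point is that all changes go through the root, which keeps the aggregate in a consistent state.

using System;

public enum DeliveryStage { Created, PickedUp, Completed }

public class Delivery   // aggregate root
{
    public string Id { get; }
    public Location Pickup { get; }
    public Location Dropoff { get; }
    public DeliveryStage Stage { get; private set; }

    public Delivery(string id, Location pickup, Location dropoff)
    {
        Id = id;
        Pickup = pickup;
        Dropoff = dropoff;
        Stage = DeliveryStage.Created;
    }

    public void MarkPickedUp() => Stage = DeliveryStage.PickedUp;

    public void MarkCompleted()
    {
        // The root enforces the invariant; callers cannot put the aggregate
        // into an inconsistent state by modifying Stage directly.
        if (Stage != DeliveryStage.PickedUp)
            throw new InvalidOperationException("A delivery must be picked up before it can be completed.");
        Stage = DeliveryStage.Completed;
    }
}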
Domain and application services. In DDD terminology, a service is an object that implements some logic
without holding any state. Evans distinguishes between domain services, which encapsulate domain logic, and
application services, which provide technical functionality, such as user authentication or sending an SMS
message. Domain services are often used to model behavior that spans multiple entities.
NOTE
The term service is overloaded in software development. The definition here is not directly related to microservices.
Domain events. Domain events can be used to notify other parts of the system when something happens. As the
name suggests, domain events should mean something within the domain. For example, "a record was inserted
into a table" is not a domain event. "A delivery was cancelled" is a domain event. Domain events are especially
relevant in a microservices architecture. Because microservices are distributed and don't share data stores, domain
events provide a way for microservices to coordinate with each other. The chapter Interservice communication
discusses asynchronous messaging in more detail.
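As a simple illustration, a domain event can be modeled as a small immutable message. The names below are
assumptions for this example; note that the event is expressed in domain terms ("a delivery was cancelled"), not
in storage terms.

using System;

public sealed class DeliveryCancelled
{
    public string DeliveryId { get; }
    public string Reason { get; }
    public DateTimeOffset OccurredAt { get; }

    public DeliveryCancelled(string deliveryId, string reason, DateTimeOffset occurredAt)
    {
        DeliveryId = deliveryId;
        Reason = reason;
        OccurredAt = occurredAt;
    }
}

A service that cancels a delivery would publish an event like this to the messaging infrastructure, and any
interested services would subscribe to it.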
There are a few other DDD patterns not listed here, including factories, repositories, and modules. These can be
useful patterns when you are implementing a microservice, but they are less relevant when designing the
boundaries between microservices.
What is the right size for a microservice? You often hear something to the effect of, "not too big and not too small"
— and while that's certainly correct, it's not very helpful in practice. But if you start from a carefully designed
domain model, it's much easier to reason about microservices.
Data considerations
Designing microservices: Data considerations
This chapter describes considerations for managing data in a microservices architecture. Because every
microservice manages its own data, data integrity and data consistency are critical challenges.
A basic principle of microservices is that each service manages its own data. Two services should not share a data
store. Instead, each service is responsible for its own private data store, which other services cannot access directly.
The reason for this rule is to avoid unintentional coupling between services, which can result if services share the
same underlying data schemas. If there is a change to the data schema, the change must be coordinated across
every service that relies on that database. By isolating each service's data store, we can limit the scope of change,
and preserve the agility of truly independent deployments. Another reason is that each microservice may have its
own data models, queries, or read/write patterns. Using a shared data store limits each team's ability to optimize
data storage for their particular service.
This approach naturally leads to polyglot persistence — the use of multiple data storage technologies within a
single application. One service might require the schema-on-read capabilities of a document database. Another
might need the referential integrity provided by an RDBMS. Each team is free to make the best choice for their
service. For more about the general principle of polyglot persistence, see Use the best data store for the job.
NOTE
It's fine for services to share the same physical database server. The problem occurs when services share the same schema,
or read and write to the same set of database tables.
Challenges
Some challenges arise from this distributed approach to managing data. First, there may be redundancy across the
data stores, with the same item of data appearing in multiple places. For example, data might be stored as part of a
transaction, then stored elsewhere for analytics, reporting, or archiving. Duplicated or partitioned data can lead to
issues of data integrity and consistency. When data relationships span multiple services, you can't use traditional
data management techniques to enforce the relationships.
Traditional data modeling uses the rule of "one fact in one place." Every entity appears exactly once in the schema.
Other entities may hold references to it but not duplicate it. The obvious advantage to the traditional approach is
that updates are made in a single place, which avoids problems with data consistency. In a microservices
architecture, you have to consider how updates are propagated across services, and how to manage eventual
consistency when data appears in multiple places without strong consistency.
Interservice communication
Designing microservices: Interservice communication
Communication between microservices must be efficient and robust. With lots of small services interacting to
complete a single transaction, this can be a challenge. In this chapter, we look at the tradeoffs between
asynchronous messaging versus synchronous APIs. Then we look at some of the challenges in designing resilient
interservice communication, and the role that a service mesh can play.
Challenges
Here are some of the main challenges arising from service-to-service communication. Service meshes, described
later in this chapter, are designed to handle many of these challenges.
Resiliency. There may be dozens or even hundreds of instances of any given microservice. An instance can fail for
any number of reasons. There can be a node-level failure, such as a hardware failure or a VM reboot. An instance
might crash, or be overwhelmed with requests and unable to process any new requests. Any of these events can
cause a network call to fail. There are two design patterns that can help make service-to-service network calls
more resilient:
Retry. A network call may fail because of a transient fault that goes away by itself. Rather than fail outright,
the caller should typically retry the operation a certain number of times, or until a configured time-out
period elapses. However, if an operation is not idempotent, retries can cause unintended side effects. The
original call might succeed, but the caller never gets a response. If the caller retries, the operation may be
invoked twice. Generally, it's not safe to retry POST or PATCH methods, because these are not guaranteed
to be idempotent.
Circuit Breaker. Too many failed requests can cause a bottleneck, as pending requests accumulate in the
queue. These blocked requests might hold critical system resources such as memory, threads, database
connections, and so on, which can cause cascading failures. The Circuit Breaker pattern can prevent a
service from repeatedly trying an operation that is likely to fail.
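To make these patterns concrete, here is a minimal C# sketch of a retry helper with exponential backoff. It is an
illustration, not the reference implementation's code; in practice you would typically use a resilience library such
as Polly, and a circuit breaker would additionally stop calling an operation after repeated failures. Remember that
only idempotent operations should be retried.

using System;
using System.Threading.Tasks;

public static class Resilience
{
    // Retries an idempotent async operation on transient failures, with exponential backoff.
    public static async Task<T> RetryAsync<T>(
        Func<Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts = 3,
        int baseDelayMs = 200)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception ex) when (isTransient(ex) && attempt < maxAttempts)
            {
                // Back off exponentially: 200 ms, 400 ms, 800 ms, ...
                await Task.Delay(baseDelayMs * (1 << (attempt - 1)));
            }
        }
    }
}

A caller might wrap a call to another service, for example: await Resilience.RetryAsync(() =>
httpClient.GetStringAsync(uri), ex => ex is HttpRequestException).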
Load balancing. When service "A" calls service "B", the request must reach a running instance of service "B". In
Kubernetes, the Service resource type provides a stable IP address for a group of pods. Network traffic to the
service's IP address gets forwarded to a pod by means of iptables rules. By default, a random pod is chosen. A
service mesh (see below) can provide more intelligent load balancing algorithms based on observed latency or
other metrics.
Distributed tracing. A single transaction may span multiple services. That can make it hard to monitor the overall
performance and health of the system. Even if every service generates logs and metrics, without some way to tie
them together, they are of limited use. The chapter Logging and monitoring talks more about distributed tracing,
but we mention it here as a challenge.
Service versioning. When a team deploys a new version of a service, they must avoid breaking any other
services or external clients that depend on it. In addition, you might want to run multiple versions of a service side-
by-side, and route requests to a particular version. See API Versioning for more discussion of this issue.
TLS encryption and mutual TLS authentication. For security reasons, you may want to encrypt traffic between
services with TLS, and use mutual TLS authentication to authenticate callers.
NOTE
Service mesh is an example of the Ambassador pattern — a helper service that sends network requests on behalf of the
application.
Right now, the main options for a service mesh in Kubernetes are linkerd and Istio. Both of these technologies are
evolving rapidly. At the time we wrote this guide, the latest Istio release was 0.2, so it was still very new. However,
some features that both linkerd and Istio have in common include:
Load balancing at the session level, based on observed latencies or number of outstanding requests. This
can improve performance over the layer-4 load balancing that is provided by Kubernetes.
Layer-7 routing based on URL path, Host header, API version, or other application-level rules.
Retry of failed requests. A service mesh understands HTTP error codes, and can automatically retry failed
requests. You can configure the maximum number of retries, along with a timeout period in order to
bound the maximum latency.
Circuit breaking. If an instance consistently fails requests, the service mesh will temporarily mark it as
unavailable. After a backoff period, it will try the instance again. You can configure the circuit breaker based
on various criteria, such as the number of consecutive failures.
Metrics and tracing. The service mesh captures metrics about interservice calls, such as the request volume,
latency, error and success rates, and response sizes. The service mesh also enables distributed tracing by adding
correlation information for each hop in a request.
Mutual TLS Authentication for service-to-service calls.
Do you need a service mesh? The value it adds to a distributed system is certainly compelling. If you don't have
a service mesh, you will need to consider each of the challenges mentioned at the beginning of the chapter. You
can solve problems like retry, circuit breaker, and distributed tracing without a service mesh, but a service mesh
moves these concerns out of the individual services and into a dedicated layer. On the other hand, service meshes
are a relatively new technology that is still maturing. Deploying a service mesh adds complexity to the setup and
configuration of the cluster. There may be performance implications, because requests now get routed through
the service mesh proxy, and because extra services are now running on every node in the cluster. You should do
thorough performance and load testing before deploying a service mesh in production.
API design
Designing microservices: API design
Good API design is important in a microservices architecture, because all data exchange between services happens
either through messages or API calls. APIs must be efficient to avoid creating chatty I/O. Because services are
designed by teams working independently, APIs must have well-defined semantics and versioning schemes, so
that updates don't break other services.
Considerations
Here are some things to think about when choosing how to implement an API.
REST vs RPC. Consider the tradeoffs between using a REST-style interface versus an RPC-style interface.
REST models resources, which can be a natural way to express your domain model. It defines a uniform
interface based on HTTP verbs, which encourages evolvability. It has well-defined semantics in terms of
idempotency, side effects, and response codes. And it enforces stateless communication, which improves
scalability.
RPC is more oriented around operations or commands. Because RPC interfaces look like local method calls,
they may lead you to design overly chatty APIs. However, that doesn't mean RPC must be chatty. It just means
you need to use care when designing the interface.
For a RESTful interface, the most common choice is REST over HTTP using JSON. For an RPC-style interface, there
are several popular frameworks, including gRPC, Apache Avro, and Apache Thrift.
Efficiency. Consider efficiency in terms of speed, memory, and payload size. Typically a gRPC-based interface is
faster than REST over HTTP.
Interface definition language (IDL). An IDL is used to define the methods, parameters, and return values of an
API. An IDL can be used to generate client code, serialization code, and API documentation. IDLs can also be
consumed by API testing tools such as Postman. Frameworks such as gRPC, Avro, and Thrift define their own IDL
specifications. REST over HTTP does not have a standard IDL format, but a common choice is OpenAPI (formerly
Swagger). You can also create an HTTP REST API without using a formal definition language, but then you lose the
benefits of code generation and testing.
Serialization. How are objects serialized over the wire? Options include text-based formats (primarily JSON) and
binary formats such as Protocol Buffers. Binary formats are generally faster than text-based formats. However,
JSON has advantages in terms of interoperability, because most languages and frameworks support JSON
serialization. Some serialization formats require a fixed schema, and some require compiling a schema definition
file. In that case, you'll need to incorporate this step into your build process.
Framework and language support. HTTP is supported in nearly every framework and language. gRPC, Avro,
and Thrift all have libraries for C++, C#, Java, and Python. Thrift and gRPC also support Go.
Compatibility and interoperability. If you choose a protocol like gRPC, you may need a protocol translation
layer between the public API and the back end. A gateway can perform that function. If you are using a service
mesh, consider which protocols are compatible with the service mesh. For example, linkerd has built-in support for
HTTP, Thrift, and gRPC.
Our baseline recommendation is to choose REST over HTTP unless you need the performance benefits of a binary
protocol. REST over HTTP requires no special libraries. It creates minimal coupling, because callers don't need a
client stub to communicate with the service. There is a rich ecosystem of tools to support schema definitions,
testing, and monitoring of RESTful HTTP endpoints. Finally, HTTP is compatible with browser clients, so you don't
need a protocol translation layer between the client and the backend.
However, if you choose REST over HTTP, you should do performance and load testing early in the development
process, to validate whether it performs well enough for your scenario.
These sorts of coding practices are particularly important when building a traditional monolithic application. With
a large code base, many subsystems might use the Location object, so it's important for the object to enforce
correct behavior.
Another example is the Repository pattern, which ensures that other parts of the application do not make direct
reads or writes to the data store.
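As an illustrative sketch (the interface in the reference implementation may differ), a repository for the Delivery
entity sketched earlier could look like this, so that callers depend on the abstraction rather than on the underlying
data store:

using System.Threading.Tasks;

public interface IDeliveryRepository
{
    Task<Delivery> GetAsync(string id);
    Task CreateAsync(Delivery delivery);
    Task UpdateAsync(string id, Delivery delivery);
    Task DeleteAsync(string id);
}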
In a microservices architecture, however, services don't share the same code base and don't share data stores.
Instead, they communicate through APIs. Consider the case where the Scheduler service requests information
about a drone from the Drone service. The Drone service has its internal model of a drone, expressed through
code. But the Scheduler doesn't see that. Instead, it gets back a representation of the drone entity — perhaps a
JSON object in an HTTP response.
The Scheduler service can't modify the Drone service's internal models, or write to the Drone service's data store.
That means the code that implements the Drone service has a smaller exposed surface area, compared with code
in a traditional monolith. If the Drone service defines a Location class, the scope of that class is limited — no other
service will directly consume the class.
For these reasons, this guidance doesn't focus much on coding practices as they relate to the tactical DDD patterns.
But it turns out that you can also model many of the DDD patterns through REST APIs.
For example:
Aggregates map naturally to resources in REST. For example, the Delivery aggregate would be exposed as a
resource by the Delivery API.
Aggregates are consistency boundaries. Operations on aggregates should never leave an aggregate in an
inconsistent state. Therefore, you should avoid creating APIs that allow a client to manipulate the internal
state of an aggregate. Instead, favor coarse-grained APIs that expose aggregates as resources.
Entities have unique identities. In REST, resources have unique identifiers in the form of URLs. Create
resource URLs that correspond to an entity's domain identity. The mapping from URL to domain identity
may be opaque to the client.
Child entities of an aggregate can be reached by navigating from the root entity. If you follow HATEOAS
principles, child entities can be reached via links in the representation of the parent entity.
Because value objects are immutable, updates are performed by replacing the entire value object. In REST,
implement updates through PUT or PATCH requests.
A repository lets clients query, add, or remove objects in a collection, abstracting the details of the
underlying data store. In REST, a collection can be a distinct resource, with methods for querying the
collection or adding new entities to the collection.
When you design your APIs, think about how they express the domain model, not just the data inside the model,
but also the business operations and the constraints on the data.
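The following ASP.NET Core controller sketch illustrates a few of these mappings for the Delivery aggregate. The
routes, types, and repository are assumptions for this example: the aggregate is exposed as a coarse-grained
resource, the URL corresponds to the entity's identity, and a child of the aggregate is reached by navigating from
the root resource.

using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

[Route("api/deliveries")]
public class DeliveriesController : Controller
{
    private readonly IDeliveryRepository repository;

    public DeliveriesController(IDeliveryRepository repository)
    {
        this.repository = repository;
    }

    // The Delivery aggregate is exposed as a resource; the URL encodes its domain identity.
    [HttpGet("{id}", Name = "GetDelivery")]
    public async Task<IActionResult> Get(string id)
    {
        var delivery = await repository.GetAsync(id);
        if (delivery == null) return NotFound();
        return Ok(delivery);
    }

    // A child of the aggregate (its current stage) is reached by navigating from the root resource.
    [HttpGet("{id}/status")]
    public async Task<IActionResult> GetStatus(string id)
    {
        var delivery = await repository.GetAsync(id);
        if (delivery == null) return NotFound();
        return Ok(delivery.Stage);
    }
}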
API versioning
An API is a contract between a service and clients or consumers of that service. If an API changes, there is a risk of
breaking clients that depend on the API, whether those are external clients or other microservices. Therefore, it's a
good idea to minimize the number of API changes that you make. Often, changes in the underlying
implementation don't require any changes to the API. Realistically, however, at some point you will want to add
new features or new capabilities that require changing an existing API.
Whenever possible, make API changes backward compatible. For example, avoid removing a field from a model,
because that can break clients that expect the field to be there. Adding a field does not break compatibility, because
clients should ignore any fields they don't understand in a response. However, the service must handle the case
where an older client omits the new field in a request.
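For example, suppose a later revision of the API adds an optional priority field to the delivery request. The sketch
below is illustrative (the field names are assumptions): the new property is optional, and the service supplies a
default when an older client omits it.

public class DeliveryRequest
{
    public string PackageId { get; set; }
    public string DropoffLocation { get; set; }

    // Added in a later, backward-compatible revision of the API.
    // Older clients won't send this field, so the service treats null as the default.
    public string Priority { get; set; }
}

// In the service code:
// var priority = request.Priority ?? "Normal";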
Support versioning in your API contract. If you introduce a breaking API change, introduce a new API version.
Continue to support the previous version, and let clients select which version to call. There are a couple of ways to
do this. One is simply to expose both versions in the same service. Another option is to run two versions of the
service side-by-side, and route requests to one or the other version, based on HTTP routing rules.
There's a cost to supporting multiple versions, in terms of developer time, testing, and operational overhead.
Therefore, it's good to deprecate old versions as quickly as possible. For internal APIs, the team that owns the API
can work with other teams to help them migrate to the new version. This is when having a cross-team governance
process is useful. For external (public) APIs, it can be harder to deprecate an API version, especially if the API is
consumed by third parties or by native client applications.
When a service implementation changes, it's useful to tag the change with a version. The version provides
important information when troubleshooting errors. It can be very helpful for root cause analysis to know exactly
which version of the service was called. Consider using semantic versioning for service versions. Semantic
versioning uses a MAJOR.MINOR.PATCH format. However, clients should only select an API by the major version
number, or possibly the minor version if there are significant (but non-breaking) changes between minor versions.
In other words, it's reasonable for clients to select between version 1 and version 2 of an API, but not to select
version 2.1.3. If you allow that level of granularity, you risk having to support a proliferation of versions.
For further discussion of API versioning, see Versioning a RESTful web API.
Ingestion and workflow
Designing microservices: Ingestion and workflow
Microservices often have a workflow that spans multiple services for a single transaction. The workflow must be
reliable; it can't lose transactions or leave them in a partially completed state. It's also critical to control the
ingestion rate of incoming requests. With many small services communicating with each other, a burst of
incoming requests can overwhelm the interservice communication.
Ingestion
Based on business requirements, the development team identified the following non-functional requirements for
ingestion:
Sustained throughput of 10K requests/sec.
Able to handle spikes of up to 50K/sec without dropping client requests or timing out.
Less than 500ms latency in the 99th percentile.
The requirement to handle occasional spikes in traffic presents a design challenge. In theory, the system could be
scaled out to handle the maximum expected traffic. However, provisioning that many resources would be very
inefficient. Most of the time, the application will not need that much capacity, so there would be idle cores, costing
money without adding value.
A better approach is to put the incoming requests into a buffer, and let the buffer act as a load leveler. With this
design, the Ingestion service must be able to handle the maximum ingestion rate over short periods, but the
backend services only need to handle the maximum sustained load. By buffering at the front end, the backend
services shouldn't need to handle large spikes in traffic. At the scale required for the Drone Delivery application,
Azure Event Hubs is a good choice for load leveling. Event Hubs offers low latency and high throughput, and is a
cost effective solution at high ingestion volumes.
For our testing, we used a Standard tier event hub with 32 partitions and 100 throughput units. We observed
about 32K events / second ingestion, with latency around 90ms. Currently the default limit is 20 throughput units,
but Azure customers can request additional throughput units by filing a support request. See Event Hubs quotas
for more information. As with all performance metrics, many factors can affect performance, such as message
payload size, so don't interpret these numbers as a benchmark. If more throughput is needed, the Ingestion
service can shard across more than one event hub. For even higher throughput rates, Event Hubs Dedicated offers
single-tenant deployments that can ingress over 2 million events per second.
It's important to understand how Event Hubs can achieve such high throughput, because that affects how a client
should consume messages from Event Hubs. Event Hubs does not implement a queue. Rather, it implements an
event stream.
With a queue, an individual consumer can remove a message from the queue, and the next consumer won't see
that message. Queues therefore allow you to use a Competing Consumers pattern to process messages in parallel
and improve scalability. For greater resiliency, the consumer holds a lock on the message and releases the lock
when it's done processing the message. If the consumer fails — for example, the node it's running on crashes —
the lock times out and the message goes back onto the queue.
Event Hubs, on the other hand, uses streaming semantics. Consumers read the stream independently at their own
pace. Each consumer is responsible for keeping track of its current position in the stream. A consumer should
write its current position to persistent storage at some predefined interval. That way, if the consumer experiences
a fault (for example, the consumer crashes, or the host fails), then a new instance can resume reading the stream
from the last recorded position. This process is called checkpointing.
For performance reasons, a consumer generally doesn't checkpoint after each message. Instead, it checkpoints at
some fixed interval, for example after processing n messages, or every n seconds. As a consequence, if a consumer
fails, some events may get processed twice, because a new instance always picks up from the last checkpoint.
There is a tradeoff: Frequent checkpoints can hurt performance, but sparse checkpoints mean you will replay more
events after a failure.
Event Hubs is not designed for competing consumers. Although multiple consumers can read a stream, each
traverses the stream independently. Instead, Event Hubs uses a partitioned consumer pattern. An event hub has
up to 32 partitions. Horizontal scale is achieved by assigning a separate consumer to each partition.
What does this mean for the drone delivery workflow? To get the full benefit of Event Hubs, the Delivery
Scheduler cannot wait for each message to be processed before moving on to the next. If it does that, it will spend
most of its time waiting for network calls to complete. Instead, it needs to process batches of messages in parallel,
using asynchronous calls to the backend services. As we'll see, choosing the right checkpointing strategy is also
important.
Workflow
We looked at three options for reading and processing the messages: Event Processor Host, Service Bus queues,
and the IoTHub React library. We chose IoTHub React, but to understand why, it helps to start with Event
Processor Host.
Event Processor Host
Event Processor Host is designed for message batching. The application implements the IEventProcessor interface,
and the Processor Host creates one event processor instance for each partition in the event hub. The Event
Processor Host then calls each event processor's ProcessEventsAsync method with batches of event messages. The
application controls when to checkpoint inside the ProcessEventsAsync method, and the Event Processor Host
writes the checkpoints to Azure storage.
Within a partition, Event Processor Host waits for ProcessEventsAsync to return before calling again with the next
batch. This approach simplifies the programming model, because your event processing code doesn't need to be
reentrant. However, it also means that the event processor handles one batch at a time, and this gates the speed at
which the Processor Host can pump messages.
NOTE
The Processor Host doesn't actually wait in the sense of blocking a thread. The ProcessEventsAsync method is
asynchronous, so the Processor Host can do other work while the method is completing. But it won't deliver another batch
of messages for that partition until the method returns.
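To illustrate the programming model, here is a minimal C# sketch using the .NET Event Processor Host library
(Microsoft.Azure.EventHubs.Processor). The reference implementation's Scheduler is written in Java, so this is not
its actual code; ProcessDeliveryRequestAsync is a placeholder for the per-message work.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;
using Microsoft.Azure.EventHubs.Processor;

class DeliveryRequestProcessor : IEventProcessor
{
    public Task OpenAsync(PartitionContext context) => Task.CompletedTask;

    public Task CloseAsync(PartitionContext context, CloseReason reason) => Task.CompletedTask;

    public Task ProcessErrorAsync(PartitionContext context, Exception error) => Task.CompletedTask;

    public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
    {
        // Process the batch in parallel. Avoid long-running work here, because the host
        // won't deliver the next batch for this partition until this method returns.
        await Task.WhenAll(messages.Select(ProcessDeliveryRequestAsync));

        // Checkpoint once per batch; the Event Processor Host writes it to Azure Storage.
        await context.CheckpointAsync();
    }

    private Task ProcessDeliveryRequestAsync(EventData message) => Task.CompletedTask; // placeholder
}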
In the drone application, a batch of messages can be processed in parallel. But waiting for the whole batch to
complete can still cause a bottleneck. Processing can only be as fast as the slowest message within a batch. Any
variation in response times can create a "long tail," where a few slow responses drag down the entire system. Our
performance tests showed that we did not achieve our target throughput using this approach. This does not mean
that you should avoid using Event Processor Host. But for high throughput, avoid doing any long-running tasks
inside the ProcessEventsAsync method. Process each batch quickly.
IoTHub React
IoTHub React is an Akka Streams library for reading events from Event Hubs. Akka Streams is a stream-based
programming framework that implements the Reactive Streams specification. It provides a way to build efficient
streaming pipelines, where all streaming operations are performed asynchronously, and the pipeline gracefully
handles backpressure. Backpressure occurs when an event source produces events at a faster rate than the
downstream consumers can receive them — which is exactly the situation when the drone delivery system has a
spike in traffic. If backend services go slower, IoTHub React will slow down. If capacity is increased, IoTHub React
will push more messages through the pipeline.
Akka Streams is also a very natural programming model for streaming events from Event Hubs. Instead of
looping through a batch of events, you define a set of operations that will be applied to each event, and let Akka
Streams handle the streaming. Akka Streams defines a streaming pipeline in terms of Sources, Flows, and Sinks. A
source generates an output stream, a flow processes an input stream and produces an output stream, and a sink
consumes a stream without producing any output.
In the Scheduler service, the Akka Streams pipeline works as follows. Event Hubs is configured as the source. A
map stage deserializes each event message into a Java class that represents a delivery request. A filter stage
removes any null objects from the stream; this guards against the case where a message can't be deserialized. A
via stage joins the source to a flow that processes each delivery request. Finally, the to method joins the flow to
the checkpoint sink, which is built into IoTHub React.
IoTHub React uses a different checkpointing strategy than Event Host Processor. Checkpoints are written by the
checkpoint sink, which is the terminating stage in the pipeline. The design of Akka Streams allows the pipeline to
continue streaming data while the sink is writing the checkpoint. That means the upstream processing stages
don't need to wait for checkpointing to happen. You can configure checkpointing to occur after a timeout or after
a certain number of messages have been processed.
The deliveryProcessor method creates the Akka Streams flow:
private static Flow<AkkaDelivery, MessageFromDevice, NotUsed> deliveryProcessor() {
    return Flow.of(AkkaDelivery.class).map(delivery -> {
        // Kick off asynchronous processing of the delivery request.
        CompletableFuture<DeliverySchedule> completableSchedule = DeliveryRequestEventProcessor
                .processDeliveryRequestAsync(delivery.getDelivery(),
                        delivery.getMessageFromDevice().properties());

        // Log the outcome when the asynchronous call completes.
        completableSchedule.whenComplete((deliverySchedule, error) -> {
            if (error != null) {
                Log.info("failed delivery" + error.getStackTrace());
            } else {
                Log.info("Completed Delivery", deliverySchedule.toString());
            }
        });
        completableSchedule = null;

        // Pass the original device message downstream to the checkpoint sink.
        return delivery.getMessageFromDevice();
    });
}
The flow calls a static processDeliveryRequestAsync method that does the actual work of processing each message.
Scaling with IoTHub React
The Scheduler service is designed so that each container instance reads from a single partition. For example, if the
Event Hub has 32 partitions, the Scheduler service is deployed with 32 replicas. This allows for a lot of flexibility in
terms of horizontal scaling.
Depending on the size of the cluster, a node in the cluster might have more than one Scheduler service pod
running on it. But if the Scheduler service needs more resources, the cluster can be scaled out, in order to
distribute the pods across more nodes. Our performance tests showed that the Scheduler service is memory- and
thread-bound, so performance depended greatly on the VM size and the number of pods per node.
Each instance needs to know which Event Hubs partition to read from. To configure the partition number, we took
advantage of the StatefulSet resource type in Kubernetes. Pods in a StatefulSet have a persistent identifier that
includes a numeric index. Specifically, the pod name is <statefulset name>-<index> , and this value is available to the
container through the Kubernetes Downward API. At run time, the Scheduler service reads the pod name and
uses the pod index as the partition ID.
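For example, if the pod name is exposed to the container as an environment variable (assumed here to be
POD_NAME, set through the Downward API), deriving the partition index might look like the following sketch.
The reference implementation's Scheduler does the equivalent in Java.

using System;

// A StatefulSet named "scheduler" creates pods "scheduler-0", "scheduler-1", and so on.
string podName = Environment.GetEnvironmentVariable("POD_NAME")
                 ?? throw new InvalidOperationException("POD_NAME is not set");

// Use the StatefulSet ordinal as the Event Hubs partition ID.
int partitionId = int.Parse(podName.Substring(podName.LastIndexOf('-') + 1));
Console.WriteLine($"Reading from Event Hubs partition {partitionId}");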
If you needed to scale out the Scheduler service even further, you could assign more than one pod per event hub
partition, so that multiple pods are reading each partition. However, in that case, each instance would read all of
the events in the assigned partition. To avoid duplicate processing, you would need to use a hashing algorithm, so
that each instance skips over a portion of the messages. That way, multiple readers can consume the stream, but
every message is processed by only one instance.
Handling failures
There are three general classes of failure to consider.
1. A downstream service may have a non-transient failure, which is any failure that's unlikely to go away by
itself. Non-transient failures include normal error conditions, such as invalid input to a method. They also
include unhandled exceptions in application code or a process crashing. If this type of error occurs, the
entire business transaction must be marked as a failure. It may be necessary to undo other steps in the
same transaction that already succeeded. (See Compensating Transactions, below.)
2. A downstream service may experience a transient failure such as a network timeout. These errors can often
be resolved simply by retrying the call. If the operation still fails after a certain number of attempts, it's
considered a non-transient failure.
3. The Scheduler service itself might fault (for example, because a node crashes). In that case, Kubernetes will
bring up a new instance of the service. However, any transactions that were already in progress must be
resumed.
Compensating transactions
If a non-transient failure happens, the current transaction might be in a partially failed state, where one or more
steps already completed successfully. For example, if the Drone service already scheduled a drone, the drone must
be canceled. In that case, the application needs to undo the steps that succeeded, by using a Compensating
Transaction. In some cases, this must be done by an external system or even by a manual process.
If the logic for compensating transactions is complex, consider creating a separate service that is responsible for
this process. In the Drone Delivery application, the Scheduler service puts failed operations onto a dedicated
queue. A separate microservice, called the Supervisor, reads from this queue and calls a cancellation API on the
services that need to compensate. This is a variation of the Scheduler Agent Supervisor pattern. The Supervisor
service might take other actions as well, such as notify the user by text or email, or send an alert to an operations
dashboard.
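Here is an illustrative C# sketch of such a Supervisor loop. The queue and client interfaces are assumptions
introduced for this example and are not the reference implementation's actual types.

using System.Threading;
using System.Threading.Tasks;

public interface IFailedOperationQueue { Task<FailedOperation> DequeueAsync(CancellationToken ct); }
public interface IDroneServiceClient { Task CancelDroneAsync(string deliveryId); }
public class FailedOperation { public string DeliveryId { get; set; } }

public class Supervisor
{
    private readonly IFailedOperationQueue failedOperations;  // dedicated queue of failed operations
    private readonly IDroneServiceClient droneService;        // exposes a cancellation API

    public Supervisor(IFailedOperationQueue failedOperations, IDroneServiceClient droneService)
    {
        this.failedOperations = failedOperations;
        this.droneService = droneService;
    }

    public async Task RunAsync(CancellationToken cancellation)
    {
        while (!cancellation.IsCancellationRequested)
        {
            var failed = await failedOperations.DequeueAsync(cancellation);

            // Compensate: undo the step that already succeeded, such as cancelling the scheduled drone.
            await droneService.CancelDroneAsync(failed.DeliveryId);

            // The Supervisor might also notify the user or send an alert to an operations dashboard.
        }
    }
}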
Idempotent vs non-idempotent operations
To avoid losing any requests, the Scheduler service must guarantee that all messages are processed at least once.
Event Hubs can guarantee at-least-once delivery if the client checkpoints correctly.
If the Scheduler service crashes, it may be in the middle of processing one or more client requests. Those
messages will be picked up by another instance of the Scheduler and reprocessed. What happens if a request is
processed twice? It's important to avoid duplicating any work. After all, we don't want the system to send two
drones for the same package.
One approach is to design all operations to be idempotent. An operation is idempotent if it can be called multiple
times without producing additional side-effects after the first call. In other words, a client can invoke the operation
once, twice, or many times, and the result will be the same. Essentially, the service should ignore duplicate calls.
For a method with side effects to be idempotent, the service must be able to detect duplicate calls. For example,
you can have the caller assign the ID, rather than having the service generate a new ID. The service can then check
for duplicate IDs.
NOTE
The HTTP specification states that GET, PUT, and DELETE methods must be idempotent. POST methods are not guaranteed
to be idempotent. If a POST method creates a new resource, there is generally no guarantee that this operation is
idempotent.
It's not always straightforward to write idempotent methods. Another option is for the Scheduler to track the
progress of every transaction in a durable store. Whenever it processes a message, it would look up the state in
the durable store. After each step, it would write the result to the store. There may be performance implications to
this approach.
A request method is considered "idempotent" if the intended effect on the server of multiple identical requests
with that method is the same as the effect for a single such request. (RFC 7231)
It's important to understand the difference between PUT and POST semantics when creating a new entity. In both
cases, the client sends a representation of an entity in the request body. But the meaning of the URI is different.
For a POST method, the URI represents a parent resource of the new entity, such as a collection. For
example, to create a new delivery, the URI might be /api/deliveries . The server creates the entity and assigns
it a new URI, such as /api/deliveries/39660 . This URI is returned in the Location header of the response. Each
time the client sends a request, the server will create a new entity with a new URI.
For a PUT method, the URI identifies the entity. If there already exists an entity with that URI, the server
replaces the existing entity with the version in the request. If no entity exists with that URI, the server
creates one. For example, suppose the client sends a PUT request to /api/deliveries/39660 . Assuming there is no
delivery with that URI, the server creates a new one. Now if the client sends the same request again, the
server will replace the existing entity.
Here is the Delivery service's implementation of the PUT method (shown here as a sketch; helper names such as
deliveryRepository and DuplicateResourceException are illustrative).
[HttpPut("{id}")]
[ProducesResponseType(typeof(Delivery), 201)]
[ProducesResponseType(typeof(void), 204)]
public async Task<IActionResult> Put([FromBody]Delivery delivery, string id)
{
    logger.LogInformation("In Put action with delivery {Id}: {@DeliveryInfo}", id, delivery.ToLogInfo());
    try
    {
        var internalDelivery = delivery.ToInternal();
        await deliveryRepository.CreateAsync(internalDelivery);                    // optimistically create
        return CreatedAtRoute("GetDelivery", new { id = delivery.Id }, delivery);  // 201 Created
    }
    catch (DuplicateResourceException)
    {
        await deliveryRepository.UpdateAsync(id, delivery.ToInternal());           // already exists: update
        return NoContent();                                                        // 204 No Content
    }
}
It's expected that most requests will create a new entity, so the method optimistically calls CreateAsync on the
repository object, and then handles any duplicate-resource exceptions by updating the resource instead.
API gateways
Designing microservices: API gateways
In a microservices architecture, a client might interact with more than one front-end service. Given this fact, how
does a client know what endpoints to call? What happens when new services are introduced, or existing services
are refactored? How do services handle SSL termination, authentication, and other concerns? An API gateway can
help to address these challenges.
Deployment considerations
Deploying Nginx or HAProxy to Kubernetes
You can deploy Nginx or HAProxy to Kubernetes as a ReplicaSet or DaemonSet that specifies the Nginx or
HAProxy container image. Use a ConfigMap to store the configuration file for the proxy, and mount the ConfigMap
as a volume. Create a service of type LoadBalancer to expose the gateway through an Azure Load Balancer.
An alternative is to create an Ingress Controller. An Ingress Controller is a Kubernetes resource that deploys a load
balancer or reverse proxy server. Several implementations exist, including Nginx and HAProxy. A separate
resource called an Ingress defines settings for the Ingress Controller, such as routing rules and TLS certificates.
That way, you don't need to manage complex configuration files that are specific to a particular proxy server
technology. Ingress Controllers are still a beta feature of Kubernetes at the time of this writing, and the feature will
continue to evolve.
The gateway is a potential bottleneck or single point of failure in the system, so always deploy at least two replicas
for high availability. You may need to scale out the replicas further, depending on the load.
Also consider running the gateway on a dedicated set of nodes in the cluster. Benefits to this approach include:
Isolation. All inbound traffic goes to a fixed set of nodes, which can be isolated from backend services.
Stable configuration. If the gateway is misconfigured, the entire application may become unavailable.
Performance. You may want to use a specific VM configuration for the gateway for performance reasons.
Azure Application Gateway
To connect Application Gateway to a Kubernetes cluster in Azure:
1. Create an empty subnet in the cluster VNet.
2. Deploy Application Gateway.
3. Create a Kubernetes service with type=NodePort. This exposes the service on each node so that it can be
reached from outside the cluster. It does not create a load balancer.
4. Get the assigned port number for the service.
5. Add an Application Gateway rule where:
The backend pool contains the agent VMs.
The HTTP setting specifies the service port number.
The gateway listener listens on ports 80/443.
Set the instance count to 2 or more for high availability.
Azure API Management
To connect API Management to a Kubernetes cluster in Azure:
1. Create an empty subnet in the cluster VNet.
2. Deploy API Management to that subnet.
3. Create a Kubernetes service of type LoadBalancer. Use the internal load balancer annotation to create an
internal load balancer, instead of an Internet-facing load balancer, which is the default.
4. Find the private IP of the internal load balancer, using kubectl or the Azure CLI.
5. Use API Management to create an API that directs to the private IP address of the load balancer.
Consider combining API Management with a reverse proxy, whether Nginx, HAProxy, or Azure Application
Gateway. For information about using API Management with Application Gateway, see Integrate API Management
in an internal VNET with Application Gateway.
Logging and monitoring
Designing microservices: Logging and monitoring
In any complex application, at some point something will go wrong. In a microservices application, you need to
track what's happening across dozens or even hundreds of services. Logging and monitoring are critically
important to give you a holistic view of the system.
In a microservices architecture, it can be especially challenging to pinpoint the exact cause of errors or
performance bottlenecks. A single user operation might span multiple services. Services may hit network I/O limits
inside the cluster. A chain of calls across services may cause backpressure in the system, resulting in high latency
or cascading failures. Moreover, you generally don't know which node a particular container will run in. Containers
placed on the same node may be competing for limited CPU or memory.
To make sense of what's happening, the application must emit telemetry events. You can categorize these into
metrics and text-based logs.
Metrics are numerical values that can be analyzed. You can use them to observe the system in real time (or close to
real time), or to analyze performance trends over time. Metrics include:
Node-level system metrics, including CPU, memory, network, disk, and file system usage. System metrics
help you to understand resource allocation for each node in the cluster, and troubleshoot outliers.
Kubernetes metrics. Because services run in containers, you need to collect metrics at the container level,
not just at the VM level. In Kubernetes, cAdvisor (Container Advisor) is the agent that collects statistics about
the CPU, memory, file system, and network resources used by each container. The kubelet daemon collects
resource statistics from cAdvisor and exposes them through a REST API.
Application metrics. This includes any metrics that are relevant to understanding the behavior of a service.
Examples include the number of queued inbound HTTP requests, request latency, message queue length, or
number of transactions processed per second.
Dependent service metrics. Services inside the cluster may call external services that are outside the cluster,
such as managed PaaS services. You can monitor Azure services by using Azure Monitor. Third-party
services may or may not provide any metrics. If not, you'll have to rely on your own application metrics to
track statistics for latency and error rate.
Logs are records of events that occur while the application is running. They include things like application logs
(trace statements) or web server logs. Logs are primarily useful for forensics and root cause analysis.
Considerations
The article Monitoring and diagnostics describes general best practices for monitoring an application. Here are
some particular things to think about in the context of a microservices architecture.
Configuration and management. Will you use a managed service for logging and monitoring, or deploy
logging and monitoring components as containers inside the cluster? For more discussion of these options, see
the section Technology options below.
Ingestion rate. What is the throughput at which the system can ingest telemetry events? What happens if that
rate is exceeded? For example, the system may throttle clients, in which case telemetry data is lost, or it may
downsample the data. Sometimes you can mitigate this problem by reducing the amount of data that you collect:
Aggregate metrics by calculating statistics, such as average and standard deviation, and send that statistical
data to the monitoring system.
Downsample the data — that is, process only a percentage of the events.
Batch the data to reduce the number of network calls to the monitoring service.
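For example, a minimal TypeScript sketch of the aggregation approach might look like the following: a hypothetical
MetricAggregator accumulates raw samples and periodically sends summary statistics (count, mean, standard
deviation) instead of every data point. The send callback stands in for whatever monitoring client you actually use.

// Hypothetical summary payload sent to the monitoring system.
interface Summary {
    name: string;
    count: number;
    mean: number;
    stdDev: number;
}

class MetricAggregator {
    private samples: number[] = [];

    constructor(
        private name: string,
        private send: (summary: Summary) => void,
        flushIntervalMs: number = 60000
    ) {
        setInterval(() => this.flush(), flushIntervalMs);
    }

    record(value: number): void {
        this.samples.push(value);
    }

    private flush(): void {
        if (this.samples.length === 0) { return; }
        const values = this.samples;
        this.samples = [];
        const count = values.length;
        const mean = values.reduce((a, b) => a + b, 0) / count;
        const variance = values.reduce((a, b) => a + Math.pow(b - mean, 2), 0) / count;
        this.send({ name: this.name, count: count, mean: mean, stdDev: Math.sqrt(variance) });
    }
}

// Example: record request latencies; only one summary per minute reaches the monitoring system.
const latency = new MetricAggregator('request-latency-ms', s => console.log(JSON.stringify(s)));
latency.record(42);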
Cost. The cost of ingesting and storing telemetry data may be high, especially at high volumes. In some cases it
could even exceed the cost of running the application. In that case, you may need to reduce the volume of
telemetry by aggregating, downsampling, or batching the data, as described above.
Data fidelity. How accurate are the metrics? Averages can hide outliers, especially at scale. Also, if the sampling
rate is too low, it can smooth out fluctuations in the data. It may appear that all requests have about the same
end-to-end latency, when in fact a significant fraction of requests are taking much longer.
Latency. To enable real-time monitoring and alerts, telemetry data should be available quickly. How "real-time" is
the data that appears on the monitoring dashboard? A few seconds old? More than a minute?
Storage. For logs, it may be most efficient to write the log events to ephemeral storage in the cluster, and
configure an agent to ship the log files to more persistent storage. Data should eventually be moved to long-term
storage so that it's available for retrospective analysis. A microservices architecture can generate a large volume of
telemetry data, so the cost of storing that data is an important consideration. Also consider how you will query the
data.
Dashboard and visualization. Do you get a holistic view of the system, across all of the services, both within the
cluster and external services? If you are writing telemetry data and logs to more than one location, can the
dashboard show all of them and correlate? The monitoring dashboard should show at least the following
information:
Overall resource allocation for capacity and growth. This includes the number of containers, file system metrics,
network, and core allocation.
Container metrics correlated at the service level.
System metrics correlated with containers.
Service errors and outliers.
Distributed tracing
As mentioned, one challenge in microservices is understanding the flow of events across services. A single
operation or transaction may involve calls to multiple services. To reconstruct the entire sequence of steps, each
service should propagate a correlation ID that acts as a unique identifier for that operation. The correlation ID
enables distributed tracing across services.
The first service that receives a client request should generate the correlation ID. If the service makes an HTTP call
to another service, it puts the correlation ID in a request header. Similarly, if the service sends an asynchronous
message, it puts the correlation ID into the message. Downstream services continue to propagate the correlation
ID, so that it flows through the entire system. In addition, all code that writes application metrics or log events
should include the correlation ID.
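For example, in a Koa-based Node.js service, propagation might be handled with a small middleware along these
lines. This is only a sketch; the x-correlation-id header name and the function names are assumptions, not part of
any standard.

import { randomUUID } from 'crypto';
import * as Koa from 'koa';

// The header name is an assumption; standardize on whatever custom header your team chooses.
const CORRELATION_HEADER = 'x-correlation-id';

// Koa middleware: reuse the incoming correlation ID, or generate one if this
// service is the first hop in the call chain.
export async function correlation(ctx: Koa.Context, next: () => Promise<any>) {
    ctx.state.correlationId = (ctx.headers[CORRELATION_HEADER] as string) || randomUUID();
    await next();
}

// When the service calls a downstream service, forward the same ID as a request header.
export function downstreamHeaders(ctx: Koa.Context): { [key: string]: string } {
    return { [CORRELATION_HEADER]: ctx.state.correlationId };
}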
When service calls are correlated, you can calculate operational metrics such as the end-to-end latency for a
complete transaction, the number of successful transactions per second, and the percentage of failed transactions.
Including correlation IDs in application logs makes it possible to perform root cause analysis. If an operation fails,
you can find the log statements for all of the service calls that were part of the same operation.
Here are some considerations when implementing distributed tracing:
There is currently no standard HTTP header for correlation IDs. Your team should standardize on a custom
header value. The choice may be decided by your logging/monitoring framework or choice of service mesh.
For asynchronous messages, if your messaging infrastructure supports adding metadata to messages, you
should include the correlation ID as metadata. Otherwise, include it as part of the message schema.
Rather than a single opaque identifier, you might send a correlation context that includes richer
information, such as caller-callee relationships.
The Azure Application Insights SDK automatically injects correlation context into HTTP headers, and includes
the correlation ID in Application Insights logs. If you decide to use the correlation features built into
Application Insights, some services may still need to explicitly propagate the correlation headers, depending
on the libraries being used. For more information, see Telemetry correlation in Application Insights.
If you are using Istio or linkerd as a service mesh, these technologies automatically generate correlation
headers when HTTP calls are routed through the service mesh proxies. Services should forward the relevant
headers.
Istio: Distributed Request Tracing
linkerd: Context Headers
Consider how you will aggregate logs. You may want to standardize across teams on how to include
correlation IDs in logs. Use a structured or semi-structured format, such as JSON, and define a common
field to hold the correlation ID.
Technology options
Application Insights is a managed service in Azure that ingests and stores telemetry data, and provides tools for
analyzing and searching the data. To use Application Insights, you install an instrumentation package in your
application. This package monitors the app and sends telemetry data to the Application Insights service. It can also
pull telemetry data from the host environment. Application Insights provides built-in correlation and dependency
tracking. It lets you track system metrics, application metrics, and Azure service metrics, all in one place.
Be aware that Application Insights throttles if the data rate exceeds a maximum limit; for details, see Application
Insights limits. A single operation may generate several telemetry events, so if the application experiences a high
volume of traffic, it is likely to get throttled. To mitigate this problem, you can perform sampling to reduce the
telemetry traffic. The tradeoff is that your metrics will be less precise. For more information, see Sampling in
Application Insights. You can also reduce the data volume by pre-aggregating metrics — that is, calculating
statistical values such as average and standard deviation, and sending those values instead of the raw telemetry.
The following blog post describes an approach to using Application Insights at scale: Azure Monitoring and
Analytics at Scale.
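For a Node.js service, enabling the SDK with fixed-rate sampling might look like the following sketch, assuming the
applicationinsights npm package; configuration options vary between SDK versions.

import * as appInsights from 'applicationinsights';

// setup() with no argument reads the instrumentation key from the
// APPINSIGHTS_INSTRUMENTATIONKEY environment variable.
appInsights
    .setup()
    .setAutoDependencyCorrelation(true)  // propagate correlation context on outbound calls
    .setAutoCollectRequests(true)
    .setAutoCollectDependencies(true)
    .start();

// Send only ~25% of telemetry items; lower volume, less precise metrics.
appInsights.defaultClient.config.samplingPercentage = 25;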
In addition, make sure that you understand the pricing model for Application Insights, because you are charged
based on data volume. For more information, see Manage pricing and data volume in Application Insights. If your
application generates a large volume of telemetry, and you don't wish to perform sampling or aggregation of the
data, then Application Insights may not be the appropriate choice.
If Application Insights doesn't meet your requirements, here are some suggested approaches that use popular
open-source technologies.
For system and container metrics, consider exporting metrics to a time-series database such as Prometheus or
InfluxDB running in the cluster.
InfluxDB is a push-based system. An agent needs to push the metrics. You can use Heapster, which is a
service that collects cluster-wide metrics from kubelet, aggregates the data, and pushes it to InfluxDB or
another time-series storage solution. Azure Container Service deploys Heapster as part of the cluster setup.
Another option is Telegraf, which is an agent for collecting and reporting metrics.
Prometheus is a pull-based system. It periodically scrapes metrics from configured locations. Prometheus
can scrape metrics generated by cAdvisor or kube-state-metrics. kube-state-metrics is a service that collects
metrics from the Kubernetes API server and makes them available to Prometheus (or a scraper that is
compatible with a Prometheus client endpoint). Whereas Heapster aggregates metrics that Kubernetes
generates and forwards them to a sink, kube-state-metrics generates its own metrics and makes them
available through an endpoint for scraping. For node-level system metrics, use Node exporter, a Prometheus
exporter for hardware and OS metrics. Prometheus supports floating-point data but not string data, so it is
appropriate for metrics but not for logs. A sketch of exposing application metrics for Prometheus to scrape
appears after this list.
Use a dashboard tool such as Kibana or Grafana to visualize and monitor the data. The dashboard service
can also run inside a container in the cluster.
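To illustrate the pull model, here is a sketch of a Node.js service that exposes application metrics for Prometheus
to scrape, assuming the prom-client npm package; the metric name and port are arbitrary examples.

import * as http from 'http';
import * as client from 'prom-client';

// A histogram for request latency; Prometheus computes rates and quantiles from it.
const requestDuration = new client.Histogram({
    name: 'http_request_duration_seconds',
    help: 'Duration of HTTP requests in seconds',
    labelNames: ['route', 'status']
});

const server = http.createServer(async (req, res) => {
    if (req.url === '/metrics') {
        // Prometheus scrapes this endpoint on its configured interval.
        res.setHeader('Content-Type', client.register.contentType);
        res.end(await client.register.metrics());
        return;
    }
    const end = requestDuration.startTimer();
    res.end('ok');
    end({ route: req.url || '/', status: String(res.statusCode) });
});

server.listen(3000);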
For application logs, consider using Fluentd and Elasticsearch. Fluentd is an open source data collector, and
Elasticsearch is a document database that is optimized to act as a search engine. Using this approach, each service
sends logs to stdout and stderr, and Kubernetes writes these streams to the local file system. Fluentd collects the
logs, optionally enriches them with additional metadata from Kubernetes, and sends the logs to Elasticsearch. Use
Kibana, Grafana, or a similar tool to create a dashboard for Elasticsearch. Fluentd runs as a DaemonSet in the
cluster, which ensures that one Fluentd pod is assigned to each node. You can configure Fluentd to collect kubelet
logs as well as container logs. At high volumes, writing logs to the local file system could become a performance
bottleneck, especially when multiple services are running on the same node. Monitor disk latency and file system
utilization in production.
One advantage of using Fluentd with Elasticsearch for logs is that services do not require any additional library
dependencies. Each service just writes to stdout and stderr, and Fluentd handles exporting the logs into
Elasticsearch. Also, the teams writing services don't need to understand how to configure the logging
infrastructure. One challenge is to configure the Elasticsearch cluster for a production deployment, so that it scales
to handle your traffic.
Another option is to send logs to Operations Management Suite (OMS) Log Analytics. The Log Analytics service
collects log data into a central repository, and can also consolidate data from other Azure services that your
application uses. For more information, see Monitor an Azure Container Service cluster with Microsoft Operations
Management Suite (OMS).
Here is an ILogger implementation that wraps the Winston library. It takes the correlation ID as a constructor
parameter, and injects the ID into every log message.
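A minimal sketch of such a wrapper, assuming the winston npm package with its v3-style API, might look like this
(the class and interface names are illustrative, and the reference implementation may differ):

import * as winston from 'winston';

export interface ILogger {
    info(message: string, ...meta: any[]): void;
    error(message: string, ...meta: any[]): void;
}

export class WinstonLogger implements ILogger {
    private logger: winston.Logger;

    constructor(correlationId: string, level: string = 'info') {
        this.logger = winston.createLogger({
            level: level,
            // splat() enables printf-style interpolation ('%s'); json() emits structured logs.
            format: winston.format.combine(winston.format.splat(), winston.format.json()),
            // defaultMeta injects the correlation ID into every log entry.
            defaultMeta: { correlationId: correlationId },
            transports: [new winston.transports.Console()]
        });
    }

    info(message: string, ...meta: any[]): void {
        this.logger.info(message, ...meta);
    }

    error(message: string, ...meta: any[]): void {
        this.logger.error(message, ...meta);
    }
}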
The Package service needs to extract the correlation ID from the HTTP request. For example, if you're using linkerd,
the correlation ID is found in the l5d-ctx-trace header. In Koa, the HTTP request is stored in a Context object that gets
passed through the request processing pipeline. We can define a middleware function to get the correlation ID
from the Context and initialize the logger. (A middleware function in Koa is simply a function that gets executed for
each request.)
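A sketch of that middleware, reusing the WinstonLogger wrapper from the sketch above (the import path is
hypothetical):

import * as Koa from 'koa';
import { ILogger, WinstonLogger } from './logger';

// Factory that returns a Koa middleware. The caller supplies getCorrelationId,
// which knows where to find the correlation ID on the incoming request.
export function logger(logLevel: string, getCorrelationId: (ctx: Koa.Context) => string) {
    return async function (ctx: Koa.Context, next: () => Promise<any>) {
        const correlationId = getCorrelationId(ctx);
        const log: ILogger = new WinstonLogger(correlationId, logLevel);
        // Stash the logger and correlation ID in ctx.state so downstream handlers can use them.
        ctx.state.logger = log;
        ctx.state.correlationId = correlationId;
        await next();
    };
}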
This middleware invokes a caller-defined function, getCorrelationId , to get the correlation ID. Then it creates an
instance of the logger and stashes it inside ctx.state , which is a key-value dictionary used in Koa to pass
information through the pipeline.
The logger middleware is added to the pipeline on startup:
app.use(logger(Settings.logLevel(), function (ctx) {
    return ctx.headers[Settings.correlationHeader()];
}));
Once everything is configured, it's easy to add logging statements to the code. For example, here is the method
that looks up a package. It makes two calls to the ILogger.info method.
async getById(ctx, next) {
    // The logger was placed in ctx.state by the middleware shown earlier.
    var logger: ILogger = ctx.state.logger;
    var packageId = ctx.params.packageId;
    logger.info('getById %s', packageId);
    await next();
    // Representative repository lookup; the actual data access code may differ.
    let pkg = await this.repository.findPackage(packageId);
    if (pkg == null) {
        logger.info(`getById: %s not found`, packageId);
        ctx.response.status = 404;
        return;
    }
    ctx.response.status = 200;
    ctx.response.body = this.mapPackageDbToApi(pkg);
}
We don't need to include the correlation ID in the logging statements, because that's done automatically by the
middleware function. This makes the logging code cleaner, and reduces the chance that a developer will forget to
include the correlation ID. And because all of the logging statements use the abstract ILogger interface, it would be
easy to replace the logger implementation later.
Continuous integration and delivery
Designing microservices: Continuous integration
Continuous integration and continuous delivery (CI/CD) are key requirements for achieving success with
microservices. Without a good CI/CD process, you will not achieve the agility that microservices promise. Some of
the CI/CD challenges for microservices arise from having multiple code bases and heterogeneous build
environments for the various services. This chapter describes the challenges and recommends some approaches to
the problem.
Faster release cycles are one of the biggest reasons to adopt a microservices architecture.
In a purely monolithic application, there is a single build pipeline whose output is the application executable. All
development work feeds into this pipeline. If a high-priority bug is found, a fix must be integrated, tested, and
published, which can delay the release of new features. It's true that you can mitigate these problems by having
well-factored modules and using feature branches to minimize the impact of code changes. But as the application
grows more complex, and more features are added, the release process for a monolith tends to become more
brittle and likely to break.
Following the microservices philosophy, there should never be a long release train where every team has to get in
line. The team that builds service "A" can release an update at any time, without waiting for changes in service "B"
to be merged, tested, and deployed. The CI/CD process is critical to making this possible. Your release pipeline
must be automated and highly reliable, so that the risks of deploying updates are minimized. If you are releasing to
production daily or multiple times a day, regressions or service disruptions must be very rare. At the same time, if
a bad update does get deployed, you must have a reliable way to quickly roll back or roll forward to a previous
version of a service.
When we talk about CI/CD, we are really talking about several related processes: Continuous integration,
continuous delivery, and continuous deployment.
Continuous integration means that code changes are frequently merged into the main branch, using
automated build and test processes to ensure that code in the main branch is always production-quality.
Continuous delivery means that code changes that pass the CI process are automatically published to a
production-like environment. Deployment into the live production environment may require manual
approval, but is otherwise automated. The goal is that your code should always be ready to deploy into
production.
Continuous deployment means that code changes that pass the CI/CD process are automatically deployed
into production.
In the context of Kubernetes and microservices, the CI stage is concerned with building and testing container
images, and pushing those images to a container registry. In the deployment stage, pod specs are updated to pick
up the latest production image.
Challenges
Many small independent code bases. Each team is responsible for building its own service, with its own
build pipeline. In some organizations, teams may use separate code repositories. This could lead to a
situation where the knowledge of how to build the system is spread across teams, and nobody in the
organization knows how to deploy the entire application. For example, what happens in a disaster recovery
scenario, if you need to quickly deploy to a new cluster?
Multiple languages and frameworks. With each team using its own mix of technologies, it can be difficult
to create a single build process that works across the organization. The build process must be flexible
enough that every team can adapt it for their choice of language or framework.
Integration and load testing. With teams releasing updates at their own pace, it can be challenging to
design robust end-to-end testing, especially when services have dependencies on other services. Moreover,
running a full production cluster can be expensive, so it's unlikely that every team will be able to run its own
full cluster at production scales, just for testing.
Release management. Every team should have the ability to deploy an update to production. That doesn't
mean that every team member has permissions to do so. But having a centralized Release Manager role can
reduce the velocity of deployments. The more that your CI/CD process is automated and reliable, the less
there should be a need for a central authority. That said, you might have different policies for releasing
major feature updates versus minor bug fixes. Being decentralized does not mean there should be zero
governance.
Container image versioning. During the development and test cycle, the CI/CD process will build many
container images. Only some of those are candidates for release, and then only some of those release
candidates will get pushed into production. You should have a clear versioning strategy, so that you know
which images are currently deployed to production, and can roll back to a previous version if necessary.
Service updates. When you update a service to a new version, it shouldn't break other services that
depend on it. If you do a rolling update, there will be a period of time when a mix of versions is running.
These challenges reflect a fundamental tension. On the one hand, teams need to work as independently as
possible. On the other hand, some coordination is needed so that a single person can do tasks like running an
integration test, redeploying the entire solution to a new cluster, or rolling back a bad update.
Updating services
There are various strategies for updating a service that's already in production. Here we discuss three common
options: Rolling update, blue-green deployment, and canary release.
Rolling update
In a rolling update, you deploy new instances of a service, and the new instances start receiving requests right
away. As the new instances come up, the previous instances are removed.
Rolling updates are the default behavior in Kubernetes when you update the pod spec for a Deployment. The
Deployment controller creates a new ReplicaSet for the updated pods. Then it scales up the new ReplicaSet while
scaling down the old one, to maintain the desired replica count. It doesn't delete old pods until the new ones are
ready. Kubernetes keeps a history of the update, so you can use kubectl to roll back an update if needed.
If your service performs a long startup task, you can define a readiness probe. The readiness probe reports when
the container is ready to start receiving traffic. Kubernetes won't send traffic to the pod until the probe reports
success.
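For example, a Node.js service with a slow startup path might expose an HTTP readiness endpoint along the
following lines. This is a sketch; the /ready path is arbitrary and must match the readinessProbe path configured
in the pod spec.

import * as http from 'http';

let ready = false;

// Simulate a long startup task (warming caches, loading reference data, etc.).
setTimeout(() => { ready = true; }, 10000);

const server = http.createServer((req, res) => {
    if (req.url === '/ready') {
        // Kubernetes treats any non-2xx response as "not ready".
        res.statusCode = ready ? 200 : 503;
        res.end();
        return;
    }
    res.end('hello');
});

server.listen(3000);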
One challenge of rolling updates is that during the update process, a mix of old and new versions are running and
receiving traffic. During this period, any request could get routed to either of the two versions. That may or may
not cause problems, depending on the scope of the changes between the two versions.
Blue-green deployment
In a blue-green deployment, you deploy the new version alongside the previous version. After you validate the
new version, you switch all traffic at once from the previous version to the new version. After the switch, you
monitor the application for any problems. If something goes wrong, you can swap back to the old version.
Assuming there are no problems, you can delete the old version.
With a more traditional monolithic or N-tier application, blue-green deployment generally meant provisioning two
identical environments. You would deploy the new version to a staging environment, then redirect client traffic to
the staging environment — for example, by swapping VIP addresses.
In Kubernetes, you don't need to provision a separate cluster to do blue-green deployments. Instead, you can take
advantage of selectors. Create a new Deployment resource with a new pod spec and a different set of labels. Create
this deployment, without deleting the previous deployment or modifying the service that points to it. Once the new
pods are running, you can update the service's selector to match the new deployment.
An advantage of blue-green deployments is that the service switches all the pods at the same time. After the
service is updated, all new requests get routed to the new version. One drawback is that during the update, you are
running twice as many pods for the service (current and next). If the pods require a lot of CPU or memory
resources, you may need to scale out the cluster temporarily to handle the resource consumption.
Canary release
In a canary release, you roll out an updated version to a small number of clients. Then you monitor the behavior of
the new service before rolling it out to all clients. This lets you do a slow rollout in a controlled fashion, observe
real data, and spot problems before all customers are affected.
A canary release is more complex to manage than either blue-green or rolling update, because you must
dynamically route requests to different versions of the service. In Kubernetes, you can configure a Service to span
two replica sets (one for each version) and adjust the replica counts manually. However, this approach is rather
coarse-grained, because of the way Kubernetes load balances across pods. For example, if you have a total of ten
replicas, you can only shift traffic in 10% increments. If you are using a service mesh, you can use the service mesh
routing rules to implement a more sophisticated canary release strategy. Here are some resources that may be
helpful:
Kubernetes without service mesh: Canary deployments
Linkerd: Dynamic request routing
Istio: Canary Deployments using Istio
Conclusion
In recent years, there has been a sea change in the industry, a movement from building systems of record to
building systems of engagement.
Systems of record are traditional back-office data management applications. At the heart of these systems there
often sits an RDBMS that is the single source of truth. The term "system of engagement" is credited to Geoffrey
Moore, in his 2011 paper Systems of Engagement and the Future of Enterprise IT. Systems of engagement are
applications focused on communication and collaboration. They connect people in real time. They must be
available 24/7. New features are introduced regularly without taking the application offline. Users expect more and
are less tolerant of unexpected delays or downtime.
In the consumer space, a better user experience can have measurable business value. The amount of time that a
user engages with an application may translate directly into revenue. And in the realm of business systems, users'
expectations have changed. If these systems aim to foster communication and collaboration, they must take their
cue from consumer-facing applications.
Microservices are a response to this changing landscape. By decomposing a monolithic application into a group of
loosely coupled services, we can control the release cycle of each service, and enable frequent updates without
downtime or breaking changes. Microservices also help with scalability, failure isolation, and resiliency. Meanwhile,
cloud platforms are making it easier to build and run microservices, with automated provisioning of compute
resources, container orchestrators as a service, and event-driven serverless environments.
But as we've seen, microservices architectures also bring a lot of challenges. To succeed, you must start from a
solid design. You must put careful thought into analyzing the domain, choosing technologies, modeling data,
designing APIs, and building a mature DevOps culture. We hope that this guide, and the accompanying reference
implementation, have helped to illuminate the journey.