MuCache: A General Framework for Caching in Microservice Graphs

Haoran Zhang*, Konstantinos Kallas*, Spyros Pavlatos, Rajeev Alur, Sebastian Angel, and Vincent Liu
University of Pennsylvania

https://www.usenix.org/conference/nsdi24/presentation/zhang-haoran
…what can developers do today? Broadly speaking, developers today add caching by either (a) creating manual or application-specific coherence protocols, which are error-prone and fail to generalize; (b) focusing on the backend-storage layer [24, 30], which ignores the significant advantages of terminating request call graphs early; or (c) giving up on consistency and …

FIGURE 2—(Left) Movie Review application fragment. (Right) An example execution of this application. Each line corresponds to a different component, and arrows denote communication. (C) components are caches and (CM) cache managers.
FIGURE 4—An application exhibiting a "diamond" pattern. (Edge labels: (1), (2) call("write", k, v); (3), (4) call("read", k).)

…communication between shards on the request processing critical path—cache managers of different shards only communicate invalidation in the background. At the same time, the invalidation delays in MuCache are very small (ms)—orders of magnitude smaller than standard values of TTL used in practice to evict cache items (seconds to hours) (§7).

Correctness. The correctness condition for MuCache is based on classical refinement modulo reordering, i.e., that all behaviors exposed by a cache-enabled application are equivalent to a behavior of a cache-free version after potentially reordering independent observable events. The execution in Figure 3 is correct because it could have been observed from the original application if the write (1) had happened right after (2) and (3) since they are independent requests from different clients.

Guaranteeing correctness is challenging for call graphs with more than one path between the same two services, i.e., when a request accesses the same backend service twice in its lifetime. Figure 4 shows such an example of a 'diamond' pattern. In this example, a service S1 first calls S2, which in turn calls S4 that writes to its store. Then S1 calls S3, which calls S4 trying to read from the same value that was written by S2. It is possible that S1 could find the result of a previous call to S3 in its cache, reading a stale value, leading to an execution that would not be observable without caches. Since microservice call graphs are dynamic, this pattern cannot be identified and prevented statically (before execution). MuCache addresses this at runtime by keeping track of visited services during request processing, not checking a cached entry if it depends on a service that has already been visited.

Summary. We conclude this section by describing how MuCache satisfies the previously stated requirements:

• Correctness: We prove that MuCache does not introduce behaviors that are not part of the original application (§5).
• Non-blocking and low overhead: Cache managers do all processing in the background and the wrappers only send messages to them, never blocking for a response.
• Dynamic graphs: MuCache tracks dependencies to guarantee correctness in the presence of dynamic call graphs.
• Sharding: MuCache supports sharding without any additional communication on the critical path.
• Application and datastore agnostic: MuCache does not require any modification to the application or datastore code because wrappers intercept all communication.
• Incremental deployment: Developers can gradually declare read-only endpoints to get incremental benefits.

4 MuCache Protocol

Figure 5 shows the complete MuCache protocol for the wrappers of a single service shard and its cache manager in Python-like pseudocode. The wrapper of each service communicates with its associated cache manager through an ordered message queue (using SendToCM). Downstream cache managers also issue Save/Inv events to upstream ones through the same queue. Cache managers in different shards of the same service also communicate with each other when broadcasting invalidations using SendToShardCM.

The code on the left depicts wrapper logic run before a request starts processing (preReqStart), when a request has finished processing (preReturn), when a request reads from a key (preRead), before a request performs a call to another service (preCall), and after a request writes to a key (postWrite). The code on the right depicts cache manager logic, which processes events in the message queue sent by the wrappers and cache managers of other services.

Wrapper. The wrappers keep two types of state. The first is a global (per service shard) readsets map from request identifiers to sets of keys and call arguments, which keeps the dependencies of each pending read-only (RO) request. The second is the per-request context ctx, which is carried around while a request is processed. ctx contains (1) the id of the request (ctx.call_id); (2) the hash value of the request's arguments (ctx.ca); (3) the caller of the request (ctx.caller); (4) the visited services of the request and its subrequests (ctx.visited); and (5) whether the current request is read-only and, therefore, cacheable by its caller (ctx.isRO). Wrappers send a Start(ca) message to their associated cache manager before a request starts processing and then maintain the request readset when a read or a subrequest is performed. Once the request is complete, the entire readset, along with the call arguments, the caller, the return value, and the visited services are sent to the cache manager as an End(ca, rs, caller, ret, vs) message. Wrappers also send Inv(k) messages to cache managers after a datastore key k is modified. preCall checks the cache before invocation and returns directly upon cache hits.

Cache manager. The cache manager controls the contents of the cache. The cache manager contains two global (per service shard) state components: saved and history. The saved map acts as an inverted index of wrappers' readsets by mapping keys (or call arguments) to the corresponding service that has read (or called) them. When a key or a set of calls is invalidated, the cache manager looks up saved to locate all the affected upstream services and asks them to invalidate the set of relevant calls that they have cached by sending them Inv messages. The second state component, history, is a sequence of calls and invalidations used to determine whether a call can be safely cached upstream. When a request with readset rs is complete, the cache manager scans the history in reverse chronological order for invalidations that intersect with rs since the call started. If there is no such invalidation, it asks the upstream cache manager to save the result. The cost of this scan is proportional to the product of request rate and average request duration, which is typically a small number. For example, a service handling 10,000 requests per second, each lasting 100 milliseconds, requires scanning several thousand items.
Figure 5 (Left): wrapper code.

1  global readsets : map(Key, set(Key | CallArgs))
2
3  def preReqStart(ctx):
4    if ctx.isRO:
5      cid, ca = ctx.call_id, ctx.ca
6      readsets[cid] = set()
7      SendToCM(Start(ca))
8
9  def preReturn(ctx, ret):
10   if ctx.isRO:
11     cid, ca = ctx.call_id, ctx.ca
12     rs = readsets.pop(cid)
13     caller = ctx.caller
14     vs = ctx.visited
15     SendToCM(End(ca, rs, caller, ret, vs))
16
17 def postWrite(ctx, k, _v):
18   SendToCM(Inv(k))
19
20 def preRead(ctx, k):
21   if ctx.isRO:
22     cid = ctx.call_id
23     readsets[cid].insert(k)
24
25 def preCall(ctx, ca):
26   if ctx.isRO:
27     cid = ctx.call_id
28     readsets[cid].insert(ca)
29   # Check if ca refers to a read-only endpoint and if
30   # the visited services are disjoint with the cache
31   # item subtree
32   if ca.isRO and visited_disjoint(ctx, ca):
33     return cache.get(ca)
34   return None

Figure 5 (Right): cache manager code.

1  # Tracks which keys and calls will invalidate
2  # which cache entries upstream
3  global saved : map(Key | CallArgs, map(Service, CallArgs))
4  # Sequence of calls and invalidations
5  global history : list(Call(CallArgs) | Inv(Key | CallArgs))
6
7  def startHandler(ca):
8    history.append(Call(ca))
9
10 def endHandler(ca, rs, caller, ret, vs):
11   # Checks if there are any invalidations
12   # to the readset since the call start
13   if empty([for Inv(k) in history.invs_after(Call(ca))
14             if k in rs]):
15     SendToCM(caller, Save(ca, ret, vs))
16   saved.store(rs, ca, caller)
17
18 def invHandler(k):
19   match type(k):
20     case Key:
21       history.append(Inv(k))
22     case CallArgs:
23       history.extend([Inv(ca) for ca in k])
24   # Inform CMs of same-service shards
25   SendToShardCMs(Inv(k))  # (see Sec. 4.1)
26   # Ask all affected callers to invalidate
27   affected = saved.pop(k)
28   for caller, cas in affected:
29     SendToCM(caller, Inv(cas))
30
31 def saveHandler(ca, ret, vs):
32   save_visited(ca, vs)
33   cache.set(ca, ret)

FIGURE 5—(Left) The wrapper code of the protocol that intercepts the start of request processing, returns, writes, reads, and calls. (Right) The cache manager code that processes work queue items sent by the wrappers and other cache managers.
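To make the backward history scan in endHandler concrete, the following is a small runnable Python sketch of that check. It is only an illustration: the actual implementation is in Go (§6), and the Call/Inv classes, the safe_to_save helper, and the example call strings are assumptions made for this example, not names from the system.

# Minimal sketch of the cache manager's end-of-request check (cf. endHandler in
# Figure 5): a completed read-only call ca may be cached upstream only if none
# of the keys in its readset was invalidated after the call started.
from dataclasses import dataclass

@dataclass(frozen=True)
class Call:
    ca: str          # hashed call arguments

@dataclass(frozen=True)
class Inv:
    key: str         # invalidated datastore key (or call arguments)

def safe_to_save(history: list, ca: str, readset: set) -> bool:
    """Scan the history backwards and stop at the Call(ca) that started the request."""
    for event in reversed(history):
        if isinstance(event, Call) and event.ca == ca:
            return True                      # reached the call start: no conflict found
        if isinstance(event, Inv) and event.key in readset:
            return False                     # a dependency was invalidated mid-call
    return False                             # call start not found: be conservative

if __name__ == "__main__":
    h = [Call("getTimeline(u1)"), Inv("post:42"), Call("getProfile(u1)")]
    print(safe_to_save(h, "getTimeline(u1)", {"post:42"}))   # False: invalidated mid-call
    print(safe_to_save(h, "getProfile(u1)", {"post:42"}))    # True: the write preceded this call

The scan stops as soon as it reaches the Call(ca) that started the request, which is why its cost is bounded by the number of in-flight calls and recent invalidations, as discussed above.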
FIGURE 6—A bug that would occur if invalidate messages were allowed to overtake saves. (Events: (1) Call(ca), (2) Return v, (3) End(ca, …), (4) Inv(ca), (5) Cache.delete(ca), (6) Cache.save(ca) -> v.)

FIGURE 7—Possible imprecisions in invalidation. The three lines represent two service threads processing requests and the cache manager. (Legend: (r) Read(k), (w) Write(k, v), (s) Start(ca), (e) End(ca, …), (i) Inv(k).)
Saving a new cache entry. A naive method of saving a new entry involves the caller immediately saving it to the cache upon the result's arrival, rather than awaiting an explicit Save message from the callee's cache manager. This is not correct, as it allows the bug shown in Figure 6 where the invalidation message by the S2 cache manager "overtakes" the save done by S1, leading to the cache entry never being invalidated. Thus, it is necessary for Invs and Saves to not be reordered. MuCache achieves that by issuing them sequentially through the cache manager.
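The race and its fix can be sketched in a few lines of Python; the event tuples and the FIFO queue below are illustrative stand-ins for MuCache's actual messages and channels, not its implementation.

# Sketch of the reordering bug in Figure 6 and why MuCache routes Save and Inv
# through the same ordered channel.
from collections import deque

def apply(cache: dict, event):
    kind, ca, *rest = event
    if kind == "Save":
        cache[ca] = rest[0]
    elif kind == "Inv":
        cache.pop(ca, None)

# Buggy: the caller saves immediately on return, so the invalidation (sent
# concurrently by the downstream cache manager) can be applied first.
buggy = {}
for e in [("Inv", "read(k)"), ("Save", "read(k)", "v_stale")]:
    apply(buggy, e)
print(buggy)    # {'read(k)': 'v_stale'} -> a stale entry that is never invalidated

# MuCache: both messages are issued sequentially by the downstream cache
# manager over one FIFO queue, so the Save is always applied before the Inv
# that logically follows it.
ordered, fifo = {}, deque([("Save", "read(k)", "v_stale"), ("Inv", "read(k)")])
while fifo:
    apply(ordered, fifo.popleft())
print(ordered)  # {} -> the stale entry is removed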
Invalidating an entry. Invalidations are triggered when a key used in a cached result is modified. Naively, the cache manager could track the exact order of all reads and writes to precisely track invalidations. Since requests are being processed concurrently, this would require coordination across different service threads, which would significantly slow down request processing along the critical path. MuCache relaxes the tracking of reads and writes in two ways that do not jeopardize correctness, but reduce the synchronization overhead. First, all reads of a request are gathered by the wrappers (preRead) and are only sent to the cache manager at the end of the request (the rs argument in the End message). To ensure correctness, the cache manager then assumes that all reads happened at the start of the call, considering the call invalid if a write happened in its duration even if it happened before the reads (Fig. 7, Left). Second, writes are intercepted in a non-atomic fashion after they have been completed (postWrite). This could allow for a call to start and complete in between the actual write and postWrite, leading to its cached response being unnecessarily invalidated (Fig. 7, Right).
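As a companion to invHandler in Figure 5, the following runnable Python sketch shows how an inverted index like saved drives the invalidation fan-out; the map contents, key names, and service names are invented for this example.

# Sketch of the invalidation fan-out driven by the saved inverted index:
# a modified key maps to the upstream callers whose cached results depend on it.
saved = {
    "price:h1": {"Frontend": {"search(Paris)"}, "Recommender": {"suggest(u1)"}},
}

def on_write(key, send_inv):
    affected = saved.pop(key, {})
    for caller, call_args in affected.items():
        send_inv(caller, call_args)          # ask each upstream cache to drop its entries

on_write("price:h1", lambda svc, cas: print(f"Inv -> {svc}: {sorted(cas)}"))
# Inv -> Frontend: ['search(Paris)']
# Inv -> Recommender: ['suggest(u1)']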
Evicting an entry. There are two types of evictions in MuCache. First, a cache could fill up with entries and needs to evict an entry to make space for new ones; in this case, the eviction is safe without any additional work since the protocol is robust to re-invalidations (i.e., it is safe to invalidate a cache entry that was previously evicted). Second, the cache manager might need to reclaim space if it is keeping track of the dependencies of many calls. It reclaims space by evicting a key or call from its saved dependencies and consequently sends invalidation messages to all affected calls upstream as if the key were invalidated (see inv(k) in Figure 5).

Garbage collection. The cache manager has two state components that grow during execution: (1) the history and (2) the dependencies. It keeps the history bounded by removing completed calls when processing an End request, adding minimal overhead. The protocol preserves correctness in the presence of multiple pending calls with the same arguments by removing the latest occurrence of a call start (potentially overapproximating the duration of the other calls). When the cache manager reaches a memory limit, it deletes some of its saved dependencies as long as it informs the upstream caches to evict relevant entries (similarly to a normal invalidation). The current implementation evicts dependencies following an LRU policy, though other choices could be used.
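A minimal Python sketch of the history garbage collection described above; the data layout and helper name are illustrative and not taken from the actual Go implementation.

# When End(ca) is processed, the most recent Call(ca) entry is dropped from the
# history so the history does not grow without bound. With several pending calls
# that share the same arguments, removing the latest occurrence overapproximates
# the older calls' duration, which may cause extra (but never missing) invalidations.
def gc_history(history, ca):
    for idx in range(len(history) - 1, -1, -1):          # scan backwards
        kind, payload = history[idx]
        if kind == "Call" and payload == ca:
            del history[idx]                              # drop the latest Call(ca)
            break
    return history

h = [("Call", "getHotel(h1)"), ("Inv", "price:h1"), ("Call", "getHotel(h1)")]
print(gc_history(h, "getHotel(h1)"))
# [('Call', 'getHotel(h1)'), ('Inv', 'price:h1')]  -- the earlier pending call keeps its entry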
Sharding. MuCache supports sharded service deployments by attaching a cache manager to each shard; the only requirement being that read-only calls with the same arguments are always processed by the same shard (e.g., by load balancing these calls based on a hash of the call arguments). This guarantees that a single cache manager is the sole authority for the invalidations of each read-only call, ensuring that they will be the only ones to send cache-save and cache-invalidate messages for that call. The only change in the protocol is that a cache manager processing an invalidate due to a write needs to broadcast it to all cache managers of the other shards of the same service, so that they can invalidate their relevant calls (see L.25 in Figure 5). It is important to note that broadcasts only happen upon users' writes; transitive invalidations propagated upstream do not trigger broadcasts. Broadcasting out of the critical path is safe because, similarly to the single-shard protocol, overapproximating the write duration might lead to additional invalidations but not fewer. MuCache, therefore, does not add any latency overhead on the request processing critical path to support sharded services.
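A sketch of the affinity requirement, assuming a simple hash-based router; the paper does not prescribe a specific hash function, so CRC32 below is just a placeholder.

# Read-only calls are routed by a hash of their call arguments so that exactly
# one shard (and its cache manager) is the authority for each read-only call;
# writes may be dispatched to any shard.
import zlib

NUM_SHARDS = 4

def shard_for(ca: str) -> int:
    return zlib.crc32(ca.encode()) % NUM_SHARDS   # stand-in for the real hash

print(shard_for("search(Paris, 2024-05-01)"))     # the same ca always maps to the same shard
print(shard_for("search(Paris, 2024-05-01)"))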
Handling Dynamic Call-graphs. Microservice applications can exhibit a diamond pattern (Figure 4) where a request performs multiple subrequests to the same service through its lifetime. In such applications naive caching could lead to executions that cannot be observed without caches. MuCache addresses this by keeping track of the visited services in two locations. First, each request keeps visited services in its context (ctx.visited); whenever a subrequest ca returns, the parent request adds all the visited services of the subrequest (ca.visited) to its own visited services (ctx.visited). Second, when saving a cache entry for call ca, the cache manager also stores the services, S', that were visited during the processing of ca. Before checking the cache, the wrapper checks whether the downstream service has ever visited a service in S' that has also been visited by the current request (visited_disjoint(ctx, ca)); if so, it does not retrieve the return value from the cache to preserve correctness. MuCache tracks visited services using a binary encoding that keeps its size small—less than 1 KB for 1000 services.
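One way to picture the visited-service check is with a bitmask per request. The paper only states that a compact binary encoding is used (under 1 KB for 1000 services), so the encoding below is an assumption made for illustration.

# Sketch of visited-service tracking with a bitmask: each service gets a bit index.
SERVICE_BITS = {"S1": 0, "S2": 1, "S3": 2, "S4": 3}

def encode(services):
    mask = 0
    for s in services:
        mask |= 1 << SERVICE_BITS[s]
    return mask

def visited_disjoint(ctx_visited: int, entry_visited: int) -> bool:
    """True iff the cached entry's subtree shares no service with the
    services already visited by the current request."""
    return (ctx_visited & entry_visited) == 0

# Diamond pattern from Figure 4: after S1 -> S2 -> S4 (write), the request has
# visited {S2, S4}; a cached result of S3 that depends on S4 must not be used.
ctx = encode({"S2", "S4"})
cached_s3_entry = encode({"S3", "S4"})
print(visited_disjoint(ctx, cached_s3_entry))   # False -> skip the cache and call S3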
5 Protocol Correctness

To demonstrate the correctness of MuCache, we show that clients cannot differentiate a MuCache-enabled application from the original without caches. We give semantics to microservice applications (with and without caches) using observable execution events and traces. Events are indivisible actions (steps) that can be performed by a microservice application; examples of events include reading from a key in the datastore and receiving a response from a completed subrequest. An application can be uniquely described by the set of traces (event sequences) that can be observed in it. Two traces are said to be equivalent modulo reordering when all events in one trace exist in the other trace but potentially in a different order. Reorderings are necessary for our correctness theorem to allow reads and writes to proceed concurrently (as in Figure 3). In this section, we informally describe three assumptions that are central to our formal development; the first two hold for all microservice applications, and the last one is a requirement of MuCache. We then state our main theorem and give the high-level intuition for the proof. The complete formal development and proof can be found in Appendix A, which is available in the supplementary material.

(A1) Always enabled requests. Requests in a microservice application only block when waiting for subrequests that they have invoked to finish executing, and there is no blocking communication across independent requests. In other words, if a trace can be observed in an application, then we can pick and execute any pending request, or any of its subrequests, until it produces an execution event, and the new trace will also be part of the application's set of traces.

(A2) Reordering independent events. Two events are dependent when the first event affects the execution of the second: some examples include two events that are part of the same request, or a write and a read event to the same key in a service datastore. The complete definition of dependent events is given in Appendix A. We assume that due to multithreading, independent events commute; that is, reordering any two consecutive independent events in an application trace results in a trace that can also be observed by the application.

(A3) Linearizable datastores. We assume that the datastore of each service is linearizable [26]: operations on an object take place atomically, in an order consistent with the operations' real-time order. For instance, if a write completes before a read begins, then the read must observe the effects of the write and complete after it. This is necessary due to the requirement that MuCache does not modify the underlying datastore and can only observe writes to the datastore before or after they are completed. If we were to use a non-linearizable datastore, a write could take effect after it returns, making it impossible to track which calls it invalidates.

Theorem 1 (Protocol Correctness). For all traces in a cache-enabled application, there exists a trace in the original application without caches, such that all the client events in the two traces are equivalent modulo reordering.

Proof intuition. To show the theorem, we prove a stronger lemma, namely that for all cache-enabled traces, we can construct an original trace where (1) all request subtraces are the same (modulo the missing requests due to cache hits), and (2) the application state is the same at the end of both traces. The proof proceeds by induction on the length of traces and has three phases: (1) given a trace in the cache-enabled application that ends with a cache-hit, it uses assumption (A2) to move writes that happened before the cache-hit but would later invalidate its entry to the end of the trace (together with their dependencies); (2) it then uses the inductive hypothesis to construct a trace in the original application for the prefix up to the cache-hit; and (3) it uses assumption (A1) to fill in all subrequest events that are missing due to the cache-hit, and then it fills in the writes and all their dependencies (A3), ending up with a trace that satisfies the requirement.
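As a toy illustration of the correctness condition (not part of the paper's formalism), the following Python sketch checks whether two ad-hoc traces contain the same client events, ignoring their order; the event encoding is invented for this example.

# Two traces are "equivalent modulo reordering" on client events if they contain
# exactly the same client events, possibly in a different order.
from collections import Counter

def client_events(trace):
    return [e for e in trace if e[0] in ("Req", "Ret") and e[1].startswith("client")]

def equivalent_modulo_reordering(t1, t2) -> bool:
    return Counter(client_events(t1)) == Counter(client_events(t2))

cache_enabled = [("Req", "client1", "compose"), ("Req", "client2", "view"),
                 ("Ret", "client2", "timeline_v1"), ("Ret", "client1", "ok")]
original      = [("Req", "client2", "view"), ("Ret", "client2", "timeline_v1"),
                 ("Req", "client1", "compose"), ("Ret", "client1", "ok")]
print(equivalent_modulo_reordering(cache_enabled, original))  # True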
6 Implementation

The MuCache implementation comprises roughly 2k LoC of Go [12], including the wrappers that intercept invocations and state accesses, and the cache manager that makes invalidation and saving decisions. Communication between wrappers and the cache manager happens with ZeroMQ [16] and between cache managers with HTTP. Our current implementation uses Redis [9] as the cache, but any in-memory store could be used in its place. We use the 32-bit FNV-1a [11] algorithm to compute the hash values of call arguments.

Batching. Cache managers instruct their upstream counterparts to save or invalidate cache entries by sending HTTP requests that might become a bottleneck when the load is high. To increase throughput at high loads without affecting correctness, MuCache allows batching requests that are sent upstream. At low loads, batching increases the time it takes for an invalidation to propagate through the system based on the batching timeout, which is currently set to 1 ms. Batching also enables the simplification of upstream requests by canceling out operations at the sender, i.e., invalidates and saves override previous invalidates and saves on the same key. This reduces the size of requests and the number of operations upstream cache managers have to process, while incurring minimal cost since it requires a single pass over the batch.
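The cancellation step can be sketched as a single pass over the outgoing batch; the tuple encoding of Save/Inv operations below is illustrative, not the wire format.

# Later Save/Inv operations on the same call arguments override earlier ones,
# so only the last operation per key is sent upstream.
def coalesce(batch):
    last = {}
    for op, ca, *payload in batch:        # single pass over the batch
        last[ca] = (op, ca, *payload)     # a later Save/Inv overrides an earlier one
    return list(last.values())

pending = [("Save", "search(Paris)", "v1"),
           ("Inv",  "search(Paris)"),
           ("Save", "search(Tokyo)", "v2")]
print(coalesce(pending))
# [('Inv', 'search(Paris)'), ('Save', 'search(Tokyo)', 'v2')]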
General support. MuCache is designed to not be limited to a single communication protocol, cache, or datastore. Our wrappers are built on top of Dapr [3], a service mesh extended to also support state accesses through its API. Dapr supports custom middlewares that can be used to intercept invocations and state accesses. It also provides a common abstraction for many service communication protocols and different storage backends, allowing us to implement our wrappers once and inherit support for all the alternatives.

Dependencies between client requests. MuCache's caching protocol treats client requests as independent and allows them to be reordered, processing reads and writes from different clients without synchronization. However, this might not always be desirable, e.g., when a client request expects to see the effects of a previous request. To support this, we extend MuCache's dependencies (Sec. 4) to client requests. Specifically, when a client request is complete, visited services are included in the result and passed to the subsequent request of the same client (if one is performed), allowing MuCache to avoid violating dependencies across client requests.

Supporting third-party services. Microservice applications often perform requests to third-party services that might not be extensible with MuCache, e.g., if they are owned by a different organization. To support such applications, MuCache allows declaring requests to third-party services as read-only using a TTL, saving their values to the cache on return, but invalidating them when the TTL has passed instead of waiting for a downstream cache manager. This setup provides caching benefits with at least as strong guarantees as if all the caches in the application were configured with a TTL; however, for the complete subtrees of the microservice graph that are MuCache-enabled the guarantees are stronger.
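A minimal sketch of this TTL fallback for a third-party call, assuming an illustrative 30-second TTL and a hypothetical do_request callback; neither is taken from the paper.

# The wrapper stores a third-party result with an expiry and treats expiry as the
# only form of invalidation, since no downstream cache manager exists.
import time

class TTLEntry:
    def __init__(self, value, ttl_seconds):
        self.value = value
        self.expires_at = time.monotonic() + ttl_seconds

    def valid(self):
        return time.monotonic() < self.expires_at

cache = {}
def call_third_party(ca, do_request, ttl=30.0):
    entry = cache.get(ca)
    if entry and entry.valid():
        return entry.value                    # cache hit within the TTL window
    value = do_request(ca)                    # real call to the external service
    cache[ca] = TTLEntry(value, ttl)          # invalidated only by expiry
    return value

print(call_third_party("geocode(Philadelphia)", lambda ca: {"lat": 39.95, "lon": -75.17}))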
7 Evaluation

Our evaluation aims to answer these high-level questions:

• (Q1) Throughput and latency benefits: Does MuCache provide throughput and latency benefits compared to other caching alternatives? Does it scale with sharding? How do cache sizes affect its performance? How are its benefits affected by the application call-graph? (§7.3)
• (Q2) Costs: What are the costs of deploying MuCache? What is its CPU and memory usage, total network costs, and its latency overhead on the critical path? Does the cache manager throughput become a bottleneck? (§7.4)
• (Q3) Invalidation: How fast can MuCache invalidate cache entries? (§7.5)

Before we answer these questions, we describe the experimental setup (§7.1) and our methodology and baselines (§7.2).
7.1 Experimental Setup

We deploy a Kubernetes [7] cluster on CloudLab [2] m510 machines that have 8-core 2.0 GHz CPUs, 64 GB RAM, 256 GB NVMe SSDs, and 10 Gb NICs. Machines run Ubuntu 20.04. The average round-trip time between servers is 0.15 ms. Except for sharding experiments, we utilize a single Kubernetes cluster where the number of worker nodes is equal to the number of services, plus one node acting as a control plane. Each service is deployed via Dapr [3] and is affinitized to a single node. We use Redis [9] configured with an LRU eviction policy as the cache. Unless otherwise noted, MuCache is configured with a sending batch size of 20 and a 1 ms timeout. Cache manager dependencies are stored in an LRU cache, with the maximum number of entries being proportional to the user cache size. In our experiments, the cache manager stores 100 dependencies per 1 MB of user cache (e.g., a user cache of 20 MB allows the storage of 2,000 dependencies).

FIGURE 8—Real-world applications used in our evaluation.
  Benchmark          Services  LoC    RO/NonRO  Sources
  1 SocialMedia      6         532    90/10     [10, 24, 32]
  2 MovieReview      12        913    90/10     [13, 24]
  3 HotelRes         6         608    80/20     [24]
  4 OnlineBoutique   9         1,088  75/25     [8]

FIGURE 9—Shapes of synthetic benchmark call-graphs.

7.2 Applications, Method, and Baselines

Throughout our evaluation we perform experiments on four open-source microservice applications, as detailed in Figure 8, along with four synthetic ones. Workloads are adapted from the original testbeds, including the dataset and request distribution. Cache sizes are set relative to the application working data set; small enough that they do not fit the entire working data set but big enough so that there is a non-negligible amount of cache hits.

SocialMedia. A social network application (Cf. Twitter or Facebook) that provides three main endpoints: viewing a user's homepage timeline (RO), viewing a user's personal timeline (RO), and composing a post. The workload ratio is 60% homepage, 30% user timeline, and 10% compose post. The cache size for each service is set to 20 MB. When there are no new posts and each timeline contains 10 posts, the total cacheable posts are around 20 MB.

MovieReview. A movie review application (Cf. IMDB or Rotten Tomatoes) that offers two main endpoints: viewing the page of a movie (RO) and creating a review. The workload ratio is 90% viewing a page and 10% creating reviews. The cache size for each service is set to 70 MB.

HotelRes. A hotel reservation application (Cf. Booking or Airbnb) that offers two main endpoints: searching for hotels in a specific area (RO) and making a reservation. The workload ratio is 80% searching for hotels and 20% making a reservation. The cache size for each service is set to 20 MB.

OnlineBoutique. An online store application (Cf. Amazon or Walmart) that offers multiple endpoints: retrieving the store homepage (RO), updating the currency rate, viewing a product (RO), adding a product to the cart, and checking out. The workload ratio is 75% read-only (homepage, viewing products, and carts) and 25% non-read-only (updating the currency, updating the cart, checking out). The cache size for each service is set to 80 MB.

Synthetic Benchmarks. Figure 9 shows four synthetic applications: ProxyApp, a two-service app where a stateless frontend forwards requests to the backend, which in turn reads/writes to a key-value store; and three applications that extend ProxyApp with archetype call-graph patterns—chain, fan-out, and fan-in. ChainApp has four stateless services and a stateful backend. FanoutApp has a single frontend forwarding requests to four backends. FaninApp has four separate frontends, each forwarding requests to one backend.

Method. We measure throughput and latency (median and 95th percentile) using the wrk2 [15] HTTP benchmarking tool. Experiments include a 30-second cache pre-warming period, followed by a 60-second testing period. Each experiment is run three times, and the average is reported. We run MuCache and the baselines with the same CPU resources; that is, MuCache's cache managers are not given extra cores but share resources with the application.

Baselines. We compare MuCache to the following baselines.

BC (Backend Cache): A baseline that lacks inter-service caching and only caches data from the backend datastore.

TTL: A baseline that reflects the current best practices for automated inter-service caching [1, 27, 33]. Caching occurs at both the backend and intermediate services. Upon invocation, the caller saves the result in the cache asynchronously without communicating with any cache manager. The caches can then evict an entry when they become full or, in the case of an inter-service cache, when a configured time-to-live (TTL) timer has expired. Cached data can be inconsistent and arbitrarily stale (depending on the TTL and access pattern).

TTL-∞: A special case of TTL that serves as an upper bound on the performance achievable by TTL implementations; cache entries never expire and are only evicted when the cache reaches maximum capacity.

7.3 (Q1) Throughput and Latency Benefits

We first measure the throughput and latency of a set of real-world applications with and without MuCache (§7.3.1). We then compare it against different TTL baselines (§7.3.2), we evaluate whether it limits throughput scalability in the presence of sharding (§7.3.3), and we evaluate whether configuring caches with different sizes and whether different application call-graphs affect MuCache's benefits (§7.3.4–7.3.5).
FIGURE 10—Latency (50th/95th percentile, ms) vs. request rate (rps) for HotelRes, MovieReview, SocialMedia, and OnlineBoutique under BC, MuCache, and TTL-∞.

We evaluate MuCache's benefits on throughput and latency on the four open-source microservice applications. We compare MuCache against (1) BC to evaluate performance benefits over not having inter-service caches, and (2) TTL-∞ to evaluate how close MuCache is to an implementation that caches results but provides no consistency guarantees.

Results. Figure 10 shows the results, where the X-axis is request rate, and the Y-axis shows latency in ms. MuCache reduces median latency by up to 1.8× in HotelRes, 2.5× in MovieReview, 1.5× in SocialMedia, and 2.1× in OnlineBoutique. The tail latency between MuCache and BC is similar, except for OnlineBoutique, where MuCache reduces tail latency by up to 1.8× by avoiding many invocations from the Checkout service, such as retrieving product information, getting shipping quotes, etc. Furthermore, MuCache improves throughput by 1.6× in HotelRes, 1.5× in MovieReview, and 1.4× in SocialMedia, while achieving similar throughput in OnlineBoutique. Compared to TTL-∞, MuCache's median latencies are up to 1.2× higher before saturation, and MuCache's throughput is around 0.95×.

Take away. MuCache outperforms BC in terms of median and tail latency, and throughput across all workloads. MuCache also performs close to the upper bound TTL-∞. Improvements in median latency can be attributed to cache hits, while improvements in throughput are due to lower utilization of backend services.

7.3.2 Comparison with TTL baselines

Tuning TTL values for caches in real systems is complex and depends on the application requirements; suggested values could range from seconds to hours [18, 23]. To simulate that in a shorter experiment, we vary TTL from 100 ms to 10 s—values under 100 ms lead to negligible cache hits, and a TTL of 10 s is already a large fraction of the total experiment (60 s).

FIGURE 11—HotelRes: Latency and throughput of MuCache compared with various TTLs (TTL-0.1s, TTL-1s, TTL-10s; 50th/95th percentiles).

Results. Figure 11 shows the results. As the TTL increases from 0.1 to 10 s, median latency drops from 18.2 ms to 10.9 ms, tail latency drops from 29.3 ms to 10.9 ms, and throughput increases from 2,489 to 3,470 rps. MuCache outperforms TTL-1s (1.3× lower median latency), but is outperformed by TTL-10s (which performs similarly to TTL-∞).

Take away. Getting comparable performance to MuCache with a TTL-based caching approach requires setting the TTL to a high value (>1 s)—orders of magnitude higher than the MuCache invalidation times (on the order of ms per call-graph depth as shown in Section 7.4.3). Furthermore, finding an appropriate TTL value is challenging for developers, as this value has implications for the correctness of the application. In contrast, MuCache requires no tuning of expiration times, and invalidations happen automatically and correctly.

7.3.3 Sharding Scalability

We evaluate the scalability of MuCache by deploying SocialMedia to multiple shards. We provision a fixed pool of machines and restrict each shard to a fixed CPU usage of 2 cores (1 running the service, 1 running the Dapr sidecar) to have multiple shards on a single machine. Each shard is deployed with its own cache manager. We compare against BC to determine whether MuCache limits scalability.

Results. Figure 12 shows the maximum throughput of SocialMedia when deployed using 1, 2, and 4 shards, with and without MuCache. MuCache scales as well as BC (achieving 1.44×, 1.38×, and 1.37× the throughput of BC).

Take away. MuCache does not limit scalability for sharded applications, as the only cost occurs in the background, when the cache manager of a shard broadcasts received writes to all cache managers that belong to the same service's shards.
FIGURE 12—Throughput of MuCache and the BC baseline when sharding the services in SocialMedia (1, 2, and 4 shards).

FIGURE 13—HotelRes: Impact of different cache sizes (16 MB to 1024 MB) on latency (left Y-axis) and combined cache hit rate (right Y-axis) for MuCache and TTL-∞.

FIGURE 14—Latency and throughput of the graph shape microbenchmarks (Fig. 9): Chain, Fanout, and Fanin (BC and MuCache, 50th/95th percentiles).

FIGURE 15—Cache manager state and cache size for each service.
  Benchmark          Average (MB)  Max (MB)  Cache Size (MB)
  1 HotelRes         0.08          0.27      20
  2 MovieReview      0.07          0.31      70
  3 SocialMedia      0.02          0.09      20
  4 OnlineBoutique   0.1           0.45      80

7.3.4 Cache size effect

To evaluate how MuCache responds to the cache size of each service, we measure latency and cache hits on HotelRes with a fixed load of 1K req/s while varying the cache size from 16 MB to 1024 MB. TTL-∞ acts as an upper-bound baseline.

Results. Figure 13 shows the results. Increasing the cache size lowers the median latency of MuCache from 9.9 ms to 8.2 ms and tail latency from 22 ms to 13.6 ms; it also increases the cache hit rate from 5% to 91%. Similarly, in TTL-∞, the median latency decreases from 9.9 ms to 7.3 ms, tail latency from 21.6 ms to 10.6 ms, and cache hit rate from 5% to 100%.

Take away. Caching with MuCache reduces mean and tail latency. Furthermore, the reductions achieved by MuCache are close to those achieved by TTL-∞ across all cache sizes.

7.3.5 Application call-graph effect on performance

To evaluate how the application call-graph pattern affects the benefits of MuCache, we use the three synthetic applications in Figure 9. We use a synthetic workload with 50% cache hit rates and compare against BC.

Results. Figure 14 shows the results. For ChainApp, MuCache's median latency is 2.6–3.1× lower than that of BC, while its tail is comparable before reaching saturation. Its maximum throughput is 1.5× higher. For FanoutApp, the median latency and maximum throughput of MuCache are similar to those of BC, but its tail latency is up to 1.6× lower. In FaninApp, MuCache improves median latency by 1.1–1.3× and 95th percentile latency by up to 1.9×; maximum throughput is 1.75× higher than BC.

Take away. MuCache provides different benefits depending on the call-graph shape. For long call-chains MuCache reduces latency by avoiding network hops; for fan-out it slightly improves tail latency but not median latency since the frontend has to wait for the slowest path to respond; and when the backend is the bottleneck it improves throughput by reducing the number of requests that reach the backend.

7.4 (Q2) MuCache costs and overheads

In order to evaluate the costs of MuCache, we measure its CPU, memory, and network usage (§7.4.1), its latency overhead on the critical path (§7.4.2), and the cache manager's throughput and whether it can be a bottleneck (§7.4.3).

7.4.1 Memory / CPU / Network costs

We evaluate MuCache's memory cost on all four applications and its CPU and network usage on HotelRes. We evaluate MuCache's network usage by measuring data transfer between nodes using iftop. We measure the memory cost of each cache manager instance as the average size of its state (history and dependencies) and CPU cost as the average CPU usage of each service during the experiment. We use standard cache sizes and load (2K req/s for HotelRes, 2.5K req/s for MovieReview, 1K req/s for SocialMedia, and 3.5K req/s for OnlineBoutique) for 300 seconds.

Results. Figure 15 shows the cache manager state size and the cache size across services. The average size of the CM state across services ranges from 0.1–0.4% of the cache size per service. Garbage collection plays an important role in keeping the memory usage low: without GC, the CM state in HotelRes goes up to 5 MB in 1 minute. Figure 16 shows the average CPU usage of each service during the experiment. Usage is broken down between the service logic, the Dapr sidecar, and the cache manager. The average CPU usage across services with and without MuCache is 4.2 and 5.1 cores respectively. The average CM CPU usage across services is 0.5 cores.
The average network usage per service without MuCache is 9.0 MB/s, while the average with MuCache is 6.6 MB/s, of which cache managers use 2.9 MB/s.

FIGURE 16—Per-service CPU usage (cores) in HotelRes with the Baseline and with MuCache.

Take away. Memory costs are low compared to the cache size (<0.4% on average). The CPU usage of MuCache is 13% of the total service CPU on average while at the same time reducing the total CPU usage of the whole application due to some backend services being less utilized because of cache hits in the frontend. Though cache managers use some bandwidth to save/invalidate caches, MuCache reduces the total network usage by 27% due to local cache hits.

7.4.2 MuCache latency overhead

We evaluate MuCache's latency overhead by focusing on ProxyApp, which performs minimal work, to measure the worst-case overhead. We create a synthetic workload with 0% and 60% cache hit rates and compare against (1) BC to evaluate the overhead over no caches when there are no hits, and (2) TTL-∞ to evaluate the wrapper overhead.

FIGURE 17—Latency distribution w.r.t. hit rate for ProxyApp (BC, MuCache, TTL-∞; percentiles 0.2–0.99). Solid and dashed lines show the latencies when the hit rate is 0% and 60%, respectively. Split at the 70th percentile for clarity.

Results. Figure 17 shows the complete request latency distribution. We report overheads as absolute values because they are constant and independent of the work that the services do. For a hit rate of 0%, MuCache's median latency (4 ms) is 0.5 ms higher than BC and 0.3 ms higher than TTL-∞, while the 95th percentile (5.7 ms) is 0.9 ms and 0.5 ms higher, respectively. When the hit rate is 60%, MuCache's median and 95th percentile latencies are 0.15 ms and 0.5 ms higher than TTL-∞. When the hit rate is 60%, MuCache's median latency is 1.4 ms better than BC (3.5 ms to 2.1 ms).

Take away. Even in a worst-case scenario (an application that …

7.4.3 MuCache's throughput

To determine whether MuCache's cache manager can be a bottleneck, we measure its maximum throughput on ProxyApp and load the backend's cache manager directly because the backend service becomes the bottleneck otherwise. The load is 80% read-only requests, and we vary the batch size of the HTTP sending buffer between cache managers.

FIGURE 18—Cache manager throughput for batch sizes of 1, 2, 5, 10, 20, and 50.

Results. Figure 18 shows the throughput in terms of the number of events the cache manager processes per second. Without batching, the cache manager has a throughput of ∼19K events per second, while gradually increasing the batch size up to 20 improves it to ∼75K events per second.

Take away. The cache manager has a reasonably high throughput and is not the bottleneck even for an application with minimal computation. To further increase throughput, developers may deploy multiple shards for each service.

7.5 (Q3) Invalidation time

We evaluate the time needed for invalidations to reach the root of the call-graph, namely the frontend service, by measuring the observed inconsistency window [19], the elapsed time between the write happening in the backend and the invalidation becoming visible in the frontend. The invalidation time in our experiment is determined solely by the depth of the call graph. To measure the increase in invalidation time per hop, we conducted experiments on a microservice chain consisting of 2 to 5 services, which represents the typical depths of call-graphs in the applications that we studied.

Results. Figure 19 shows the results. For a two-service application, the invalidation time is ∼4 ms; for a five-service application it is ∼10 ms. Each additional service in the chain increases invalidation time by ∼2.2 ms.

Take away. MuCache's invalidation time is ∼2.2 ms per call-graph hop—orders of magnitude smaller than the typical invalidation times observed in TTL-based approaches (which range from seconds to hours [18, 23]).

8 Related Work

Caching in microservice applications. Several works study cache usage in real-world microservices, including work from Alibaba [28], Twitter [38], and Facebook [37]. These papers
confirm that caches are heavily used in microservice applications and provide significant performance benefits, but only mention manual, ad-hoc, or inconsistent coherence schemes and do not propose an automatic way to manage these caches.

Caching frameworks for web services. There is a lot of work on caching frameworks for web services for both static and dynamic data. These frameworks focus on three key aspects: (1) content admission, (2) cache size management, and (3) invalidation and data freshness (for a more detailed classification see a recent survey [29]). The first two aspects are orthogonal to our work since we do not focus on optimizing the performance of a cache given a specific workload, but rather propose a general system for keeping caches coherent in a microservice setting. To the best of our knowledge, all frameworks that focus on invalidation (e.g., [21, 22, 31]) are designed as a single cache layer on top of a database without taking into account inter-service caching.

Cache coherence protocols. There is extensive literature on cache coherence protocols (see survey [34]), none of which considers inter-service caching. Lazy caching [17] exploits the fact that writes do not always require exclusivity (M or E in MOESI [35]), allowing cores to perform concurrent buffered writes, albeit blocking reads to ensure that dependencies are not violated. Our work extends this insight by avoiding all blocking communication on the request's critical path—allowing writes downstream without immediately informing the upstream caches and without blocking on reads.

Incremental computation. Caches are also used to enable incremental and reactive computation: some examples include Reactive Caching [20], Noria [25], and Diamond [39]. Reactive Caching proposes caches for graphs of single-threaded services to support reactive computation, i.e., writes downstream are propagated upstream to refresh the results. Noria is an incremental stream processing engine that uses caches for fast propagation of updates in a dataflow. Both differ from our work in two ways: (1) they only provide eventual consistency that violates dependencies when there are multiple paths between two services (see Fig. 4); and (2) they do not support true multi-threading, as Noria limits writes to a single thread and Reactive Caching only supports single-threaded services. Diamond is a system that automates data management for distributed reactive applications by providing reactive transactions to clients. Similarly to MuCache, Diamond reactively informs clients about data invalidations in the backend store, but in contrast to MuCache it does not support service graphs.

…would be more challenging since caches should not violate transactional guarantees, which would require additional synchronization in the protocol. Supporting non-KV stores, such as relational databases, would require monitoring the dependencies of read-only calls and determining when to invalidate cache entries, which could be done by leveraging the expressive semantics of SQL (as in the case of Noria [25]).

Supporting weaker consistency datastores. The correctness of MuCache depends on the datastores being linearizable; MuCache needs to be sure that after a write has completed, it has taken effect in the database. Being able to determine the order of reads and writes by intercepting the datastore accesses is necessary so that MuCache is database-agnostic (see requirements in Section 3). Supporting weaker-consistency datastores would likely require a more intrusive design with modifications to a datastore—tightly integrating wrappers in the store to provide additional metadata to the cache managers about the precise order of reads and writes—forfeiting the generality of being database-agnostic.

Application debuggability. Extending an application with MuCache provides performance benefits and does not affect the application behavior, but adds complexity to the end-to-end deployment and therefore increases the effort required to maintain and debug it. This is an inherent software engineering challenge—the bigger a codebase is, the harder it is to maintain. A direction for future work that could help address this is to integrate MuCache with existing distributed tracing and debugging tools for microservices, so that engineers have visibility into MuCache's state and actions.

Write-intensive workloads. Even though a service might offer a read-only endpoint, its workload might be write-intensive, leading to overheads without the accompanying benefits if extended with MuCache. Developers can currently manually detect such cases and avoid declaring those endpoints as read-only, but it would be interesting to explore whether MuCache can be extended with an adaptive monitoring mechanism that only enables caching if the read-write ratio of a service is above some threshold.

Sharding. MuCache requires hard affinity sharding of read requests to ensure correctness, i.e., all read-only calls with the same arguments need to be processed by the same shard. Write requests have no such limitation and can be dispatched to any shard. An interesting avenue for future research would be to lift the requirement for hard affinity, allowing for more flexible load balancing and autoscaling.
References

[1] Caching Guidance - Azure Architecture Center. https://learn.microsoft.com/en-us/azure/architecture/best-practices/caching.
[2] CloudLab - A testbed for cloud computing research. https://www.cloudlab.us/.
[3] Dapr - Distributed Application Runtime. https://dapr.io/.
[4] Envoy Proxy. https://www.envoyproxy.io/.
[5] From Monolith to Microservices: How to Scale Your Architecture. https://www.youtube.com/watch?v=N1BWMW9NEQc.
[6] Istio Service Mesh. https://istio.io/latest/about/service-mesh/.
[7] Kubernetes - An open-source container orchestration system. https://kubernetes.io/.
[8] Online Boutique – Microservices Demo. https://github.com/GoogleCloudPlatform/microservices-demo.
[9] Redis - An open-source in-memory data store. https://redis.io/.
[10] Rutgers Social Network Graph. https://networkrepository.com/socfb-Rutgers89.php.
[11] The FNV Non-Cryptographic Hash Algorithm. https://datatracker.ietf.org/doc/html/draft-eastlake-fnv-17.html.
[12] The Go programming language. https://go.dev/.
[13] The Movie Database. https://www.themoviedb.org/.
[14] Twitter's recommendation algorithm. https://github.com/twitter/the-algorithm.
[15] wrk2: A constant throughput, correct latency recording variant of wrk. https://github.com/giltene/wrk2.
[16] ZeroMQ - An open-source universal messaging library. https://zeromq.org/.
[17] Yehuda Afek, Geoffrey Brown, and Michael Merritt. Lazy caching. In ACM Transactions on Programming Languages and Systems (TOPLAS), 1993.
[18] AWS. Caching Best Practices. https://aws.amazon.com/caching/best-practices/, 2023.
[19] David Bermbach and Stefan Tai. Eventual consistency: How soon is eventual? An evaluation of Amazon S3's consistency behavior. In Workshop on Middleware for Service Oriented Computing (MW4SOC), 2011.
[20] Sebastian Burckhardt and Tim Coppieters. Reactive caching for composed services: polling at the speed of push. In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA), 2018.
[21] K. Selçuk Candan, Wen-Syan Li, Qiong Luo, Wang-Pin Hsiung, and Divyakant Agrawal. Enabling dynamic content caching for database-driven web sites. In Proceedings of the ACM SIGMOD Conference (SIGMOD), 2001.
[22] Jim Challenger, Arun Iyengar, and Paul Dantzig. A scalable system for consistently caching dynamic web data. In Proceedings of the IEEE International Conference on Computer Communications (INFOCOM), 1999.
[23] Cloudflare. Edge and Browser Cache TTL. https://developers.cloudflare.com/cache/how-to/edge-browser-cache-ttl/, 2023.
[24] Yu Gan, Yanqi Zhang, Dailun Cheng, Ankitha Shetty, Priyal Rathi, Nayan Katarki, Ariana Bruno, Justin Hu, Brian Ritchken, Brendon Jackson, Kelvin Hu, Meghna Pancholi, Yuan He, Brett Clancy, Chris Colen, Fukang Wen, Catherine Leung, Siyuan Wang, Leon Zaruvinsky, Mateo Espinosa, Rick Lin, Zhongling Liu, Jake Padilla, and Christina Delimitrou. An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2019.
[25] Jon Gjengset, Malte Schwarzkopf, Jonathan Behrens, Lara Timbó Araújo, Martin Ek, Eddie Kohler, M. Frans Kaashoek, and Robert Tappan Morris. Noria: Dynamic, partially-stateful data-flow for high-performance web applications. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2018.
[26] Maurice P. Herlihy and Jeannette M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems (TOPLAS), 12(3), July 1990.
[27] Joydip Kanjilal. Scaling microservices architecture using caching. https://www.developer.com/design/scaling-microservices-using-cache/, 2021.
[28] Shutian Luo, Huanle Xu, Chengzhi Lu, Kejiang Ye, Guoyao Xu, Liping Zhang, Yu Ding, Jian He, and Chengzhong Xu. Characterizing microservice dependency and performance: Alibaba trace analysis. In Proceedings of the ACM Symposium on Cloud Computing (SOCC), 2021.
[29] Jhonny Mertz and Ingrid Nunes. Understanding application-level caching in web applications: A comprehensive introduction and survey of state-of-the-art approaches. In ACM Computing Surveys (CSUR), 2017.
[30] Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, et al. Scaling Memcache at Facebook. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2013.
[31] Dan R. K. Ports, Austin T. Clements, Irene Zhang, Samuel Madden, and Barbara Liskov. Transactional consistency and automatic management in an application data cache. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2010.
[32] Ryan A. Rossi and Nesreen K. Ahmed. The network data repository with interactive graph analytics and visualization. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2015.
[33] Irfan Saleem, Pallavi Nargund, and Peter Buonora. Data caching across microservices in a serverless architecture. https://aws.amazon.com/blogs/architecture/data-caching-across-microservices-in-a-serverless-architecture/, 2008.
[34] Per Stenstrom. A survey of cache coherence schemes for multiprocessors. In IEEE Computer, 1990.
[35] Paul Sweazey and Alan Jay Smith. A class of compatible cache consistency protocols and their support by the IEEE Futurebus. In ACM SIGARCH Computer Architecture News (SIGARCH), 1986.
[36] Alex Xu. Twitter architecture 2022 vs. 2012. What's changed over the past 10 years?, Nov 2022.
[37] Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. Characterizing Facebook's Memcached Workload. In IEEE Internet Computing, 2013.
[38] Juncheng Yang, Yao Yue, and K. V. Rashmi. A large scale analysis of hundreds of in-memory cache clusters at Twitter. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2020.
[39] Irene Zhang, Niel Lebeck, Pedro Fonseca, Brandon Holt, Raymond Cheng, Ariadna Norberg, Arvind Krishnamurthy, and Henry M. Levy. Diamond: Automating data management and storage for wide-area, reactive applications. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016.
[40] Zhizhou Zhang, Murali Krishna Ramanathan, Prithvi Raj, Abhishek Parwal, Timothy Sherwood, and Milind Chabbi. CRISP: Critical path analysis of large-scale microservice architectures. In Proceedings of the USENIX Annual Technical Conference (ATC), 2022.
A Detailed Protocol Correctness

Preliminaries. We start with some basic notation:

• nS denotes a service name.

• ca denotes the arguments of a service call, including the name of the service (which can be extracted using name(ca)) and the endpoint.

• i ∈ R denotes request identifiers (each request has a unique i). The service name and the arguments of the request can be extracted using name(i) and ca(i). We will define a binary relation sr ⊆ R × R that determines when a request is spawned by another request. We can also define sr∗ as the reflexive transitive closure of sr. There is also a client(i) predicate, which returns true for requests that are initiated by a client.

• v denotes a return value.

• k denotes a key that indexes values in the state of a service.

• rs(i, t) is a function that returns all of the keys that a particular request (and all of its subrequests) have read in trace t. We will often omit t when it is obvious which trace we refer to.

Events and traces. We describe microservice applications and their executions using traces, i.e., sequences of events that describe application actions. We are only interested in events that describe interactions between services and actions on their states. We call the set of all events Σ, and we now define all events in it.

• Reqi(ca) denotes the start of processing of a single request with id i and arguments ca.

• Reti(v) denotes that the request with id i has finished processing and is returning value v.

• Readi(k, v) denotes that the request with id i performed a read of key k from its state and the read returned v.

• Writei(k, v) denotes that the request with id i performed a write of value v to key k of its state.

• Calli(ca, i′) denotes that the request with id i performed a call to another service with arguments ca, and that the request id of that internal request is i′.

• Respi(v, i′) denotes that the request with id i received a response with value v from a finished call with id i′.

We represent the set of all events for a request with identifier i as Σi, and the sets of all read and write events as ΣR and ΣW. We also define the set of output events ΣO = {Reti(v) : ∀i, v} ∪ {Readi(k, v) : ∀i, k, v} ∪ {Writei(k, v) : ∀i, k, v} ∪ {Calli(ca, i′) : ∀i, ca, i′}, which are the events determined by the program when processing a single request, and the set of input events ΣI = {Reqi(ca) : ∀i, ca} ∪ {Respi(v, i′) : ∀i, v, i′}, which are the events given as inputs to the processing of a single request. Finally, we define the set of client events ΣC = {Reqi(v) : ∀i, client(i)} ∪ {Reti(v) : ∀i, client(i)}.

We can now describe complete executions of microservice applications using traces t, i.e., sequences of the above events. We can project all events of a trace t that belong to a particular set Σ′ using t[Σ′]; e.g., t[ΣW] is the sequence of all write events in a trace. Note that this projection produces an ordered sequence of events by maintaining the trace order.
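For concreteness, the following is a minimal sketch of how this event vocabulary and the projection t[Σ′] could be encoded. It is ours, not part of the MuCache artifact, and all class and function names (Event, Req, project, and so on) are illustrative assumptions.

# Illustrative encoding of the event vocabulary and of trace projection t[Σ'].
# These names are ours, not from the MuCache implementation.
from dataclasses import dataclass
from typing import Any, List, Tuple, Type

@dataclass(frozen=True)
class Event:
    i: str                     # request identifier

@dataclass(frozen=True)
class Req(Event):              # Req_i(ca): request i starts with call arguments ca
    ca: Any

@dataclass(frozen=True)
class Ret(Event):              # Ret_i(v): request i finishes and returns v
    v: Any

@dataclass(frozen=True)
class Read(Event):             # Read_i(k, v): request i reads key k and observes v
    k: str
    v: Any

@dataclass(frozen=True)
class Write(Event):            # Write_i(k, v): request i writes v to key k
    k: str
    v: Any

@dataclass(frozen=True)
class Call(Event):             # Call_i(ca, i'): request i calls another service
    ca: Any
    callee: str                # i', the identifier of the spawned subrequest

@dataclass(frozen=True)
class Resp(Event):             # Resp_i(v, i'): request i receives v from call i'
    v: Any
    callee: str

Trace = List[Event]

def project(t: Trace, kinds: Tuple[Type[Event], ...]) -> Trace:
    """t[Σ']: keep only events of the given kinds, preserving the trace order."""
    return [e for e in t if isinstance(e, kinds)]

def project_request(t: Trace, i: str) -> Trace:
    """t[Σ_i]: all events that belong to request i, in trace order."""
    return [e for e in t if e.i == i]

With this encoding, t[ΣW] from the text is simply project(t, (Write,)).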
Applications and Assumptions. We can now define the behavior of a microservice application P ∈ 𝒫 using its execution traces, ⟦P⟧ ⊆ Σ∗, and state some assumptions on these traces. First of all, an application determines the processing of each request using the step : 𝒫 × R × Σ∗ × (ΣO ∪ {⊥}) relation, which determines the next step of the processing of a request, or ⊥ if the request is waiting for a response or has not started yet. We now define what it means for a trace to be well-formed.

Property 2 (Well-formed traces). All traces t ∈ ⟦P⟧ are well-formed, i.e., for each trace t the following properties hold: (1) Reqi(ca) is the first event of any request i and Reti(v) is the last; (2) for each i ∈ t there exists a unique Reqi(ca) and at most one Reti(v); (3) a Reqi(ca) always comes after a Calli(ca, i′), except in the case of client requests; (4) a Respi(v, i′) always comes after a Calli(ca, i′) and Reti′(v); (5) for all Calli(ca, i′), sr(i, i′); and for all prefixes t′ = t0.e with e ∈ ΣI, either step(P, i, t0, e) or step(P, i, t0, ⊥); (6) for all e ∈ ΣC for any i ∈ t s.t. client(i) holds, ∄Calli′(ca′, i) ∈ ΣC.

The last requirement relates the step relation with the traces, i.e., each event in the trace is the result of stepping a request, or is a request start or response. We also know that the events in a trace are equivalent up to an injective renaming of request identifiers.

Property 3. For any microservice application P, for all traces t ∈ ⟦P⟧, for all i ∈ t, and for any i′ ∉ t, we can construct a new trace t′ = t[i ↦ i′] s.t. t′ ∈ ⟦P⟧.

In addition to the above, we also know that requests are always enabled in microservice applications, i.e., a pending request can always take a step.

Definition 1 (Pending Requests). We say that a request Reqi(ca) is pending in a trace t iff Reti(v) does not exist in t.

Property 4 (Request Step Always Enabled). For any microservice application P, for all traces t ∈ ⟦P⟧, and for all pending requests Reqi(ca) for some i, there exists a trace t′ ∈ ⟦P⟧ such that t′ = t.ei, where ei ∈ Σi′ and sr∗(i, i′).
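Several of these well-formedness clauses can be checked mechanically on a finite trace. The sketch below, again ours and built on the illustrative encoding above, checks only clauses (1), (2), and (4) of Property 2.

# Partial well-formedness check (clauses (1), (2), and (4) of Property 2).
# Assumes the Event/Req/Ret/Call/Resp encoding and project_request from the earlier sketch.
def well_formed(t: Trace) -> bool:
    for i in {e.i for e in t}:
        evs = project_request(t, i)
        reqs = [e for e in evs if isinstance(e, Req)]
        rets = [e for e in evs if isinstance(e, Ret)]
        # (1) Req_i(ca) is the first event of request i; Ret_i(v), if present, is the last.
        if not isinstance(evs[0], Req) or (rets and not isinstance(evs[-1], Ret)):
            return False
        # (2) There is a unique Req_i(ca) and at most one Ret_i(v).
        if len(reqs) != 1 or len(rets) > 1:
            return False
    # (4) A Resp_i(v, i') only appears after the matching Call_i(ca, i') and after Ret_{i'}(v).
    seen_calls, seen_rets = set(), set()
    for e in t:
        if isinstance(e, Call):
            seen_calls.add((e.i, e.callee))
        elif isinstance(e, Ret):
            seen_rets.add(e.i)
        elif isinstance(e, Resp):
            if (e.i, e.callee) not in seen_calls or e.callee not in seen_rets:
                return False
    return True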
Property 4 means that requests are always enabled to take a step, sometimes through their subrequests. This is a valid assumption for microservice applications since they are multithreaded, and therefore a single request cannot block other requests from proceeding, and a request can only block while waiting for a response from its subrequests. Note that this assumption requires that the network does not drop requests, i.e., that calls eventually lead to request starts and that returns eventually lead to response events.

We also know that the values of read events depend on the latest write to the same key or on the original value.

Property 5 (Read return value). For all applications P, request identifiers i, and traces t s.t. step(P, i, t, Readi(k, v)) holds, either ∃i′, Writei′(k, v) = last(t[ΣW(k)]) or v = ⊥.

Intuitively, this means that writes are immediately visible to reads, i.e., that the underlying stores are linearizable, which is a valid assumption for most key-value stores.

We can now define read-only calls, that is, calls that never perform writes (even in their subrequests).

Definition 2 (Read-only requests). Given an application P, a request with request identifier i and call arguments ca, i.e., ca(i) = ca, is read-only for this application iff for all traces t ∈ ⟦P⟧ and for all i′ such that sr∗(i, i′), it holds that t[ΣW ∩ Σi′] = ∅. We define a predicate RO(i) that holds for read-only requests.

State. We represent the state of an application as σ ∈ D. Concretely, a state σ is a tuple of maps from keys to values, one for each service. We define the function S : Σ∗ → D that returns the state of an application after the trace t. Due to Property 5, the state at each point in the execution depends on the prefix of write events and the starting state. We assume that all executions start from the same starting state σ0.
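Because of Property 5, S(t) can be computed by replaying only the write events of t starting from σ0. Below is a small sketch under the same illustrative encoding; the helper name, which maps a request identifier to the service that owns its datastore, mirrors name(i) in the text and is an assumption of the sketch.

# S(t): the state after a trace is determined by σ0 and the prefix of write events.
# Uses the Trace/Write types from the earlier sketch; `name` is an assumed helper
# mapping a request identifier to its service, mirroring name(i) in the text.
from typing import Any, Callable, Dict

State = Dict[str, Dict[str, Any]]     # service name -> (key -> value)

def state_after(t: Trace, sigma0: State, name: Callable[[str], str]) -> State:
    sigma = {svc: dict(kv) for svc, kv in sigma0.items()}
    for e in t:
        if isinstance(e, Write):
            sigma.setdefault(name(e.i), {})[e.k] = e.v
    return sigma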
Caching. Up to this point we have established all important properties of microservice applications without mentioning caches. A cache-enabled application P̃ can be similarly defined by its execution traces, ⟦P̃⟧ ⊆ Σ̃∗, where Σ̃ is a superset of the set of events of applications without caches, i.e., Σ ⊆ Σ̃. The additional cache-related events are defined as follows:

• CacheHiti(ca, v) denotes a cache-hit that replaces a Respi′(v, i) for some i′ (also conforming to its well-formedness conditions, Property 2).

• Save(nS, i, v) denotes that the cache of service nS has saved the value v for request i with call arguments ca(i).

• Inv(nS, i, i′) denotes an invalidation of the cache of service nS with ca(i′) from a write with identifier i.

Essentially, a cache-enabled application P̃ is a transformation of a regular microservice application P. We know that our protocol does not affect the stepping of requests other than allowing some calls to return immediately with call hits. We can also lift the step relation to account for cache-enabled applications. The lifted step relation describes the logic of our cache coherence protocol.

Property 6 (Cache Stepping). For any application P, the transformed P̃ can step, i.e., step(P̃, i, t, e) holds, if

• step(P, i, t, e) when e ∈ Σ, or

• e = Save(nS, i′, v) and ∃Reti′(v) ∈ t, or

• e = Inv(nS, i′, i′′) and ∃Writei′(k, v) ∈ t with k ∈ rs(i′′), or

• e = Inv(nS, i′, i′′) and ∃Inv(nS, i′, i′′′) ∈ t with ca(i′′′) ∈ rs(i′′), or

• e = CacheHiti(ca, v) and ∃Save(name(i), i′, v) ∈ t and ∄Inv(name(i), i′′, i′′′) ∈ t with ca = ca(i′) = ca(i′′′) between the save and the cache-hit.

Intuitively, Property 6 means that the cache-enabled application does not affect the next steps of any specific request other than sometimes finding a result in the cache.

Definition 3 (Dependency). We say that event e′ ∈ Σi′ is a dependency of e ∈ Σi in a trace t if e′ is after e and either:

• i = i′, i.e., the two events are part of the same request;

• e = Calli(ca, i′) and e′ = Reqi′(ca) with i ≠ i′ and sr∗(i, i′), i.e., the second event is part of a subrequest of the first event;

• e = Reti(v) and e′ = Respi′(v, i), i.e., the events are a pair of return and handled response;

• e = Writei(k, v) and e′ = Readi′(k, v′), or e = Writei(k, v) and e′ = Writei′(k, v′), or e = Readi(k, v) and e′ = Writei′(k, v′), i.e., read and write events to a key k are dependencies of a prior write to k, and write events are dependencies of a prior read;

• e = Reti(v) and e′ = Save(nS, i, v) for some nS;

• e = Writei(k, v) and e′ = Inv(nS, i, i′) for some nS;

• e = Inv(nS, i′, i′′′) and e′ = Inv(nS, i′, i′′) with ca(i′′′) ∈ rs(i′′);

• e = Save(name(i), i′, v) and e′ = CacheHiti(ca, v) where ca(i′) = ca;

• e = Save(nS, i′, v) and e′ = Inv(nS, i, i′′) for some i and ca(i′) = ca(i′′).

We will use deps(e) to refer to all the transitive dependencies of an event e.
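As an aside, the clauses of Definition 3 that involve only plain application events can be written as a predicate over the illustrative encoding used above; sr_star is an assumed oracle for the sr∗ relation, and deps(e) is the forward transitive closure of this predicate (the cache-event clauses would be added analogously).

# Sketch of part of the dependency relation (Definition 3) and of deps(e).
# Only the clauses over plain application events are covered; sr_star(i, j) is an
# assumed oracle for sr*(i, j). Builds on the Event encoding from the earlier sketch.
from typing import Callable, Set

def is_dependency(e: Event, e2: Event, sr_star: Callable[[str, str], bool]) -> bool:
    if e.i == e2.i:                                            # same request
        return True
    if isinstance(e, Call) and isinstance(e2, Req) \
            and e.callee == e2.i and sr_star(e.i, e2.i):       # call -> start of subrequest
        return True
    if isinstance(e, Ret) and isinstance(e2, Resp) and e2.callee == e.i:
        return True                                            # return -> matching response
    if isinstance(e, Write) and isinstance(e2, (Read, Write)) and e.k == e2.k:
        return True                                            # W->R and W->W conflicts
    if isinstance(e, Read) and isinstance(e2, Write) and e.k == e2.k:
        return True                                            # R->W conflicts
    return False

def deps(t: Trace, e: Event, sr_star: Callable[[str, str], bool]) -> Set[Event]:
    """Transitive dependencies of e among the events that follow it in t."""
    out: Set[Event] = set()
    for e2 in t[t.index(e) + 1:]:
        if is_dependency(e, e2, sr_star) or any(is_dependency(d, e2, sr_star) for d in out):
            out.add(e2)
    return out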
We now state a final assumption on application traces, namely that two independent events can be commuted.

Property 7 (Commute independent events). For any trace t ∈ ⟦P⟧ with t = t0.e.e′.t1 and e′ ∉ deps(e), the trace t′ = t0.e′.e.t1 can also be observed by the application, i.e., t′ ∈ ⟦P⟧.

This holds because in microservice applications independent requests do not affect each other except through reads and writes to the same key in the same service datastore.
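The correctness proof below repeatedly uses Property 7 to commute an event past a later event that is not among its dependencies, in particular to move writes together with their dependencies towards the end of a trace. This reordering can be sketched as a trace transformation; the sketch is ours and reuses is_dependency from the previous sketch.

# Property 7 as a trace transformation: adjacent independent events may be swapped,
# and a set of events (e.g., writes plus their dependencies) may be pushed to the end.
# Reuses Trace, Event, and is_dependency from the earlier sketches.
from typing import Set

def commute_adjacent(t: Trace, idx: int, sr_star) -> Trace:
    e, e2 = t[idx], t[idx + 1]
    assert not is_dependency(e, e2, sr_star), "cannot commute dependent events"
    return t[:idx] + [e2, e] + t[idx + 2:]

def push_to_end(t: Trace, targets: Set[Event], sr_star) -> Trace:
    """Bubble every event in `targets` past later independent non-target events."""
    t = list(t)
    for e in [x for x in t if x in targets]:
        idx = t.index(e)
        while idx + 1 < len(t) and t[idx + 1] not in targets \
                and not is_dependency(e, t[idx + 1], sr_star):
            t = commute_adjacent(t, idx, sr_star)
            idx += 1
    return t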
We are now ready to state the main theorem that describes the correctness of our protocol.

Theorem 8 (Protocol Correctness (corresponds to Theorem 1)). For all traces t in a cache-enabled application ⟦P̃⟧, there exists a trace t′ in the original application without caches ⟦P⟧, such that their respective client events are equivalent (but potentially reordered), i.e., ∀i, t[ΣC(i)] = t′[ΣC(i)].

This makes sense because correctness is only relevant from the perspective of the clients and not all of the internal events that an application performs. In fact, the cache implementation does not contain the same traces, because some calls return immediately on cache-hits without triggering all the internal events. In order to prove this theorem, we show that something stronger holds, a lemma that is stated below. Before stating it, we need to define what it means for an event in the cache-enabled event set to be equivalent to an original event.

Definition 4 (Equivalent events). Equivalence between a cache-enabled event ec and an original event e (denoted with ec ≃ e) is defined as follows:

• e ∈ Σ and ec ∈ Σ are the same event, or

• ec = CacheHiti(ca(i′), v) and e = Respi(v, i′).

We lift the equivalence relation on events to sequences of events in a straightforward way.

Lemma 1. Given an arbitrary trace t ∈ ⟦P̃⟧, we can construct a trace t′ ∈ ⟦P⟧ such that (i) the states at the end of the two traces are the same, i.e., S(t) = S(t′), and (ii) for all i, t[Σi] ≃ t′[Σi] modulo the events that are missing due to cache-hits.

At a high level, the proof proceeds by constructing t′ from t by filling in the missing events and by moving some write events later in the trace. Theorem 8 follows directly from Lemma 1 since client events will be the same in both traces.

Proof sketch. We proceed by induction on the size of traces, and for the inductive case we focus on the only interesting scenario, in which the trace t ends with a cache-hit event CacheHiti(ca, v), because these are the only events for which the effects of our cache subsystem are observed by the rest of the application. For illustrative purposes we extend traces with the state of all services σn between each event.

t = t0|σn.CacheHiti(ca, v)

For this cache-hit to have happened, the step relation implies that there must exist some Save(nS, i′, v) before it, such that name(i′) = nS. Similarly, for the cache save to have happened, there must have been a completed request with call arguments ca = ca(i′).

t = ···.Reqi′(ca)|σ1.···.Reti′(v)|σ2.···.Save(nS, i′, v)|σ3.···|σn.CacheHiti(ca, v)

Given Property 2 (extended in a straightforward way to support cache events), we know that t[Σi′] can be produced by the step relation. The inductive hypothesis and the fact that t is finite ensure the equivalence of the traces even in the presence of cache-hits for subrequests of the original request. We now do a case analysis on the existence of a Write(k, v1) with k ∈ rs(i′) between Reqi′(ca) and CacheHiti(ca, v).

No such write exists. If no such write exists, then σ1|rs(i′) = σ2|rs(i′) = ... = σn|rs(i′). Then, we can construct a trace t1 ∈ ⟦P⟧ using the inductive hypothesis and by replacing CacheHiti(ca, v) with Calli(ca, i′′) for some fresh i′′ (due to Property 6).

t1 = ...|σn.Calli(ca, i′′)

Then, given that σ1|rs(i′) = σn|rs(i′) and that Properties 5 and 3 hold, we can construct the same request steps tc as in the original trace (t[Σi′][i′ ↦ i′′]) using the step relation, ending up with a trace t2 ∈ ⟦P⟧ such that:

t2 = t1.tc.Respi(v, i′′)

Since CacheHiti(ca, v) ≃ Respi(v, i′′) and read-only requests do not modify the state, we are done with this case.

Write exists. We now focus on the case where a write Write(k, v1) with k ∈ rs(i′) exists between Reqi′(ca) and CacheHiti(ca, v). We first show that the write lies between Save(nS, i′, v) and CacheHiti(ca, v): if it were earlier, it would have been processed by the cache manager, prohibiting Save(nS, i′, v) from happening. However, there could be an invalidation between the cache save and the cache-hit that originated from an earlier write in another service, between Reqi′(ca) and Save(nS, i′, v). We can now use Property 7 to move all writes together with their dependencies to the end of the trace to get a trace tw.

tw = ···.Save(nS, i′, v)|σ3.···|σn.CacheHiti(ca, v).···.twd

where twd contains all the writes and their dependencies. This is possible because CacheHiti(ca, v) is not a dependency of the writes between the save and the cache-hit; if it were, there must have been a subcall to the service where the write happened, which would have been caught by our dependency tracking (see Section 4). Second, no i′ events are dependencies of the writes between Reqi′(ca) and Save(nS, i′, v), because (1) i′ is read-only (so it cannot have performed those writes or subcalls that performed those writes), and (2) the
FIGURE 20—A bug that would occur if preReqStart does not wait until the Start event is added to the CM workqueue.
We then construct the original trace that caused the save using the step relation (as in the no-write-exists case) to obtain a trace t2. Finally, given that both the prefixes and the states are the same for tw and t2, we can use Property 4 to step all the writes and their dependencies, obtaining exactly the same events as the suffix of tw. This proves that the final states are the same and that the traces of each request are equivalent.
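As a final illustration, the client-observable equivalence guaranteed by Theorem 8 is easy to check on finite traces: for every client request, the cache-enabled trace and the cache-free trace must contain the same client events in the same per-request order. A test-harness-style sketch of this check (ours), where is_client mirrors the client(i) predicate and is an assumed helper:

# Sketch of the check behind Theorem 8: both traces agree on the Req/Ret events of
# every client request (requests from different clients may still be interleaved
# differently). Builds on the Event encoding from the earlier sketches.
from typing import Callable

def client_equivalent(t_cached: Trace, t_plain: Trace,
                      is_client: Callable[[str], bool]) -> bool:
    client_ids = {e.i for e in t_cached + t_plain if is_client(e.i)}
    return all(
        [e for e in t_cached if e.i == i and isinstance(e, (Req, Ret))]
        == [e for e in t_plain if e.i == i and isinstance(e, (Req, Ret))]
        for i in client_ids
    )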