Scaling Your Cache And Caching At Scale

Alex Miller
@puredanger

Mission
• Why does caching work?
• What’s hard about caching?
• How do we make choices as we design a caching architecture?
• How do we test a cache for performance?

Memory Hierarchy
Clock cycles to access (log scale, 1E+00 to 1E+09)

Register      1
L1 cache      3
L2 cache      15
RAM           200
Disk          10,000,000
Remote disk   1,000,000,000

Facts of Life
Register (fast, small, expensive)
L1 Cache
L2 Cache
Main Memory
Local Disk
Remote Disk (slow, big, cheap)

Temporal Locality
[Animation: a stream of requests is replayed against a cache; the hit rate rises from 0% to 65%.]
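A minimal sketch of how repeated requests in a stream turn into cache hits, assuming an LRU cache built on java.util.LinkedHashMap. The class name HitRateDemo, the request stream, and the capacities are illustrative and do not come from the slides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HitRateDemo {
    /** Replays the stream through an LRU cache of the given capacity and returns the fraction of hits. */
    static double hitRate(int[] stream, int capacity) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        Map<Integer, Boolean> cache = new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                return size() > capacity;   // drop the least recently used entry when over capacity
            }
        };
        int hits = 0;
        for (int key : stream) {
            if (cache.get(key) != null) {
                hits++;                          // requested recently: served from the cache
            } else {
                cache.put(key, Boolean.TRUE);    // miss: load it and remember it
            }
        }
        return stream.length == 0 ? 0.0 : (double) hits / stream.length;
    }

    public static void main(String[] args) {
        int[] stream = {1, 2, 1, 3, 1, 2, 4, 1, 2, 1};
        System.out.println("capacity 2: " + hitRate(stream, 2));
        System.out.println("capacity 3: " + hitRate(stream, 3));
    }
}
```

On this stream a capacity of 2 yields 3 hits out of 10 requests and a capacity of 3 yields 6 out of 10, because key 1 recurs often enough to stay resident.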

Non-uniform distribution
Web page hits, ordered by rank
[Figure: pageviews per rank (left axis, 0 to 3200) and % of total hits per rank (right axis, 0% to 100%), plotted against page views ordered by rank.]

Temporal locality
+
Non-uniform
distribution

17000 pageviews
assume avg load = 250 ms

cache 17 pages / 80% of views
cached page load = 10 ms
new avg load = 58 ms
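The 58 ms figure is a weighted average of the cached and uncached load times. A minimal sketch of that arithmetic, using the numbers from the slide; the class and method names are illustrative.

```java
final class AvgLatency {
    /** Expected load time when hitRatio of the requests cost hitMs and the rest cost missMs. */
    static double averageMs(double hitRatio, double hitMs, double missMs) {
        return hitRatio * hitMs + (1.0 - hitRatio) * missMs;
    }

    public static void main(String[] args) {
        // From the slide: 80% of views are served from cache at 10 ms, the rest at 250 ms.
        // 0.80 * 10 + 0.20 * 250 = 8 + 50 = 58 (up to floating-point rounding).
        System.out.println(averageMs(0.80, 10.0, 250.0) + " ms");
    }
}
```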

trade memory for
latency reduction

The hidden benefit:
reduces database load

[Figure: Memory vs. Database, showing the line of overprovisioning]

A brief aside...

• What is Ehcache?
• What is Terracotta?

Ehcache Example

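// Create a CacheManager from the default configuration.
CacheManager manager = new CacheManager();
// Look up the cache named "employees", configured below.
Ehcache cache = manager.getEhcache("employees");
// Store the employee under its id, then read it back; get() returns null on a miss.
cache.put(new Element(employee.getId(), employee));
Element element = cache.get(employee.getId());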

<cache name="employees"
maxElementsInMemory="1000"
memoryStoreEvictionPolicy="LRU"
eternal="false"
timeToIdleSeconds="600"
timeToLiveSeconds="3600"
overflowToDisk="false"/>

Terracotta

App Node App Node App Node App Node

Terracotta Terracotta
Server Server

App Node App Node App Node App Node

But things are not
always so simple...

Pain of Large
Data Sets
• How do I choose which elements stay in memory and which go to disk?
• How do I choose which elements to evict when I have too many?
• How do I balance cache size against other memory uses?

Eviction
When cache memory is full, what do I do?
• Delete - Evict elements
• Overflow to disk - Move to slower, bigger storage
• Delete local - But keep remote data
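The "overflow" option can be sketched in plain Java as two tiers: a small LRU map standing in for memory and a second map standing in for slower, bigger storage. This is only an illustration of the idea under those assumptions, not how Ehcache implements its disk store; TwoTierCache and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class TwoTierCache<K, V> {
    private final Map<K, V> disk = new HashMap<>();   // stand-in for slower, bigger storage
    private final Map<K, V> memory;

    TwoTierCache(final int maxInMemory) {
        // Access-ordered map: the eldest entry is the least recently used one.
        this.memory = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxInMemory) {
                    disk.put(eldest.getKey(), eldest.getValue());   // overflow instead of delete
                    return true;
                }
                return false;
            }
        };
    }

    void put(K key, V value) {
        disk.remove(key);            // do not keep a stale copy in the lower tier
        memory.put(key, value);
    }

    V get(K key) {
        V value = memory.get(key);
        if (value == null) {
            value = disk.remove(key);
            if (value != null) {
                memory.put(key, value);   // fault the entry back into memory
            }
        }
        return value;
    }

    boolean inMemory(K key) { return memory.containsKey(key); }
    boolean onDisk(K key) { return disk.containsKey(key); }

    public static void main(String[] args) {
        TwoTierCache<String, Integer> cache = new TwoTierCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.put("c", 3);   // "a" is least recently used and overflows to the lower tier
        System.out.println("a on disk: " + cache.onDisk("a") + ", a = " + cache.get("a"));
    }
}
```

The sketch is single-threaded on purpose; a real cache would also need locking and an actual serialization step for the lower tier.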

Eviction in Ehcache

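Evict with “Least Recently Used” policy (same cache as in the Ehcache Example):
<cache name="employees"
maxElementsInMemory="1000"
memoryStoreEvictionPolicy="LRU"
eternal="false"
timeToIdleSeconds="600"
timeToLiveSeconds="3600"
overflowToDisk="false"/>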

Spill to Disk in Ehcache
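Spill to disk (same cache as in the Ehcache Example, with overflow enabled):
<diskStore path="java.io.tmpdir"/>

<cache name="employees"
maxElementsInMemory="1000"
memoryStoreEvictionPolicy="LRU"
eternal="false"
timeToIdleSeconds="600"
timeToLiveSeconds="3600"
overflowToDisk="true"
maxElementsOnDisk="1000000"
diskExpiryThreadIntervalSeconds="120"
diskSpoolBufferSizeMB="30" />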

Terracotta Clustering
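Terracotta configuration (same cache as in the Ehcache Example, now clustered):
<terracottaConfig url="server1:9510,server2:9510"/>

<cache name="employees"
maxElementsInMemory="1000"
memoryStoreEvictionPolicy="LRU"
eternal="false"
timeToIdleSeconds="600"
timeToLiveSeconds="3600"
overflowToDisk="false">
<terracotta/>
</cache>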

Pain of Stale Data
• How tolerant am I of seeing values changed on the underlying data source?
• How tolerant am I of seeing values changed by another node?

Expiration

[Figure: timeline from 0 to 9 contrasting expiration with TTI=4 and with TTL=4]

TTI and TTL in Ehcache

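<!-- Same cache as in the Ehcache Example; TTI and TTL are given in seconds -->
<cache name="employees"
maxElementsInMemory="1000"
memoryStoreEvictionPolicy="LRU"
eternal="false"
timeToIdleSeconds="600"
timeToLiveSeconds="3600"
overflowToDisk="false"/>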
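The difference between the two settings can be shown with a small check: TTL counts from when an element was created, TTI counts from when it was last accessed. This is a sketch of the concept, not Ehcache's code. The class name, the unit-less timestamps, and the choice to expire exactly at the limit are assumptions; treating 0 as "no limit" follows the Ehcache convention.

```java
final class ExpiryCheck {
    /**
     * Returns true when an element should be treated as expired.
     * ttl limits the time since creation, tti limits the time since the last access.
     * A value of 0 disables that limit.
     */
    static boolean isExpired(long now, long createdAt, long lastAccessedAt, long ttl, long tti) {
        boolean tooOld = ttl > 0 && now - createdAt >= ttl;
        boolean tooIdle = tti > 0 && now - lastAccessedAt >= tti;
        return tooOld || tooIdle;
    }

    public static void main(String[] args) {
        // Element created at t=0 and last read at t=3, checked at t=5.
        System.out.println("TTL=4: " + isExpired(5, 0, 3, 4, 0));   // age 5 exceeds the TTL
        System.out.println("TTI=4: " + isExpired(5, 0, 3, 0, 4));   // idle for only 2
    }
}
```

With TTI alone, every access pushes expiry further out, so a hot element can live indefinitely; TTL caps its age no matter how often it is read.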

Replication in Ehcache
<cacheManagerPeerProviderFactory
class="net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory"
properties="hostName=fully_qualified_hostname_or_ip,
peerDiscovery=automatic,
multicastGroupAddress=230.0.0.1,
multicastGroupPort=4446, timeToLive=32"/>

<cache name="employees" ...>
<cacheEventListenerFactory
class="net.sf.ehcache.distribution.RMICacheReplicatorFactory"
properties="replicateAsynchronously=true,
replicatePuts=true,
replicatePutsViaCopy=false,
replicateUpdates=true,
replicateUpdatesViaCopy=true,
replicateRemovals=true,
asynchronousReplicationIntervalMillis=1000"/>
</cache>

Terracotta Clustering
Still use TTI and TTL to manage stale data between cache and data source

Coherent by default but can relax with coherentReads="false"

Pain of Loading
• How do I pre-load the cache on startup?
• How do I avoid re-loading the data on every node?

Persistent Disk Store
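<diskStore path="java.io.tmpdir"/>

<!-- Same cache as in the Ehcache Example, with a persistent disk store -->
<cache name="employees"
maxElementsInMemory="1000"
memoryStoreEvictionPolicy="LRU"
eternal="false"
timeToIdleSeconds="600"
timeToLiveSeconds="3600"
overflowToDisk="true"
maxElementsOnDisk="1000000"
diskExpiryThreadIntervalSeconds="120"
diskSpoolBufferSizeMB="30"
diskPersistent="true" />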

Bootstrap Cache Loader

Bootstrap a new cache node from a peer:
<bootstrapCacheLoaderFactory
class="net.sf.ehcache.distribution.RMIBootstrapCacheLoaderFactory"
properties="bootstrapAsynchronously=true,
maximumChunkSizeBytes=5000000"
propertySeparator="," />

On startup, create background thread to pull
the existing cache data from another peer.

Terracotta Persistence
Nothing needed beyond setting up
Terracotta clustering.

Terracotta will automatically bootstrap:
- the cache key set on startup
- cache values on demand

Pain of Duplication
• How do I get failover capability while avoiding excessive duplication of data?

Partitioning + Terracotta
Virtual Memory
• Each node (mostly) holds data it has seen
• Use load balancer to get app-level partitioning
• Use fine-grained locking to get concurrency
• Use memory flush/fault to handle memory overflow and availability
• Use causal ordering to guarantee coherency

Scalability Continuum
[Figure: deployment options arranged from left to right by increasing scale ("more scale"). Rows shown: causal ordering (YES, NO, NO, YES, YES); # JVMs (from 1 JVM, through 2 or more JVMs and 2 or more big JVMs, up to lots of JVMs); runtime (Ehcache, Ehcache RMI, Ehcache disk store, Terracotta OSS, Terracotta FX / Ehcache FX); management and control (Ehcache DX; Ehcache EX and FX).]

Know Your Use Case
• Is your data partitioned (sessions) or not (reference data)?
• Do you have a hot set or uniform access distribution?
• Do you have a very large data set?
• Do you have a high write rate (50%)?
• How much data consistency do you need?

Types of caches
Name                     Communication    Advantage
Broadcast invalidation   multicast        low latency
Replicated               multicast        offloads db
Datagrid                 point-to-point   scalable
Distributed 2-tier       point-to-point   all of the above

Common Data Patterns
I/O pattern        Locality   Hot set   Rate of change
Catalog/customer   low        low       low
Inventory          high       high      high
Conversations      high       high      low

Catalogs/customers
• warm all the data into cache
• High TTL

Inventory
• fine-grained locking
• write-behind to DB

Conversations
• sticky load balancer
• disconnect conversations from DB
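Write-behind, suggested above for inventory-style data, means the cache accepts a write immediately and the database is updated later in batches. A minimal sketch of the pattern in plain Java, assuming a caller-supplied dbWriter callback and an explicit flush() that a background thread would call in practice; WriteBehindCache is a hypothetical class, not an Ehcache or Terracotta API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

final class WriteBehindCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Map<K, V> pending = new LinkedHashMap<>();   // guarded by "this"
    private final BiConsumer<K, V> dbWriter;

    WriteBehindCache(BiConsumer<K, V> dbWriter) {
        this.dbWriter = dbWriter;
    }

    void put(K key, V value) {
        cache.put(key, value);            // readers see the new value right away
        synchronized (this) {
            pending.put(key, value);      // repeated writes to one key coalesce into one row
        }
    }

    V get(K key) {
        return cache.get(key);
    }

    /** Writes the queued changes to the database and returns how many rows were written. */
    int flush() {
        Map<K, V> batch;
        synchronized (this) {
            batch = new LinkedHashMap<>(pending);
            pending.clear();
        }
        batch.forEach(dbWriter);          // slow I/O happens outside the lock
        return batch.size();
    }

    public static void main(String[] args) {
        List<String> database = new ArrayList<>();   // stand-in for real database writes
        WriteBehindCache<String, Integer> stock =
                new WriteBehindCache<>((k, v) -> database.add(k + "=" + v));
        stock.put("widget", 10);
        stock.put("widget", 9);
        System.out.println(stock.get("widget") + " cached, " + stock.flush() + " row(s) written: " + database);
    }
}
```

The trade-off is durability: changes still sitting in the pending map are lost if the node fails before the next flush.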

Build a Test

• As realistic as possible
• Use real data (or good fake data)
• Verify test does what you think
• Ideal test run is 15-20 minutes

Cache Warming

• Explicitly record cache warming or loading as a testing phase
• Possibly multiple warming phases

Things to Change
• Cache size
• Read / write / other mix
• Key distribution
• Hot set
• Key / value size and structure
• # of nodes

Things to Measure

• Application throughput (TPS)
• Application latency
• OS: CPU, Memory, Disk, Network
• JVM: Heap, Threads, GC

Benchmark and Tune

• Create a baseline
• Run and modify parameters
• Test, observe, hypothesize, verify
• Keep a run log

Pushing It
• If CPUs are not all busy...
  • Can you push more load?
  • Waiting for I/O or resources
• If CPUs are all busy...
  • Latency analysis

I/O Waiting
• Database
  • Connection pooling
  • Database tuning
  • Lazy connections
• Remote services

Locking and Concurrency
[Figure, shown on two slides: threads issuing "get 2", "get 2", "put 8", and "put 12" pass through locks to reach a key/value table with keys 1 to 16.]
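The slides do not spell out the locking scheme, so the following is an assumption: one common way to let threads working on different keys proceed in parallel is lock striping, where each key maps to one of a fixed number of locks. StripedCache is a hypothetical class written for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

final class StripedCache<K, V> {
    private final ReentrantLock[] locks;
    private final Map<K, V>[] segments;

    @SuppressWarnings("unchecked")
    StripedCache(int stripes) {
        locks = new ReentrantLock[stripes];
        segments = (Map<K, V>[]) new Map[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new ReentrantLock();
            segments[i] = new HashMap<>();   // each stripe owns its own map
        }
    }

    /** Maps a key to its stripe; operations on different stripes never contend. */
    int stripeFor(K key) {
        return Math.floorMod(key.hashCode(), locks.length);
    }

    V get(K key) {
        int s = stripeFor(key);
        locks[s].lock();
        try {
            return segments[s].get(key);
        } finally {
            locks[s].unlock();
        }
    }

    V put(K key, V value) {
        int s = stripeFor(key);
        locks[s].lock();
        try {
            return segments[s].put(key, value);
        } finally {
            locks[s].unlock();
        }
    }

    public static void main(String[] args) {
        StripedCache<Integer, String> cache = new StripedCache<>(4);
        cache.put(8, "eight");
        cache.put(12, "twelve");
        // With 4 stripes, keys 8 and 12 share a lock while key 2 uses a different one.
        System.out.println(cache.stripeFor(8) + " " + cache.stripeFor(12) + " " + cache.stripeFor(2));
    }
}
```

With a single lock, "get 2" would wait behind "put 8"; with stripes they only contend when their keys land on the same stripe.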

Objects and GC
• Unnecessary object churn
• Tune GC
  • Concurrent vs parallel collectors
  • Max heap
  • ...and so much more
• Watch your GC pauses!!!

Cache Efficiency
• Watch hit rates and latencies
• Cache hit - should be fast
  • Unless concurrency issue
• Cache miss
  • Miss local vs. miss disk / cluster

Cache Sizing
• Expiration and eviction tuning
  • TTI - manage moving hot set
  • TTL - manage max staleness
  • Max in memory - keep hot set resident
  • Max on disk / cluster - manage total disk / clustered cache

Cache Coherency

• No replication (fastest)
• RMI replication (loose coupling)
• Terracotta replication (causal ordering) - way faster than strict ordering

Latency Analysis

• Profilers
• Custom timers
• Tier timings
• Tracer bullets
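Custom timers and tier timings can be as simple as wrapping each call into a tier (cache, database, remote service) and accumulating the elapsed time per tier. A minimal sketch, assuming tiers are identified by a string name; TierTimer and its methods are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

final class TierTimer {
    private final Map<String, LongAdder> nanosByTier = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> callsByTier = new ConcurrentHashMap<>();

    /** Runs the work and charges its elapsed time to the named tier, even if it throws. */
    <T> T time(String tier, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByTier.computeIfAbsent(tier, t -> new LongAdder()).add(System.nanoTime() - start);
            callsByTier.computeIfAbsent(tier, t -> new LongAdder()).increment();
        }
    }

    long callCount(String tier) {
        LongAdder adder = callsByTier.get(tier);
        return adder == null ? 0 : adder.sum();
    }

    long totalNanos(String tier) {
        LongAdder adder = nanosByTier.get(tier);
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        TierTimer timer = new TierTimer();
        String value = timer.time("cache", () -> "cached-value");   // stand-in for a cache lookup
        System.out.println(value + ": " + timer.callCount("cache") + " call(s), "
                + timer.totalNanos("cache") + " ns in the cache tier");
    }
}
```

Comparing total time per tier against the end-to-end latency shows where a request actually spends its time.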

mumble-mumble*

It’s time to add it to Terracotta.

* lawyers won’t let me say more

Thanks!

• Twitter - @puredanger
• Blog - http://tech.puredanger.com
• Terracotta - http://terracotta.org
• Ehcache - http://ehcache.org
