Preventing cache stampede with Redis & XFetch
Jim Nelson <jnelson@archive.org>
Internet Archive
RedisConf 2017
Internet Archive
Universal Access to All Knowledge
Founded 1996, based in San Francisco
Archive of digital and physical media
Includes Web, books, music, film, software & more
Digital holdings: over 30 petabytes & counting
Key collections & services:
Wayback Machine
Grateful Dead live concert collection
Internet Archive ♡ Redis
Caching & other services backed by 10-node sharded Redis cluster
Sharding performed client-side via consistent hashing (PHP, Predis)
Each node supported by two replicated mirrors (fail-over)
Specialized Redis instances also used throughout IA’s services, including
Wayback, search, and more
Caching: Quick terminology
I assume we all know what caching is. This is the terminology I’ll use today:
Recompute: Expensive operation whose result is cached
(database query, file system read, HTTP request to remote service)
Expiration: When a cache value is considered stale or out-of-date
(time-to-live)
Evict: Removing a value from the cache
(to forcibly invalidate a value prior to expiry)
Cache stampede
“A cache stampede is a type of cascading failure that can
occur when massively parallel computing systems with
caching mechanisms come under very high load. This
behaviour is sometimes also called dog-piling.”
–Wikipedia
https://en.wikipedia.org/wiki/Cache_stampede
Cache stampede: A scenario
Multiple servers, each with multiple workers serving requests, accessing a
common cached value
When the cached value expires or is evicted, all workers experience a
simultaneous cache miss
Workers recompute the missing value, causing overload of primary data
sources (e.g. database) and/or hung requests
Congestion collapse
Hung workers due to network congestion or expensive recomputes—that’s bad
Discarded user requests—that’s bad
Overloaded primary data stores (“Sources of Truth”)—that’s bad
Harmonics (peaks & valleys): brief periods of intense activity (mini-outages)
followed by lulls—that’s bad
Imagine a cached value with a TTL of 1 hr enjoying 10,000 hits/sec: that's good.
Now imagine 10,000 cache misses at 1 hr + 1 sec: that's bad.
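A back-of-envelope sketch of the numbers above, assuming steady traffic and that every miss starts its own recompute: the stampede lasts until the first recompute finishes, so its size is roughly request rate × recompute time. The function name is hypothetical.

```python
def concurrent_misses(hits_per_sec, recompute_secs):
    """Rough stampede size: with no protection, every request that arrives after
    expiry but before the first recompute finishes also misses (assumes steady
    traffic and that each miss starts its own recompute)."""
    return hits_per_sec * recompute_secs

print(concurrent_misses(10_000, 1.0))    # 10000.0
print(concurrent_misses(10_000, 0.25))   # 2500.0
```

Even a fast 250 ms recompute would, under these assumptions, send thousands of duplicate queries to the primary data source.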
Typical cache code
function fetch(name)
    var data = redis.get(name)
    if (!data)
        data = recompute(name)
        redis.set(name, expires, data)
    return data
This “looks” fine, but consider tens of thousands of workers calling this code at once: there is no mutual exclusion and no upper bound on simultaneous recomputes or writes. That's a cache stampede.
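A runnable Python sketch of the typical cache code above, with an in-memory dict standing in for Redis and a hypothetical `recompute` that sleeps to simulate an expensive query. A barrier releases 50 worker threads at the same moment, mimicking the instant right after expiry.

```python
import threading
import time

cache = {}                      # name -> (data, expires_at); in-memory stand-in for Redis
recompute_calls = 0
_count_lock = threading.Lock()

def recompute(name):
    """Hypothetical expensive operation (think: slow database query)."""
    global recompute_calls
    with _count_lock:
        recompute_calls += 1
    time.sleep(0.1)
    return f"value-for-{name}"

def fetch(name, ttl=3600):
    """The naive pattern: check, recompute on miss, write back. No coordination."""
    entry = cache.get(name)
    if entry is not None and entry[1] > time.time():
        return entry[0]
    data = recompute(name)
    cache[name] = (data, time.time() + ttl)
    return data

# Simulate 50 workers that all miss at the same moment (e.g. right after expiry).
N_WORKERS = 50
barrier = threading.Barrier(N_WORKERS)

def worker():
    barrier.wait()
    fetch("front-page")

threads = [threading.Thread(target=worker) for _ in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("recomputes:", recompute_calls)
```

On a typical run most or all of the 50 workers recompute the same value, even though one recompute would have been enough.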
Typical stampede solutions
(a) Locking
One worker acquires lock, recomputes, and writes value to cache
Other workers wait for lock to be released, then retry cache read
Primary data source is not overloaded by requests
Redis is often used as a cluster-wide distributed lock:
https://redis.io/topics/distlock
Problems with locking
Introduces extra reads and writes into the code path
Starvation: expiration / eviction can lead to blocked workers waiting for a
single worker to finish recompute
Distributed locks may be abandoned (e.g. a worker crashes while holding one)
Typical stampede solutions
(b) External recompute
Use a separate process / independent worker to recompute value
Workers never recompute
(Alternatively, workers recompute as a fall-back when the external process fails)
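A minimal sketch of external recompute with a worker fall-back. The dict stands in for the shared cache, and `refresh_once`, `refresher`, and `recompute` are hypothetical names; a real deployment might run the refresher as a daemon process or cron job rather than a thread.

```python
import threading

cache = {}                 # in-memory stand-in for the shared cache (Redis in the talk)
fallback_recomputes = 0

def recompute(name):
    """Hypothetical expensive operation."""
    return f"value-for-{name}"

def refresh_once(keys):
    """One pass of the external job: rebuild every key it knows about."""
    for name in keys:
        cache[name] = recompute(name)

def refresher(keys, interval, stop: threading.Event):
    """Daemon loop; in production this might be a separate process or a cron job."""
    while not stop.is_set():
        refresh_once(keys)
        stop.wait(interval)

def fetch(name):
    """Workers normally only read; they recompute only as a fall-back."""
    global fallback_recomputes
    data = cache.get(name)
    if data is None:
        fallback_recomputes += 1
        data = recompute(name)
        cache[name] = data
    return data

refresh_once(["front-page"])          # what one run of the external job would do
print(fetch("front-page"))            # served from cache, no recompute by the worker
print(fetch("user:42"))               # unknown to the job: the worker falls back
print("fallbacks:", fallback_recomputes)
```

To run the loop in the background, start it with `threading.Thread(target=refresher, args=(keys, 60.0, stop), daemon=True).start()`. Note that the job must know its key list up front, which is exactly what gets hard with per-user keys like `user:42`.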
Problems with external recompute
One more “moving part”—a daemon, a cron job, work stealing
Requires fall-back scheme if external recompute fails to run
It is often hard to determine in advance what to recompute externally:
caching based on a wide variety of user input
periodic external recomputation of 1,000,000 user records
External recomputation may be inefficient if cached values are never read
XFetch
(Probabilistic early recomputation)
Probabilistic early recomputation (PER)
Recompute cache values before they expire
Before expiration, one worker “volunteers” to recompute the value
Without evicting old value, volunteer performs expensive recompute—
other workers continue reading cache
Before expiration, volunteer writes new cache value and extends its
time-to-live
Under ideal conditions, there are no cache misses
XFetch
Full paper title: “Optimal Probabilistic Cache Stampede Prevention”
Authors:
Andrea Vattani (Goodreads)
Flavio Chierichetti (Sapienza University)
Keegan Lowenstein (Bugsnag)
Archived at IA:
https://archive.org/details/xfetch
The algorithm
XFetch (“exponential fetch”) is elegant:
delta * beta * ln(rand())
where
delta – Time to recompute value
beta – control (default: 1.0; > 1.0 favors earlier recomputation, < 1.0 favors later)
rand – Random number in (0.0, 1.0] (must never be 0, since ln(0) is undefined)
Remember: ln(x) ≤ 0 for 0 < x ≤ 1, so XFetch produces a value that is zero or negative
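A short derivation (not from the slides) of how likely one request is to volunteer, assuming rand is uniform on (0, 1]. Write u for rand(), δ for delta, β for beta, and r = expiry − time() for the time remaining:

```latex
\begin{aligned}
\text{recompute} &\iff \text{time()} - \delta\beta\ln u \ge \text{expiry} \\
&\iff -\delta\beta\ln u \ge r, \qquad r = \text{expiry} - \text{time()} \\
&\iff \ln u \le -\frac{r}{\delta\beta} \qquad (\delta\beta > 0) \\
&\iff u \le e^{-r/(\delta\beta)} \\
\Pr[\text{recompute}] &= e^{-r/(\delta\beta)}, \qquad r \ge 0,\ u \sim \mathrm{Uniform}(0, 1]
\end{aligned}
```

At expiry (r = 0) the probability is 1; with r = δβ left it is e^{-1} ≈ 0.37. A larger β stretches this window, which is why β > 1.0 favors earlier recomputation.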
Updated code
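function fetch(name)
    // read the value, the duration of its last recompute (delta), and its remaining TTL
    var data = redis.get(name)
    var delta = redis.get(name + ":delta")
    var ttl = redis.ttl(name)
    if (!data or xfetch(delta, time() + ttl))
        var start = time()
        data = recompute(name)
        delta = time() - start
        redis.set(name, expires, data)
        redis.set(name + ":delta", expires, delta)
    return data
function xfetch(delta, expiry)
    /* log(rand) <= 0, so subtracting delta * BETA * log(rand) pushes "now" later */
    /* rand(0,1) must never return 0 */
    return time() - (delta * BETA * log(rand(0,1))) >= expiry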
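The `xfetch()` check above, translated into Python as a sketch. The optional `now` and `rand` parameters are additions (not in the slide) so the check can be driven deterministically in tests; `random.random()` returns values in [0.0, 1.0), so the sketch flips it to (0.0, 1.0] to keep the logarithm finite.

```python
import math
import random
import time

BETA = 1.0  # the talk's suggested default

def xfetch(delta, expiry, beta=BETA, now=None, rand=random.random):
    """Return True if this caller should recompute before `expiry`.

    delta  -- seconds the last recompute took
    expiry -- absolute expiry time, same clock as `now` (epoch seconds by default)
    """
    if now is None:
        now = time.time()
    u = 1.0 - rand()   # random.random() is in [0.0, 1.0); flip to (0.0, 1.0] so log(u) is finite
    # log(u) <= 0, so subtracting delta * beta * log(u) moves "now" later by a random amount
    return now - delta * beta * math.log(u) >= expiry

# 1 s recompute time, 100 ms before expiry: usually False, occasionally True
print(xfetch(delta=1.0, expiry=time.time() + 0.1))
```

A worker calls it as `if data is None or xfetch(delta, expiry): recompute`, so a cache miss still recomputes unconditionally.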
Can more than one volunteer recompute?
Yes. You should know this before using XFetch.
It’s possible for more than one worker to “roll” the magic number and start a
recompute. The odds of this occurring increase as the expiration deadline
approaches.
If your data source absolutely cannot be accessed by multiple workers at once, add a lock or another sentinel; XFetch will still minimize lock contention
How to determine delta?
XFetch must be supplied with the time required to recompute.
The easiest approach is to store the duration of the last recompute and read it
with the cached value.
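A small sketch of measuring delta at recompute time; `timed_recompute` and `slow_lookup` are hypothetical names. It uses `time.monotonic()` so that wall-clock adjustments can't produce a negative or inflated duration. How delta is stored next to the value (a second key, a hash field, or a JSON wrapper) is left open here.

```python
import time

def timed_recompute(recompute, name):
    """Run a (hypothetical) recompute callable and measure how long it took.
    Returns (data, delta_seconds); store delta next to the data so the next
    XFetch check can read both together."""
    start = time.monotonic()     # monotonic: unaffected by wall-clock adjustments
    data = recompute(name)
    delta = time.monotonic() - start
    return data, delta

def slow_lookup(name):
    time.sleep(0.05)             # stand-in for an expensive query
    return name.upper()

data, delta = timed_recompute(slow_lookup, "front-page")
print(data, f"{delta:.3f}s")
```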
What’s the deal with the beta value?
beta is the one knob you have to tweak XFetch.
beta > 1.0 favors earlier recomputation, < 1.0 favors later recomputation.
My suggestion: Start with the default (1.0), instrument your code, and change
only if necessary.
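A sketch of what the beta knob does, using a formula derived from the XFetch condition rather than taken from the talk: assuming rand is uniform on (0, 1], a single check with `remaining` seconds left fires with probability exp(−remaining / (delta · beta)). The function name is hypothetical.

```python
import math

def early_recompute_probability(remaining, delta, beta=1.0):
    """Chance that one XFetch check fires with `remaining` seconds left before
    expiry, assuming rand() is uniform on (0, 1]. Derived, not from the talk."""
    if remaining <= 0:
        return 1.0
    return math.exp(-remaining / (delta * beta))

# delta = 1 s of recompute time, checked with 1 s left before expiry
for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: {early_recompute_probability(1.0, 1.0, beta):.3f}")
# beta=0.5: 0.135
# beta=1.0: 0.368
# beta=2.0: 0.607
```

Raising beta raises the chance of firing at any given remaining time, so recomputation tends to happen earlier, matching the slide; this is one quantity worth instrumenting before changing the default.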
XFetch & Redis
Let's look at some sample code
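The sample code shown on this slide isn't captured in the transcript. As a stand-in, here is a Python sketch written against three redis-py client methods: `get`, `pttl`, and `set(..., px=...)`. The JSON layout that keeps `data` and `delta` under one key is an assumption (so the data must be JSON-serializable), and `InMemoryClient` is a hypothetical stand-in exposing those three methods so the sketch runs without a server.

```python
import json
import math
import random
import time

BETA = 1.0

def fetch(client, key, recompute, ttl_ms, beta=BETA, rand=random.random):
    """XFetch over a redis-py-style client (get / pttl / set with px=).
    Value and recompute time (delta, seconds) are stored together as JSON."""
    raw = client.get(key)
    if raw is not None:
        entry = json.loads(raw)
        pttl = client.pttl(key)
        # -1 = no expiry set, -2 = key vanished after the GET: serve what we read
        remaining = pttl / 1000.0 if pttl >= 0 else math.inf
        if -entry["delta"] * beta * math.log(1.0 - rand()) < remaining:
            return entry["data"]
    start = time.monotonic()
    data = recompute(key)
    delta = time.monotonic() - start
    client.set(key, json.dumps({"data": data, "delta": delta}), px=ttl_ms)
    return data


class InMemoryClient:
    """Hypothetical stand-in for the three redis-py methods used above."""

    def __init__(self, clock=time.monotonic):
        self._data, self._expires, self._clock = {}, {}, clock

    def get(self, key):
        if key in self._expires and self._clock() >= self._expires[key]:
            self._data.pop(key, None)
            self._expires.pop(key, None)
        return self._data.get(key)

    def pttl(self, key):
        if self.get(key) is None:
            return -2
        if key not in self._expires:
            return -1
        return int((self._expires[key] - self._clock()) * 1000)

    def set(self, key, value, px=None):
        self._data[key] = value.encode() if isinstance(value, str) else value
        if px is None:
            self._expires.pop(key, None)
        else:
            self._expires[key] = self._clock() + px / 1000.0
        return True
```

With a real server the same function would be called as `fetch(redis.Redis(), "front-page", build_page, ttl_ms=3_600_000)`, where `build_page` is your recompute function; redis-py's `pipeline()` could batch the GET and PTTL into one round trip.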
Questions?