Fail-Safe Starvation-Free Durable Priority Queues in Redis
Jesse H. Willett
jhw@prosperworks.com
https://github.com/jhwillett
ProsperWorks is a multi-tenant
CRM-as-a-Service.
ProsperWorks was built with three
basic principles in mind:
● Keep it simple.
● Show what matters.
● Make it actionable.
Who are We?
We help businesses sell more with a CRM teams actually love to use.
I am a server architect focused on storage, scaling, and asynchronous workloads.
I have worked at scale on public-facing live services built on many stacks:
● ProsperWorks: Postgres/Citus+Redis+Elasticsearch Ruby on Rails
● Lyft: MongoDB+Redis Doctrine/PHP
● Zynga: Memcache+Membase PHP
I have also worked on image processing grids, feature phone games, text search
engines, PC strategy games, and desktop publishing suites.
All of these systems had queues. Queues naturally manage the impedance
mismatch between systems with different time or cost signatures.
Who am I?
Presenting Ick, a Redis-based priority queue which we have used in our
Postgres-to-Elasticsearch pipeline since Q3 2015.
Ick extends Redis Sorted Sets with 175 LoC of Lua. The combination neatly
solves many problems in asynchronous job processing.
“Ick” was my gut reaction to the idea of closing a race condition by deploying Lua
to Redis. Once successful, we adopted the backronym “Ick == Indexing QUeue”.
Ick is available via Ruby bindings in the gem redis-ick under the MIT License.
So far only ProsperWorks uses redis-ick, and I am the only maintainer.
What is This?
● Redis Reliable Queue Pattern
○ Does not support deduplication or reordering.
○ Ick is RPOPLPUSH for Sorted Sets with a custom score update mode.
● Redis Streams
○ Does not support deduplication or reordering.
○ Still, we might have used Streams if they had been available in 2015.
● Apache Kafka
○ Log compaction could serve our deduplication needs, no reordering.
○ Too costly to own or rent for a small team in 2015, yet another storage service.
● Amazon Kinesis
○ Does not support deduplication or reordering.
○ Cost effective, yet another storage service.
Ick Comparables
● Our primary store is Postgres with a normalized entity-relationship model.
● Elasticsearch hosts search over a de-normalized form of our entities.
○ ES provides scale and advanced search features.
○ Mapping from PG to ES is coupled to our business logic, lives best in our code.
● Challenges keeping ES up-to-date with live changes in PG.
○ High-frequency fast PG updates from our web layer and from asynchronous jobs.
○ Low-frequency slow ES Bulk API calls.
○ A few seconds of latency in the PG ⇒ ES pipeline is acceptable.
○ UX degrades with minutes of latency. Hours of latency is unacceptable.
● A Natural Pattern:
○ When the app writes to PG, also put ids of dirty entities in a Redis queue.
○ In some cases, we also search out dirty entities in PG directly.
○ A background consumer process takes batches of dirty ids and updates ES in bulk.
Problem Space
# in producer
redis.rpush(key, msg)
# in consumer
batch = batch_size.times.map { redis.lpop(key) }.compact # messages no longer in Redis
process_batch_slowly(batch)
● Advantages:
○ Simple
○ Many implementations: Resque works like this w/ batch_size 1
○ Scaling to many workers is straightforward.
● Disadvantages:
○ Messages lost on failure.
○ Unconstrained backlog growth when ES falls behind.
Solution 1: Basic List Pattern
Sometimes we see hot data: entities which are dirtied several times per second.
Under heavy load our ES Bulk API calls can take 5s or more.
With too much hot data, our backlog can grow without bound.
To get leverage over this problem we need to deduplicate messages.
We prefer deduplication at the queue level. We considered and rejected:
● One lock per message at enqueue time - brittle and expensive.
● Version information in the message - large decrease in solution generality.
We Really Care about Deduplication!
In Redis, this means we prefer Sorted Sets:
Sorted sets are a data type which is similar to a mix between a Set and a Hash. Like sets, sorted
sets are composed of unique, non-repeating string elements, so in some sense a sorted set is a set
as well.
However while elements inside sets are not ordered, every element in a sorted set is associated with
a floating point value, called the score [...]
Moreover, elements in a sorted sets are taken in order [by score].
Sorted Set accesses cost O(log N) versus the O(1) of Lists, but deduplicate.
Sorted Sets support FIFO-like behavior if we use timestamps as scores.
Sorted Sets for Deduplication
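Deduplication with timestamp scores can be seen in a tiny in-memory model. This is a plain Ruby Hash standing in for ZADD semantics, for illustration only; no Redis required:

```ruby
# member => score; like ZADD, a later write for the same member
# folds into the existing entry rather than adding a duplicate
zset = {}

zset["entity:7"] = 100.0   # first dirty signal
zset["entity:9"] = 101.0
zset["entity:7"] = 102.0   # re-dirtied: folded, score bumped to "now"

p zset.size                               # => 2: one entry per entity
p zset.sort_by { |_, s| s }.map(&:first)  # => ["entity:9", "entity:7"]
```

Note the side effect that motivates the rest of this deck: re-dirtying `entity:7` also moved it behind `entity:9` in score order.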
# in producer
redis.zadd(key, Time.now.to_f, msg)                             # Time.now for score ==> FIFO-like
# in consumer
batch = redis.zrange(key, 0, batch_size - 1, with_scores: true) # critical section start
process_batch_slowly(batch)
redis.zrem(key, *batch.map(&:first))                            # critical section end
● Advantages:
○ Messages preserved across failure.
○ De-duplication aka write-folding constrains backlog growth.
○ 1 + 2/batch_size Redis ops per message, down from 2 ops/message.
● Disadvantages:
○ Race condition between zadd and process_batch_slowly can lead to dropped messages.
○ Hot data can starve if continually re-added with a higher score.
Solution 2: Basic Sorted Set Pattern
# in producer
redis.zadd(key, Time.now.to_f, msg)                              # variadic ZADD is an option
# in consumer
batch = redis.zrange(key, 0, batch_size - 1, with_scores: true)
process_batch_slowly(batch)
batch2 = redis.zrange(key, 0, batch_size - 1, with_scores: true) # critical section start
unchanged = batch & batch2                                       # drop msgs whose scores have changed
redis.zrem(key, *unchanged.map(&:first))                         # critical section end
● Advantages:
○ Critical section is smaller.
○ Critical section is not exposed to process_batch_slowly.
○ Messages only dropped from Redis after success (i.e. ZREM as ACK)
● Disadvantages:
○ Extra Redis op per cycle.
○ Hot data can starve if continually re-added with a higher score.
Solution 3: Improved Sorted Set Pattern
The Sorted Set solutions have a critical section where dirty signals can be lost,
and also a more subtle problem with hot data.
Hot data is continually re-added with higher scores.
During periods of intermediate load, we might carry a steady-state backlog which
is larger than a single batch size for an extended period.
When these conditions coincide, hot data may keep dancing away from the
low-score end of the Sorted Set for hours.
We call this the Hot Data Starvation Problem.
We Really Care about Hot Data!
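The mechanism can be demonstrated with a small in-memory simulation. This is pure Ruby standing in for ZADD/ZRANGE/ZREM (no Redis required): while a backlog persists, a hot member that is re-scored to "now" on every dirty never reaches the low end of the set.

```ruby
scores = {}                    # member => score, models plain ZADD semantics
consumed = []
clock = 0.0

# seed a small backlog of cold entities plus one hot entity
4.times { |i| scores["cold-#{i}"] = (clock += 1) }
scores["hot"] = (clock += 1)

next_id = 4
20.times do
  # producer: two fresh cold entities per cycle, and the hot entity is
  # dirtied again -- plain ZADD bumps its score to "now"
  2.times { scores["cold-#{next_id}"] = (clock += 1); next_id += 1 }
  scores["hot"] = (clock += 1)

  # consumer: take the two lowest-scored members (ZRANGE + ZREM)
  batch = scores.sort_by { |_, s| s }.first(2).map(&:first)
  batch.each { |m| scores.delete(m) }
  consumed.concat(batch)
end

p consumed.include?("hot")   # => false: "hot" starves behind the backlog
```

Forty cold messages were consumed while "hot" never made it to the front of the line.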
An Ick is a pair of Redis Sorted Sets: a producer set and a consumer set.
● ICKADD adds messages to the producer set.
● ICKRESERVE moves lowest-score messages from the pset to the cset, then returns the cset.
● ICKCOMMIT removes messages from the cset.
● On duplicates, ICKADD and ICKRESERVE both select the minimum score.
ICKADD [score,msg]* app ==> Redis pset
ICKRESERVE n Redis pset ==> Redis cset up to size N ==> app return batch
ICKCOMMIT msgs* Redis cset removed
Introducing Ick
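The three operations above can be sketched as a pure-Ruby, in-memory model. This hypothetical `MiniIck` class is for illustration only; the real Ick is 175 lines of Lua running inside Redis, accessed through the redis-ick gem.

```ruby
# In-memory sketch of Ick semantics: two "sorted sets" modeled as Hashes.
class MiniIck
  def initialize
    @pset = {}   # producer set: member => score
    @cset = {}   # consumer set: member => score
  end

  # ICKADD: on duplicates, keep the minimum score.
  def ickadd(*pairs)
    pairs.each_slice(2) do |score, member|
      @pset[member] = [score, @pset[member] || Float::INFINITY].min
    end
  end

  # ICKRESERVE: top the cset up to max_size with the lowest-scored pset
  # members, then return the cset sorted by score.
  def ickreserve(max_size)
    while @cset.size < max_size && !@pset.empty?
      member, score = @pset.min_by { |_, s| s }
      @pset.delete(member)
      @cset[member] = [score, @cset[member] || Float::INFINITY].min
    end
    @cset.sort_by { |_, s| s }
  end

  # ICKCOMMIT: acknowledge success by removing members from the cset.
  def ickcommit(*members)
    members.each { |m| @cset.delete(m) }
  end
end

ick = MiniIck.new
ick.ickadd(12, "a", 10, "b", 13, "c")
p ick.ickreserve(2)          # => [["b", 10], ["a", 12]]
ick.ickadd(5, "a")           # re-add "a": lands in the pset, lower score wins
ick.ickcommit("a")
p ick.ickreserve(2)          # => [["a", 5], ["b", 10]]
```

The last two calls show the dual-presence property: after being reserved, "a" was re-added to the pset, committed out of the cset, and then reserved again at its new, lower score.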
# push 'a' and 'b' into the Ick
Ick.new(redis).ickadd(key, 123, 'a', 456, 'b')  # pset [[123,'a'],[456,'b']]
# re-push 'b' with higher score, nothing changes
Ick.new(redis).ickadd(key, 789, 'b')            # pset [[123,'a'],[456,'b']] unchanged
# re-push 'b' with lower score, score changes
Ick.new(redis).ickadd(key, 100, 'b')            # pset [[100,'b'],[123,'a']] moved b to 100
ICKADD adds to the producer set. Duplicates are assigned the minimum score.
Almost ZADD XX but more predictable.
Assuming the scores of new messages trend up over time, there is no starvation:
a given message's score never increases, so every message drifts toward the
lowest score, where it is consumed.
ICKADD
# push some messages into the Ick
Ick.new(redis).ickadd(key, 12, 'a', 10, 'b', 13, 'c') # pset [[10,'b'],[12,'a'],[13,'c']]
# reserve a batch
batch = Ick.new(redis).ickreserve(key, 2) # pset [[13,'c']] removed b and a
                                          # cset [[10,'b'],[12,'a']] added b and a
                                          # batch [['b',10],['a',12]] per ZRANGE w/ scores
# a repeated ICKRESERVE just re-fetches the consumer set
batch = Ick.new(redis).ickreserve(key, 2) # pset [[13,'c']] unchanged
                                          # cset [[10,'b'],[12,'a']] unchanged
                                          # batch [['b',10],['a',12]] unchanged
ICKRESERVE fills up the consumer set by moving the lowest-score messages from
the producer set, then returns the consumer set.
This merge respects the minimum score rule.
ICKRESERVE
# push some messages into the Ick
Ick.new(redis).ickadd(key, 12, 'a', 10, 'b', 13, 'c') # pset [[10,'b'],[12,'a'],[13,'c']]
# reserve a batch
batch = Ick.new(redis).ickreserve(key, 2) # pset [[13,'c']] removed b and a
                                          # cset [[10,'b'],[12,'a']] added b and a
                                          # batch [['b',10],['a',12]] per ZRANGE w/ scores
# commit 'a' to acknowledge success
Ick.new(redis).ickcommit(key, 'a')        # pset [[13,'c']] unchanged
                                          # cset [[10,'b']] removed a
ICKCOMMIT forgets messages in the consumer set.
ICKCOMMIT
● All Ick ops are bulk operations and support multiple messages per Redis op.
● Duplicate messages are always resolved to the minimum score.
● We use current timestamps for scores.
○ The scores of new messages tend to increase.
● Even hot data does not lose its place in line.
● A message can be present in both the pset and the cset.
○ When it is re-added after being reserved.
○ Good: this reifies the critical section where PG vs ES agreement is indeterminate.
Properties of Icks
# in producer
Ick.new(redis).ickadd(key, Time.now.to_f, msg)     # supports variadic bulk ICKADD
# in consumer
batch = Ick.new(redis).ickreserve(key, batch_size)
process_batch_slowly(batch)
Ick.new(redis).ickcommit(key, *batch.map(&:first)) # critical section only in Redis tx
● Advantages:
○ Critical section is bundled up in a Redis transaction.
○ Hot data starvation solved by constraining scores to only decrease, never increase.
○ Messages only dropped from Redis after success (i.e. ICKCOMMIT as ACK)
● Disadvantages:
○ Must deploy Lua to your Redis.
○ Not inherently scalable.
Solution 4: Ick Pattern
Ick support for multiple Ick consumers was considered but rejected:
● Consumer processes would need to identify themselves somehow.
● How are messages allocated to consumers?
● How do consumers come and go?
● Will this break deduplication or other serializability guarantees?
● How can the app customize?
We scale at the app level by hashing messages over many Ick+consumer pairs.
This suffers from head-of-line blocking but keeps these hard problems in
higher-level code which we can monitor and tie to business logic more easily.
Dealing with Scale
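That app-level routing can be sketched in a few lines. The `ick_key_for` helper and CRC32 choice are hypothetical illustrations, not the redis-ick API; any stable hash of the message works, since all that matters is that duplicates of one message always land on the same Ick:

```ruby
require "zlib"

NUM_SHARDS = 8

# Route a message to one of NUM_SHARDS Ick keys, each owned by its own
# dedicated consumer process.
def ick_key_for(msg)
  "ick-shard-#{Zlib.crc32(msg) % NUM_SHARDS}"
end

# producer side (sketch): route the dirty id to its shard's Ick
#   Ick.new(redis).ickadd(ick_key_for(id), Time.now.to_f, id)

p ick_key_for("entity:42")   # always the same shard for the same id...
p ick_key_for("entity:42")   # ...so queue-level deduplication still works
```

Stable routing is what preserves deduplication across shards; a random or round-robin assignment would scatter duplicates of one message over several Icks.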
We usually use the current time for score in our Icks.
This is FIFO-like: any backlog has priority over current demand, which has
priority over future demand.
Unfortunately, resources are finite. We alert when the scores of the current batch
get older than our service level objectives.
Unfortunately, demand is bursty. For bulk operations we offset the scores by 5
seconds plus 1 second per 100 messages.
That is, as bulk operations get bulkier they also get nicer.
Advanced Ick Patterns: Hilbert’s SLA
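The niceness rule above can be captured in a small helper. The `bulk_score` name is a hypothetical illustration, assuming float-seconds timestamps for scores as elsewhere in this deck:

```ruby
# Score for a message enqueued as part of a bulk operation: offset by
# 5 seconds plus 1 second per 100 messages in the bulk, so bulkier
# operations yield more to interactive traffic.
def bulk_score(now, bulk_size)
  now + 5.0 + (bulk_size / 100.0)
end

now = 1_000_000.0
p bulk_score(now, 100)     # => 1000006.0  (5s + 1s behind "now")
p bulk_score(now, 10_000)  # => 1000105.0  (5s + 100s: much nicer)
```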
I recently added a new Ick operation which combines ICKCOMMIT of the last
batch with ICKRESERVE for the next batch:
last_batch = []
while still_going() do
  next_batch = Ick.new(redis).ickexchange(key, batch_size, *last_batch.map(&:first))
  process_batch_slowly(next_batch)
  last_batch = next_batch
end
Ick.new(redis).ickexchange(key, 0, *last_batch.map(&:first))
It is gratifying to have two-phase commit without doubling the Redis ops.
This pattern would be useful in any two-phase commit or pipeline system.
Advanced Ick Patterns: ICKEXCHANGE
I anticipate using Ick to schedule delayed jobs by using scores as “release date”.
To support this I added an option to ICKRESERVE:
# push messages and reserve an initial batch
Ick.new(redis).ickadd(key, 12, 'a', 10, 'b', 13, 'c') # pset [[10,'b'],[12,'a'],[13,'c']]
Ick.new(redis).ickreserve(key, 2)         # pset [[13,'c']] moved b and a
                                          # cset [[10,'b'],[12,'a']] moved b and a
# no commits, but a younger message is added
Ick.new(redis).ickadd(key, 7, 'x')        # pset [[7,'x'],[13,'c']] 7 sorts first
                                          # cset [[10,'b'],[12,'a']] but cset is full
# a plain reserve is wedged, but backwash unblocks it
Ick.new(redis).ickreserve(key, 2)         # pset [[7,'x'],[13,'c']] no change
                                          # cset [[10,'b'],[12,'a']] full
Ick.new(redis).ickreserve(key, 2, backwash: true) # pset [[12,'a'],[13,'c']] backwashed a and b!
                                                  # cset [[7,'x'],[10,'b']] unblocked x!
Advanced Ick Patterns: Backwash
Thank You
Jesse H. Willett
jhw@prosperworks.com
https://github.com/jhwillett
RedisConf18 - Fail-Safe Starvation-Free Durable Priority Queues in Redis