Probabilistic Data Structures
KYLE J. DAVIS
TECHNICAL MARKETING MANAGER
REDIS LABS
Who We Are
Open source. The leading in-memory database platform,
supporting any high performance operational, analytics or
hybrid use case.
The open source home and commercial provider of Redis
Enterprise technology, platform, products & services.
Stack Overflow Survey: The Most Loved Databases
Redis: 64.8%
PostgreSQL: 60.8%
MongoDB: 55%
SQL Server: 54.2%
Cassandra: 49.9%
MySQL: 49.6%
SQLite: 47.2%
Oracle: 36.9%
(% of devs who expressed interest in continuing to develop with a language/tech)
Redis Top Differentiators
Simplicity, Extensibility, Performance
NoSQL Benchmark
Redis Data Structures
Redis Modules
Lists
Hashes
Bitmaps
Strings
Bit field
Streams
Hyperloglog
Sorted Sets
Sets
Geospatial Indexes
Simplicity: Data Structures - Redis’ Building Blocks
Lists
[ A → B → C → D → E ]
Hashes
{ A: “foo”, B: “bar”, C: “baz” }
Bitmaps
0011010101100111001010
Strings
“I'm a Plain Text String!”
Bit field
{23334}{112345569}{766538}
Key
“Retrieve the e-mail address of the user with the highest bid in an auction that started on July 24th at 11:00pm PST”
ZREVRANGE 07242015_2300 0 0
Streams
{ id1=time1.seq1(A:“xyz”, B:“cdf”), id2=time2.seq2(D:“abc”) }
Hyperloglog
00110101 11001110
Sorted Sets
{ A: 0.1, B: 0.3, C: 100 }
Sets
{ A , B , C , D , E }
Geospatial Indexes
{ A: (51.5, 0.12), B: (32.1, 34.7) }
Extensibility: Modules Extend Redis Infinitely
• Add-ons that use a Redis API to seamlessly support additional use cases and data structures.
• Enjoy Redis’ simplicity, super high performance, infinite scalability and high availability.
• Any C/C++/Rust program can become a Module and run on Redis.
• Leverage existing data structures or introduce new ones.
• Can be used by anyone; Redis Enterprise Modules are tested and certified by Redis Labs.
• Turn Redis into a Multi-Model database
Probabla-what-its?
Data Structures: Deterministic
• You know how it will work.
• Data in = data out.
• Data is stored or it isn’t.
• Structure size >= data size
• Examples:
–Hash map (1953)
–Linked lists (1955)
–Heaps (1964)
–…
Data Structures: Probabilistic
• Behaves differently in different contexts
• Data in maybe data out.
• Provides a fuzzy view of data
• Structure size can be less than data size.
• Examples:
–Bloom Filters (1970/1998)
–Count Min Sketch (2005)
–HyperLogLog (2007)
–Cuckoo Filter (2014)
–…
…BUT WHY?!
Sometimes speed is more important than correctness
Sometimes compactness is more important than correctness
Sometimes you only need certain data guarantees
You can use both!
You will not leave tonight knowing everything about probabilistic data structures. But…
Step 0: The hashing function
• Input: Anything, of any length
• Output: A (very) large number
• Properties: Any change in the input results in a completely different output, but a given input always produces the same output. One-way: practically impossible to reverse computationally.
• Cryptographic (SHA family, RIPEMD, etc.)
–Expensive to compute
–Very low collision rate
• Non-cryptographic (Murmur, SpookyHash, xxHash, FNV, etc.)
–Cheap to compute
–Low collision rate
–Smaller result size
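The properties above can be poked at with Python's standard library. This is only a sketch: `hashlib.sha256` stands in for the cryptographic family, and `zlib.crc32` stands in for a cheap non-cryptographic function purely because it ships with Python (it is a checksum, not one of the functions named on the slide). The helper names are made up for illustration.

```python
import hashlib
import zlib


def crypto_hash(data: bytes) -> int:
    """SHA-256 digest read as one (very) large integer of up to 256 bits."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def cheap_hash(data: bytes) -> int:
    """CRC32 as a stand-in for a fast non-cryptographic hash (32-bit result)."""
    return zlib.crc32(data)


def differing_bits(a: int, b: int) -> int:
    """Number of bit positions in which two hash values differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    h1 = crypto_hash(b"redis")
    h2 = crypto_hash(b"redit")  # one character changed
    print(h1 == crypto_hash(b"redis"))  # same input, same output
    print(differing_bits(h1, h2))       # usually close to half of the 256 bits
    print(cheap_hash(b"redis") < 2**32) # much smaller result size
```

Any input length works, the output is always the same width, and a one-character change scrambles the result.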
Step 1: Bloom Filters
• “Filter” is a weird term for it: think storage, not filtering.
• Items are hashed, and the hashes are recorded by setting bits in a bit field.
• Answers are “maybe” or “no”: false positives are possible, false negatives are not.
• Demo
–http://llimllib.github.io/bloomfilter-tutorial/
–Not precisely how it’s done normally, but nice and visual
• Bit flipping.
• Put items in and query status
–Simplest form: never fills up, the error rate just gets worse.
–More complex: fills to a predetermined error rate, then “grows”
• Growing
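A minimal sketch of the simplest (non-growing) form, assuming a fixed bit field and several positions per item derived by salting SHA-256 with an index. The class name, the default sizes, and the choice of SHA-256 are illustrative assumptions, not how ReBloom does it.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: each item sets num_hashes bits in the field."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)  # flip the bit on

    def might_contain(self, item: str) -> bool:
        # False means "definitely never added"; True only means "maybe".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

Because bits are only ever switched on, an added item can never be reported as absent; the more items go in, the more bits are set and the more often unrelated items look like a "maybe".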
Step 1: Bloom Filter Usage (General)
- Username search (speed, guarantees)
- Fraud Mitigation (speed, guarantees)
- Akamai – One hit wonder problem (speed, compactness, guarantees)
- Databases - Disk lookups for non-existent data (speed, guarantees)
- Chrome – Is a URL malicious? (speed, guarantees, combined)
- Bitcoin – Transaction privacy in Simplified Payment Verification (compactness, combined)
- Venti – Only storing unique data in archival storage (speed, guarantees)
- Exim – as part of a rate limiter (speed, compactness, guarantees)
- Medium – Content freshness (speed, guarantees)
Step 1: Bloom Filter Redis Usage
• Provided by the ReBloom module
• BF.ADD [filter name] [item]
• BF.EXISTS [filter name] [item]
• Other commands for edge cases and administration: BF.RESERVE, BF.MADD, BF.MEXISTS, BF.SCANDUMP, BF.LOADCHUNK
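A hedged sketch of the username-search pattern using the two main commands. It assumes a client object exposing redis-py's generic `execute_command(*args)` method and a Redis server with the Bloom filter module loaded; the function name `claim_username` and the key name `usernames` are hypothetical.

```python
def claim_username(client, name: str, filter_key: str = "usernames") -> bool:
    """Return True if `name` was definitely unseen and has now been recorded.

    `client` is assumed to be anything with an `execute_command(*args)`
    method, for example a redis-py `redis.Redis` instance connected to a
    server that has the Bloom filter module loaded.
    """
    # BF.EXISTS replies 0 for "definitely not present" and 1 for "maybe present".
    if client.execute_command("BF.EXISTS", filter_key, name):
        return False  # maybe taken: confirm with an exact (slower) lookup elsewhere
    client.execute_command("BF.ADD", filter_key, name)
    return True


# Example wiring (not executed here; needs a running server and `pip install redis`):
#   import redis
#   r = redis.Redis(host="localhost", port=6379)
#   claim_username(r, "kyle")
```

The two round trips are kept separate only to mirror the slide; they are not atomic, so concurrent callers could both see "not present".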
Step 2: HyperLogLog
• Funny name again. Estimates the cardinality (number of unique items) of a set.
• Part of the “sketch” family of data types
• Bit flipping and count
• Add, Count or Merge
–Merge is really useful
• 12 KB for the Redis implementation
• Standard error (0.81% in Redis)
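The size and the standard error are two views of the same parameter. Assuming the figures given in the Redis documentation (16384 registers of 6 bits each) and the standard-error approximation from the HyperLogLog paper, the arithmetic works out roughly as:

```latex
m = 2^{14} = 16384 \text{ registers}, \qquad
\text{size} = \frac{16384 \times 6 \text{ bits}}{8} = 12288 \text{ bytes} \approx 12\ \text{KB}

\sigma \approx \frac{1.04}{\sqrt{m}} = \frac{1.04}{128} \approx 0.0081 = 0.81\%
```

Quadrupling the number of registers would halve the error, at four times the memory.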
Step 2a: How does HyperLogLog work?
Items are hashed. Look at the binary form of the hash value and find the position of the first 1 (i.e., the length of the first run of 0s), then update a table cell based on that position. Repeat across many buckets, each keeping the maximum position it has seen; the count is estimated from those maxima.
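A simplified sketch of that procedure, assuming a 64-bit slice of SHA-256 as the hash, the first `p` bits as the bucket index, and the harmonic-mean estimator with the bias constant from the HyperLogLog paper. It leaves out the small-range and large-range corrections that real implementations (including Redis) apply, so it is only reasonable once the number of distinct items is well above the number of buckets. The class name is made up.

```python
import hashlib


class TinyHyperLogLog:
    """Simplified HyperLogLog with 2**p registers and no range corrections."""

    def __init__(self, p: int = 10) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
        bucket = h >> (64 - self.p)            # first p bits choose the bucket
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # Position of the first 1 bit = number of leading zeros in `rest`, plus one.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[bucket] = max(self.registers[bucket], rank)

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, valid for m >= 128
        harmonic = sum(2.0 ** -r for r in self.registers)
        return alpha * self.m * self.m / harmonic

    def merge(self, other: "TinyHyperLogLog") -> None:
        # Union of two sketches built with the same p: keep each bucket's maximum.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
```

Adding the same item twice changes nothing, and merging two sketches gives exactly the sketch you would have built from the combined stream, which is why Merge is so useful.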
Step 2: HyperLogLog Uses (General)
• Facebook Likes (speed, compactness, guarantees)
• Reddit Unique Reads (speed, compactness, guarantees)
• Network Attack Mitigation (speed, compactness, guarantees, combined)
• Neustar (Advertising Platforms) Group Intersections (compactness, guarantees, combined)
Step 2: HyperLogLog Redis Usage
• Built into Redis
• PFADD [hll name] [element…]
• PFCOUNT [hll name(s)…]
• PFMERGE [dest] [source…]
Step 3: Count Min Sketch
• Frequency estimation (counting)
• “Sketch” family
• Increment, Query, Merge (with weights!)
• Hash items with multiple functions, with a counter for each bit position.
–Grid of counters: bit positions by depth
–Take the minimum
• Initialize with an error rate and a probability to dial in requirements
–0.01% error rate at probability of 0.01% = 40kb
• Overestimates are possible, especially for items with few observations (underestimates are not)
Initial              B1    B2    B3    B4
Hash 1                0     0     0     0
Hash 2                0     0     0     0
Hash 3                0     0     0     0

'foo' INCRBY 1       B1    B2    B3    B4
Hash 1 = 3            0     1     0     1
Hash 2 = 5            0     1     0     0
Hash 3 = 1            0     0     0     0

'bar' INCRBY 99      B1    B2    B3    B4
Hash 1 = 11           0     1     0     1
Hash 2 = 5            0   100     0     0
Hash 3 = 8           99     0     0    99

Query 'baz': MIN (5,1,0) = 0
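The grid-of-counters idea can be sketched in a few lines. This version assumes one salted SHA-256 per row in place of independent hash functions; the class and method names are illustrative, and the weighted `merge` only mimics the spirit of CMS.MERGE rather than reproducing the module's implementation.

```python
import hashlib


class CountMinSketch:
    """`depth` rows (one hash function each) of `width` counters."""

    def __init__(self, width: int = 4, depth: int = 3) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _column(self, row: int, item: str) -> int:
        # Salting with the row number gives each row its own hash function.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def incrby(self, item: str, amount: int = 1) -> None:
        for row in range(self.depth):
            self.rows[row][self._column(row, item)] += amount

    def query(self, item: str) -> int:
        # Collisions can only inflate a counter, so the smallest one is the best guess.
        return min(
            self.rows[row][self._column(row, item)] for row in range(self.depth)
        )

    def merge(self, other: "CountMinSketch", weight: int = 1) -> None:
        # Cell-wise weighted sum; both sketches must share width and depth.
        for row in range(self.depth):
            for col in range(self.width):
                self.rows[row][col] += weight * other.rows[row][col]
```

With non-negative increments, a query can come back too high when items share cells in every row, but never lower than the true count.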
Step 3: Count Min Sketch Uses
• Network flows (speed, compactness, guarantees)
• Anomaly Detection (speed, guarantees, combined)
• Outliers (guarantees, combined)
• Power Saving Analytics in IoT Devices (speed, combined)
Step 3: Count Min Sketch Redis Usage
• Provided by Count Min Sketch Module
• CMS.INCRBY [sketch name] [item] [amount to increment] […]
• CMS.QUERY [sketch name] [item] [item…]
• CMS.MERGE [dest] [sketch name] [sketch name…] [WEIGHTS weight weight…]
• CMS.INITBYDIM, CMS.INITBYERR
Cuckoo Filters
CC BY-SA 2.0 / Ltshears
Step 4: Cuckoo Filter
• Same usage patterns as Bloom filters
• Can delete and count items
• Larger than Bloom filters
• Hash the item twice and fingerprint it once, then place the fingerprint in one of the two buckets if it has room
–If full, kick an existing fingerprint out to its other bucket.
• Lookup runs the same hash/fingerprint routine and looks for the fingerprint in either bucket.
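A compact sketch of the insert, lookup, and delete routine, following the partial-key scheme from the cuckoo filter paper: the second bucket is the first bucket XORed with a hash of the fingerprint, so either bucket can be derived from the other. The class name, sizes, 8-bit fingerprints, and use of SHA-256 are assumptions for illustration, not the Redis module's internals.

```python
import hashlib
import random


def _h(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class CuckooFilter:
    def __init__(self, num_buckets: int = 64, bucket_size: int = 4,
                 max_kicks: int = 500) -> None:
        if num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _fingerprint(self, item: str) -> int:
        return _h(b"fp:" + item.encode()) % 255 + 1  # 8-bit value in 1..255

    def _alt(self, index: int, fp: int) -> int:
        # XOR with a hash of the fingerprint; applying it twice returns to `index`.
        return (index ^ _h(bytes([fp]))) % self.num_buckets

    def _indexes(self, item: str, fp: int):
        i1 = _h(item.encode()) % self.num_buckets
        return i1, self._alt(i1, fp)

    def add(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1, i2 = self._indexes(item, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a resident and move it to its other bucket.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = random.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # treated as full; the last evicted fingerprint is dropped

    def might_contain(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1, i2 = self._indexes(item, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item: str) -> bool:
        fp = self._fingerprint(item)
        for i in self._indexes(item, fp):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False
```

Storing fingerprints rather than flipped bits is what makes deletion possible: removing one copy of a fingerprint undoes one insert. Deleting an item that was never added can remove another item's matching fingerprint, so deletes should only be issued for items known to be present.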
Step 4: Cuckoo vs Bloom
• Slower to insert
• Faster to look up
• Great for times when you don’t have a:
–Good cardinality estimate
–Tight storage budget
• Only viable option when you need to delete from a probabilistic presence check
• CF.ADD, CF.INSERT, CF.DEL, CF.EXISTS + a few options
Other probabilistic data structures?
Questions?
kyle@redislabs.com / mike@redislabs.com

Probabilistic Data Structures (Edmonton Data Science Meetup, March 2018)
