Hayden Marchant, Software Engineer
haydenm@wix.com
Bloom Filters
Agenda
A Bloom what?
Why would I want one?
Practical examples
How it works
Next step
A Bloom what?
Just a Data Structure
Answers 'Is an element in a set or not?'
Probabilistic: every answer is either
'Definitely not in set'
OR
'Probably in set'
Why would I want one?
SIMPLE MATHS
Why not just store all used keys?
Element key size: 100 bytes
# elements: 10 million
Memory required: 100 bytes × 10M = 1 GB
Bloom Filters: tiny footprint
~10 bits per element (for a false-positive rate of roughly 1%)
>98% reduction in space
SIMPLE MATHS (PART 2)
Bits per element: 10
# elements: 10 million
Memory required: 10 bits × 10M = 100M bits ≈ 12.5 MB
Footprints:
Traditional = 1 GB
Bloom Filter ≈ 12.5 MB
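The footprint arithmetic from the two SIMPLE MATHS slides can be checked in a few lines of Python. This is a small sketch that assumes the slides' figures (100-byte keys, 10 million elements, 10 bits per element) and decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes).

```python
# Footprint comparison from the slides (decimal units: 1 GB = 1e9 bytes, 1 MB = 1e6 bytes).
n = 10_000_000              # number of elements
key_size_bytes = 100        # storing every key verbatim
bits_per_element = 10       # typical Bloom filter budget

traditional_bytes = key_size_bytes * n
bloom_bytes = bits_per_element * n / 8     # bits -> bytes
reduction = 1 - bloom_bytes / traditional_bytes

print(traditional_bytes / 1e9, "GB")   # 1.0 GB
print(bloom_bytes / 1e6, "MB")         # 12.5 MB
print(f"{reduction:.2%} reduction")    # 98.75% reduction
```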
Practical examples
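Akamai
Prevent caching of one-hit-wonders
- Roughly 75% of the URLs requested from a web cache are only ever requested once; these are called one-hit-wonders
- Use a Bloom filter to record the URLs already requested, and write an object to the disk cache only on a repeat request, so one-hit-wonders are never stored in the disk cache
1. Saves disk writes for these one-hit-wonder requests
2. Reduces the disk workload and leaves more cache space for content that is requested again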
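A minimal Python sketch of the "cache on second request" policy described above. `TinyBloom`, `OneHitWonderCache`, `fetch_origin` and the in-memory dict standing in for the disk cache are hypothetical names for illustration, not Akamai's implementation; the filter uses salted SHA-256 hashes and arbitrary sizes (m = 8192, k = 5).

```python
import hashlib
from typing import Callable, Dict, Iterator, List


class TinyBloom:
    """Minimal Bloom filter: m bits packed into a Python int, k salted SHA-256 hashes."""

    def __init__(self, m: int = 8192, k: int = 5) -> None:
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item: str) -> Iterator[int]:
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(item))


class OneHitWonderCache:
    """Writes an object to the (simulated) disk cache only on its second request."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self.fetch = fetch
        self.seen = TinyBloom()
        self.disk_cache: Dict[str, bytes] = {}  # stand-in for the on-disk cache

    def get(self, url: str) -> bytes:
        if url in self.disk_cache:
            return self.disk_cache[url]
        body = self.fetch(url)
        if self.seen.might_contain(url):
            # Requested before (or a rare false positive): worth a disk write.
            self.disk_cache[url] = body
        else:
            # First sighting: remember the URL, but skip the disk write.
            self.seen.add(url)
        return body


fetched: List[str] = []  # records every trip to the origin server


def fetch_origin(url: str) -> bytes:
    fetched.append(url)
    return b"<html>...</html>"


cache = OneHitWonderCache(fetch_origin)
cache.get("/article/42")  # 1st request: served, not written to disk
cache.get("/article/42")  # 2nd request: served and written to disk
cache.get("/article/42")  # 3rd request: served from the disk cache
```

A false positive only means an occasional one-hit-wonder gets cached anyway, so the policy degrades gracefully.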
Medium
Avoid recommending articles a user has already read
- When an article is recommended, add $userid-$articleid to the Bloom filter
- Before recommending, check the Bloom filter for that (user, article) pair
- If it is not there, recommend the article and then add the pair to the Bloom filter
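The Medium flow above, sketched in Python. The `recommend` helper, the `TinyBloom` class and its sizes are hypothetical; only the `$userid-$articleid` key format comes from the slide. A false positive here only means an unread article is occasionally skipped, which is harmless for recommendations.

```python
import hashlib
from typing import Iterator, List


class TinyBloom:
    """Minimal Bloom filter: m bits packed into a Python int, k salted SHA-256 hashes."""

    def __init__(self, m: int = 8192, k: int = 5) -> None:
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item: str) -> Iterator[int]:
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(item))


def recommend(user_id: str, candidates: List[str], shown: TinyBloom, limit: int = 2) -> List[str]:
    """Pick up to `limit` articles whose (user, article) pair is not in the filter."""
    picks: List[str] = []
    for article_id in candidates:
        key = f"{user_id}-{article_id}"   # $userid-$articleid, as on the slide
        if shown.might_contain(key):
            continue                      # probably recommended before: skip it
        picks.append(article_id)
        shown.add(key)                    # remember this recommendation
        if len(picks) == limit:
            break
    return picks


shown = TinyBloom()
candidates = ["a1", "a2", "a3", "a4"]
first = recommend("u1", candidates, shown)
second = recommend("u1", candidates, shown)  # skips the articles picked in `first`
```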
Cassandra
Reduce disk lookups for non-existent rows in file storage
- Each shard is responsible for a range of the data
- Each shard is stored in immutable shard files (SSTables)
- The server keeps a Bloom filter for each shard file
- Skip reading a shard file if its Bloom filter returns false for the key
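A sketch of the read path described above. `DataFile` is a hypothetical in-memory stand-in for one immutable shard file, its `disk_reads` counter simulates disk access, and `TinyBloom` is an illustrative filter with arbitrary sizes; this is not Cassandra's code.

```python
import hashlib
from typing import Dict, Iterator, List, Optional


class TinyBloom:
    """Minimal Bloom filter: m bits packed into a Python int, k salted SHA-256 hashes."""

    def __init__(self, m: int = 8192, k: int = 5) -> None:
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item: str) -> Iterator[int]:
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(item))


class DataFile:
    """Hypothetical immutable data file with its own Bloom filter, built once at creation."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self._rows = dict(rows)          # never modified after construction
        self.bloom = TinyBloom()
        for key in self._rows:
            self.bloom.add(key)
        self.disk_reads = 0

    def read(self, key: str) -> Optional[str]:
        self.disk_reads += 1             # simulated disk access
        return self._rows.get(key)


def lookup(files: List[DataFile], key: str) -> Optional[str]:
    for f in files:
        if not f.bloom.might_contain(key):
            continue                     # definitely not in this file: no disk access
        row = f.read(key)
        if row is not None:              # None here means the filter gave a false positive
            return row
    return None


files = [DataFile({"alice": "row-a"}), DataFile({"bob": "row-b"})]
print(lookup(files, "bob"))     # row-b
print(lookup(files, "carol"))   # None
```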
Chrome
Identifying malicious URLs
- Store known malicious URLs in a Bloom filter
- Run an in-depth check only on a positive response
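The two-stage check, sketched in Python. The `full_check` callback stands in for whatever in-depth verification runs on a positive response; it, `is_malicious` and `TinyBloom` are hypothetical names, not Chrome's actual API. The point is that the expensive step runs only when the filter answers "probably in set".

```python
import hashlib
from typing import Callable, Iterator, List


class TinyBloom:
    """Minimal Bloom filter: m bits packed into a Python int, k salted SHA-256 hashes."""

    def __init__(self, m: int = 8192, k: int = 5) -> None:
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item: str) -> Iterator[int]:
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(item))


def is_malicious(url: str, local_filter: TinyBloom, full_check: Callable[[str], bool]) -> bool:
    if not local_filter.might_contain(url):
        return False            # definitely not a known-bad URL: no further work
    return full_check(url)      # maybe bad (could be a false positive): confirm it


known_bad = {"http://malware.example/"}
local_filter = TinyBloom()
for bad_url in known_bad:
    local_filter.add(bad_url)

checked: List[str] = []         # records how often the expensive path runs


def full_check(url: str) -> bool:
    checked.append(url)
    return url in known_bad
```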
How it works
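Basics of a Bloom Filter
● An array of m bits, initially all set to 0
● k hash functions, each of which maps (hashes) a set element to one of the m array positions
● Add: set the bits at the element's k positions to 1
● Query: if any of the element's k bits is 0, it is definitely not in the set; if all are 1, it is probably in the set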
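A minimal Python sketch of these basics, written for illustration rather than as a production library. One assumption to flag: instead of k independent hash functions, it derives all k positions from a single SHA-256 digest with double hashing, h_i(x) = (h1(x) + i·h2(x)) mod m (the Kirsch-Mitzenmacher technique).

```python
import hashlib
from typing import List


class BloomFilter:
    """An array of m bits (all initially 0) plus k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # m bits packed into bytes, all 0

    def _positions(self, item: str) -> List[int]:
        # Double hashing: derive k positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step: never 0 mod an even m
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        # Set the bit at each of the item's k positions.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False -> definitely not in the set; True -> probably in the set.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter(m=1_000, k=7)
for animal in ("cat", "dog", "mouse"):
    bf.add(animal)
print(bf.might_contain("cat"))  # True: added items are always found (no false negatives)
print(bf.might_contain("owl"))  # almost always False; True would be a false positive
```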
Bloom Filter in slow motion
m = 20, k = 3
Start with the empty set { }:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Add cat, giving { cat }:
0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
Add dog, giving { cat, dog }:
0 0 1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 1
Add mouse, giving { cat, dog, mouse }:
0 1 1 0 1 0 0 0 1 0 1 1 0 0 0 0 1 1 0 1
Query owl against { cat, dog, mouse }: at least one of owl's 3 bits is 0, so owl is definitely NOT in the set
Query rat against { cat, dog, mouse }: all 3 of rat's bits are 1, so rat might be in the set (here a false positive, since rat was never added)
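The slow-motion example can be replayed in Python. The bit positions for cat, dog and mouse are read off the slides (0-indexed); the slides do not show the positions for owl and rat, so the ones below are hypothetical values chosen to reproduce the two outcomes.

```python
M = 20  # m = 20 bits; each element maps to k = 3 positions

# Positions (0-indexed) read off the slides.
SLIDE_POSITIONS = {"cat": (2, 8, 17), "dog": (4, 10, 19), "mouse": (1, 11, 16)}
# Hypothetical positions for the two queries (not shown on the slides).
QUERY_POSITIONS = {"owl": (2, 5, 17), "rat": (4, 11, 17)}

bits = [0] * M
for animal in ("cat", "dog", "mouse"):
    for pos in SLIDE_POSITIONS[animal]:
        bits[pos] = 1


def query(positions) -> bool:
    # Probably in the set only if every one of the k bits is 1.
    return all(bits[p] for p in positions)


print(" ".join(map(str, bits)))        # 0 1 1 0 1 0 0 0 1 0 1 1 0 0 0 0 1 1 0 1
print(query(QUERY_POSITIONS["owl"]))   # False: bit 5 is 0, so owl is definitely not in the set
print(query(QUERY_POSITIONS["rat"]))   # True: all bits set, yet rat was never added
```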
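Constant memory space: always m bits, however many elements are added (the false-positive rate rises as the filter fills)
Extremely fast processing: each add or lookup computes k hashes and touches k bits
Low error rate: false positives are possible, false negatives are not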
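The maths
'Under-the-hood'
PROBABILITIES 101
1. Probability a slot is not selected by a single hash function: $1 - \frac{1}{m}$
2. Probability a slot is not selected by any of the k hashes: $\left(1 - \frac{1}{m}\right)^{k}$
3. Probability a slot is not selected by any of the k hashes of n elements: $\left(1 - \frac{1}{m}\right)^{kn}$
4. Probability a slot is selected (set to 1) after inserting n elements: $1 - \left(1 - \frac{1}{m}\right)^{kn}$
5. Probability that all k slots of a new element are already set, i.e. the false-positive rate: $\left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k} \approx \left(1 - e^{-kn/m}\right)^{k}$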
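The five steps translate directly into code. This sketch evaluates step 5 both exactly and with the usual e^(-kn/m) approximation, using the earlier budget of 10 bits per element (n = 10 million, m = 100 million bits) and choosing k with the standard optimum k = (m/n)·ln 2; those parameter choices are ours, for illustration.

```python
import math


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Step 5: (1 - (1 - 1/m)^(k*n))^k, the chance all k slots of a new item are set."""
    p_slot_set = 1 - (1 - 1 / m) ** (k * n)   # step 4
    return p_slot_set ** k


def false_positive_rate_approx(m: int, n: int, k: int) -> float:
    """Same quantity using the approximation (1 - 1/m)^(k*n) ~ e^(-k*n/m)."""
    return (1 - math.exp(-k * n / m)) ** k


n = 10_000_000
m = 10 * n                          # ~10 bits per element
k = round(m / n * math.log(2))      # optimum 6.93..., rounded to 7
exact = false_positive_rate(m, n, k)
approx = false_positive_rate_approx(m, n, k)
print(k, f"{exact:.4%}", f"{approx:.4%}")
```

Both values come out a little under 1%, which is where the "~10 bits per element" rule of thumb comes from.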
GETTING A LITTLE TRICKIER
If you want to read more on the maths, take a look at https://en.wikipedia.org/wiki/Bloom_filter
A Simple Calculator
https://hur.st/bloomfilter
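Sizing calculators work the other way round: given the number of elements n and a target false-positive rate p, they return m and k. A minimal sketch using the standard sizing formulas m = -n·ln(p) / (ln 2)^2 and k = (m/n)·ln 2; these are our choice and not necessarily what the linked calculator implements.

```python
import math
from typing import Tuple


def size_bloom_filter(n: int, p: float) -> Tuple[int, int]:
    """Return (m bits, k hash functions) for n elements at target false-positive rate p."""
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = max(1, round(m / n * math.log(2)))
    return m, k


m, k = size_bloom_filter(n=10_000_000, p=0.01)
print(m, k, f"{m / 8 / 1e6:.2f} MB")   # about 95.9 million bits, k = 7, about 12 MB
```

For 10 million elements at 1%, this works out to about 9.6 bits per element, in line with the ~10 bits per element quoted earlier.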
Next step
Bloom Filters in Code
● Libraries exist in most languages
○ https://github.com/alexandrnikitin/bloom-filter-scala
○ https://www.npmjs.com/package/bloom-filters
● Distributed Bloom Filters in Redis
○ Ideal when multiple instances of a service share a single Bloom filter
Now, think about how Bloom filters could reduce payload and speed things up in your services
Other probabilistic structures
● Counting Bloom Filters
○ Allow deletes
● TopK
○ Keep track of the K most frequent elements
● Count-Min Sketch
○ Approximate frequency table of elements
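A minimal sketch of a Counting Bloom filter in Python, assuming small integer counters in place of single bits (the salted SHA-256 hashing and the sizes are arbitrary choices for illustration). Adding increments the k counters, deleting decrements them, and a lookup checks that all k counters are non-zero.

```python
import hashlib
from typing import List


class CountingBloomFilter:
    """Bloom filter variant with a counter per slot, so items can be removed."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, item: str) -> List[int]:
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.counters[p] += 1

    def remove(self, item: str) -> None:
        # Removing an item that was never added would corrupt other items' counters;
        # this check catches most (not all, because of false positives) such mistakes.
        if not self.might_contain(item):
            raise KeyError(item)
        for p in self._positions(item):
            self.counters[p] -= 1

    def might_contain(self, item: str) -> bool:
        return all(self.counters[p] > 0 for p in self._positions(item))


cbf = CountingBloomFilter(m=1_000, k=4)
cbf.add("cat")
cbf.add("dog")
cbf.remove("cat")
print(cbf.might_contain("dog"))  # True: removing cat never drops dog's counters to 0
```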
Q&A
Introduction to Bloom Filters