SlideShare a Scribd company logo
FEBRUARY 15, 2018 | BELL HARBOR
#MDBlocal
Sizing MongoDB Clusters
#MDBlocal
Master Solutions
Architect
Jay Runkel
MongoDB @jayrunkel
#MDBlocal
MongoDB: 4 years
NoSQL: 9 years
Solution Architect: engineer assigned to sales
Significant experience sizing MongoDB clusters
About me
#MDBlocal
Objective
#MDBlocal
Do I need to shard?
What size servers should I use?
What will my monthly Atlas/AWS/Azure/Google costs be?
When will I need to add a new shard or upgrade my servers?
How much data can my servers support?
How many queries can my servers support?
Will we be able to meet our query latency requirements?
Sizing
#MDBlocal
• Large coffee chain: PlanetDollar
• Collect mobile app performance
• Every tap, click, gesture will generate an event
• 2 Year History
• Perform analytics
• Historical
• Near real-time (executive dashboards)
• Support usage
• 3000 – 5000 events per second
Your boss comes to you…
I need a budget for the monthly Atlas costs?
#MDBlocal
• Build a prototype
• Run performance tests using actual data and queries on hardware with
specs similar to production servers
The only accurate way to size a cluster
• EVERY OTHER APPROACH IS A GUESS
• Including the one I am presenting today
#MDBlocal
• Early in project, but
• Need to order hardware
• Estimate costs to determine “Go/No Go” decision
• Schema design
• Compare the hardware requirements for different schemas
Sometimes, it is necessary to guess ☹
#MDBlocal
MongoDB Clusters Look Like This
Config
Config
Config
Application
Driver
Primary
Secondary
Secondary
#MDBlocal
• # of shards
• Specifications of each server
• CPU
• Storage
• Size
• Performance: IOPS
• Memory
• Network
Our Solution Will Consist Of
#MDBlocal
• Sizing Objective ✓
• IOPS, Query Processing, Working Set
• Sizing Methodology with an Example
Agenda
#MDBlocal
IOPS, Query Processing,
Working Set
#MDBlocal
• # of shards
• Specifications of each server
• CPU
• Storage
• Size
• Performance: IOPS
• Memory
• Network
Our Solution Will Consist Of
#MDBlocal
• IOPS – input output units per second
• Throughput
• Random access
• Most workloads “randomly” access documents
IOPS
collection
#MDBlocal
Storage Performance
Type IOPS
7200 rpm SATA ~ 75 – 100
15000 rpm SAS ~ 175 – 210
RAID-10 (24 x 7200 RPM SAS) 2000
Amazon EBS 250 – 500
Amazon EBS Provisioned IOPS 10000 - 20000
SSD 50000
Flash Storage 100K – 400K (or more)
http://en.wikipedia.org/wiki/IOPS
#MDBlocal
• How many IOPS do we need?
• Want the real answer, run a test
• How to estimate?
Hardest Part of Sizing is IOPS
#MDBlocal
Processing a Query
Select Index
Load relevant index
entries from disk
Identify documents
using index
Retrieve documents
from disk
Filter documents
Return Documents
#MDBlocal
Processing a Query
IO
Select Index
Load relevant index
entries from disk
Identify documents
using index
Retrieve documents
from disk
Filter documents
Return Documents
#MDBlocal
But MongoDB Has a Cache
File System
indexes collections
CPU
Memory
indexes
documents
Disk access is only necessary if indexes
or documents are not in cache
Select Index
Load relevant index
entries from disk
Identify documents
using index
Retrieve documents
from disk
Filter documents
Return Documents
#MDBlocal
Working Set
Working Set = indexes plus frequently accessed
documents
If RAM greater than working set then reduced IO
Select Index
Load relevant index
entries from disk
Identify documents
using index
Retrieve documents
from disk
Filter documents
Return Documents
File System
indexes collections
CPU
Memory
indexes
documents
#MDBlocal
This is all great, but how do we estimate IOPS?
#MDBlocal
Assume
• Working Set < RAM < Data Size
• Memory contains indexes only
MongoDB Simplified Model
File System
indexes collections
CPU
Memory
indexes
documents
#MDBlocal
Assume appropriate indexes
To resolve find:
• Navigate in-memory indexes
• Retrieve document from disk
1 IOP per document returned
Find Queries With Simplified Model
File System
indexes collections
CPU
Memory
indexes
documents
#MDBlocal
Assume appropriate indexes
To resolve find:
• Navigate in-memory indexes
• Retrieve document from disk
1 IOP per document returned
Find Queries With Simplified Model
File System
indexes collections
CPU
Memory
indexes
documents
#MDBlocal
To resolve insert:
• Write document to disk
• Update each index file
IOPS = 1 + # of indexes
Inserts With Simplified Model
File System
indexes collections
CPU
Memory
indexes
documents
#MDBlocal
To resolve delete:
• Navigate in-memory indexes
• Mark document deleted
• Update each index file
IOPS = 1 + # of indexes
Deletes With Simplified Model
File System
indexes collections
CPU
Memory
indexes
documents
#MDBlocal
To resolve delete:
• Navigate in-memory indexes
• Mark document deleted
• Insert new document version
• Update each index file
IOPS = 2 + # of indexes
Updates With Simplified Model
File System
indexes collections
CPU
Memory
indexes
documents
#MDBlocal
• Working Set
• Checkpoints
• Document size relative to block size
• Indexed Arrays
• Journal, Log
The Simplified Model is too simplistic
#MDBlocal
• WiredTiger write process:
1. Update document in RAM (cache)
2. Write to journal (disk)
3. Periodically, write dirty documents to disk (checkpoint)
• 60 seconds or 2 GB (whichever comes first)
Checkpoints
Checkpoint 1 Checkpoint 2 Checkpoint 3
B C A A C A
3 writes
3 documents written
3 writes
2 documents written
#MDBlocal
• Estimate total requirements (using simplified model):
• RAM
• CPU
• Disk Space
• IOPS
• Adjust based upon working set, checkpoints, etc.
• Design (sharded) cluster that provides these totals
How are we going to get there?
#MDBlocal
Sizing Process
#MDBlocal
Methodology
Application
Requirements
Cluster Sizing
• Number of
shards
• Server specs
Magic
Happens
#MDBlocal
1.Collection Size
2.Working Set
3.Queries -> IOPS
4.Adjust based upon working set, checkpoints, etc.
5.Calculate # of shards
6.Review, iterate, repeat
Methodology (cont.)
Build a spread
sheet
Multiple
iterations may
be required
#MDBlocal
1.Assumptions
1.Data Size
1.Working Set
• Index Size
• Frequently Accessed Documents
1.Queries – IOPS
1.Shard Calculations
Sizing Spreadsheet
#MDBlocal
1.Assumptions
1.Data Size
1.Working Set
• Index Size
• Frequently Accessed Documents
2. Queries – IOPS
1. Shard Calculations
Sizing Spreadsheet
#MDBlocal
https://github.com/jayrunkel/mdbw2017Sizing
Sizing Spreadsheet Download
#MDBlocal
Assumptions
#MDBlocal
Sizing Spreadsheet
1.Assumptions
1.Data Size
1.Working Set
• Index Size
• Frequently Accessed Documents
2. Queries – IOPS
1. Shard Calculations
#MDBlocal
• # of documents
• Data size
• Index size
• WT compression
Collection Analysis
#MDBlocal
Calculate The Number of Documents
Application Description # of Documents in Collection
There will be 20M documents in the
collection by the end of 2017
20,000,000
We expect to insert 10K documents
per day with 1 year retention period
365*10,000 = 3,655,000
We have 3000 devices each
producing 1 event per minute and we
need to keep a 90 day history
3000 * 60 * 24 * 90 = 388,800,000
#MDBlocal
Calculate The Number of Documents
Application Description # of Documents in Collection
There will be 20M documents in the collection
by the end of 2017
20,000,000
We expect to insert 10K documents per day
with 1 year retention period
365*10,000 = 3,655,000
We have 3000 devices each producing 1 event
per minute and we need to keep a 90 day
history
3000 * 60 * 24 * 90 = 388,800,000
PlanetDollar:
2 year history. Each day 5000 inserts per
second for 5 hours and 3000 inserts per second
for 19 hours
2*365*(5000*5*3600 +
3000*19*3600) = 215,496,000,000
#MDBlocal
• Data Size = # of documents * Average document size
• This information is available in db.stats(), Compass, Ops Manager, Cloud
Manager, Atlas, etc.
Calculate the Data Size
#MDBlocal
• Write some code
• Programmatically generate a large data set
• 5-10% of expected size
• Measure
• Collection size
• Index size
• Compression
What if there aren’t any documents?
#MDBlocal
• Use db.collection.stats()
• Take data size, index size and extrapolate to production size
• Calculate compression ratio
db.collection.stats()
{
count: 10000
size: 70,388,956
avgObjSize: 7038
storageSize: 25341952
…
totalIndexSize: 147456
}
Determine collection and data size
Parameter Formula Value
# of documents 2.5B
avgObjSize 7038
Collection Size =2.5B * 7038 1.760E13 Bytes
WT Compression = 25341952/70388956 .36
Collection Storage =2.5B * 7038 * .36 6.33E12 Bytes
Index Size Per Doc = 147456 / 10000 15 Bytes
Collection Index Size =2.5B * 15 /1024^3 35 GB
#MDBlocal
PlanetDollar - Collection Size
#MDBlocal
Sizing Spreadsheet
1.Assumptions
1.Data Size
1.Working Set
• Index Size
• Frequently Accessed Documents
1.Queries – IOPS
1.Shard Calculations
#MDBlocal
WorkSet = Indexes plus the set of documents accessed frequently
We know the index size from previous analysis
Estimate the frequently
accessed documents
Given the queries
What are the frequently
accessed docs?
Working Set
File System
collections indexes
CPU
Memory
indexes
documents
#MDBlocal
Query Analysis
Dashboards – last minute of data
Customer support - last hour of data
Reports (run once per day) inspect last years worth of data
Active Documents = 1 hours worth of data
PlanetDollar Working Set
#MDBlocal
PlanetDollar - Working Set
#MDBlocal
1.Assumptions
1.Data Size
1.Working Set
• Index Size
• Frequently Accessed Documents
1.Queries – IOPS
1.Shard Calculations
Sizing Spreadsheet
#MDBlocal
+ # of documents returned per second
+ # of documents updated per second
+ # of indexes impacted by each update
+ # of inserts per second
+ # of indexes impacted by each insert
+ # of deletes per second (x2)
+ # of indexes impacted by each delete
- Multiple updates occurring within checkpoint
- % of find query results in cache
Total IOPS
IOPS Calculation
#MDBlocal
• 5000 inserts per second
• 5000 deletes per second
Dashboards (aggregations: 100 per minute)
• Total events per minute across all users (current minute)
• Total events per minute per region (current minute)
• Total events per store per minute (current minute)
Debugging Tool (ad hoc – 5 per second)
• Find all events for a user in last 60 minutes (100 events returned, on average)
PlanetDollar Queries
#MDBlocal
• Each insert:
• Update collection
• Update each index (3 indexes)
• Each Delete:
• Update collection
• Update each index (3 indexes)
• 5000 inserts/sec
• 5000 deletes/sec
IOPS for inserts and deletes
4 IOPS
5 IOPS
(4 * 5000) + (5 * 5000) = 45000 IOPS
#MDBlocal
• Example: Total events per minute across all users (current minute)
• How many documents will be read from disk?
IOPS for PlanetDollar Aggregations
05000 per second * 60 seconds = 300,000
Most data in cache
Some IOPS will likely be require
#MDBlocal
• Find all events for a user in last 60 minutes
• 5 per second
• 100 documents per query
• # IOPS = 5 * 100 = 500 IOPS
IOPS For Find
#MDBlocal
PlanetDollar - Queries/IOPS
#MDBlocal
• CPU utilized for:
• Compress/decompress
• Encrypt/Decrypt
• Aggregation queries
• General query processing
• In most cases, RAM requirements → large servers → many cores
• Possible exception: aggregation queries
• One core per query
• # cores >> # of simultaneous aggregation queries
How Many CPUs Do I Need?
#MDBlocal
1.Assumptions
1.Data Size
1.Working Set
• Index Size
• Frequently Accessed Documents
1.Queries – IOPS
1.Shard Calculations
Sizing Spreadsheet
#MDBlocal
• At this point you have:
1. Required storage capacity
2. Working Set Size
3. IOPS Estimate
4. Some idea about class of server (or VM) the customer plans to deploy
• Determine number of required shards
Shard Calculations
#MDBlocal
• Sum of disk space across shards
greater than required storage size
Disk Space: How Many Shards Do I Need?
Example
Data Size = 9 TB
WiredTiger Compression Ratio: .33
Storage size = 3 TB
Req Disk: 6 TB
Server disk capacity = 2 TB
3 Shards Required
Recommend providing 2X
the compressed data size in
disk
#MDBlocal
RAM: How Many Shards Do I Need?
Example
Working Set = 428 GB
Server RAM = 128 GB
428/128 = 3.34
4 Shards Required
#MDBlocal
IOPS: How Many Shards Do I Need?
Example
Require: 50K IOPS
AWS Instance: 20K IOPS
3 Shards Required
#MDBlocal
PlanetDollar - Shard Calculations
#MDBlocal
1. Calculate:
• Collection size
• Index size
2. Estimate Working Set
3. Use simplified model to estimate IOPS
4. Revise (working set coverage, checkpoints, etc.)
5. Calculate shards
Sizing Summary
#MDBlocal
Questions?
https://github.com/jayrunkel/mdbw2017Sizing
Ad

Recommended

RedHat Certification Track
RedHat Certification Track
ssuser113f26
 
MongoDB Atlas
MongoDB Atlas
MongoDB
 
Digital Preservation with Archivematica: An Introduction
Digital Preservation with Archivematica: An Introduction
Artefactual Systems - Archivematica
 
Seven building blocks for MDM
Seven building blocks for MDM
Kousik Mukherjee
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
DATAVERSITY
 
Couchbase 101
Couchbase 101
Dipti Borkar
 
Why is Customer Data Platform (CDP) ?
Why is Customer Data Platform (CDP) ?
Trieu Nguyen
 
CDP - 101 Everything you need to know about Customer Data Platforms
CDP - 101 Everything you need to know about Customer Data Platforms
Eddy Widerker
 
Cassandra Introduction & Features
Cassandra Introduction & Features
DataStax Academy
 
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
RomanKhavronenko
 
Handle Large Messages In Apache Kafka
Handle Large Messages In Apache Kafka
Jiangjie Qin
 
Introduction to MongoDB
Introduction to MongoDB
Mike Dirolf
 
MongoDB vs. Postgres Benchmarks
MongoDB vs. Postgres Benchmarks
EDB
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
mumrah
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
ScyllaDB
 
Introduction to Apache Kafka
Introduction to Apache Kafka
Jeff Holoman
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
Ryan Blue
 
Apache Spark Architecture
Apache Spark Architecture
Alexey Grishchenko
 
MongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To Transactions
Mydbops
 
Apache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data Transport
Wes McKinney
 
Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0
Mydbops
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
Jiangjie Qin
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
Cloudera, Inc.
 
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Mydbops
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and Grafana
Arvind Kumar G.S
 
Spark streaming , Spark SQL
Spark streaming , Spark SQL
Yousun Jeong
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmark
Dongwon Kim
 
Introduction to memcached
Introduction to memcached
Jurriaan Persyn
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters
MongoDB
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters
MongoDB
 

More Related Content

What's hot (20)

Cassandra Introduction & Features
Cassandra Introduction & Features
DataStax Academy
 
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
RomanKhavronenko
 
Handle Large Messages In Apache Kafka
Handle Large Messages In Apache Kafka
Jiangjie Qin
 
Introduction to MongoDB
Introduction to MongoDB
Mike Dirolf
 
MongoDB vs. Postgres Benchmarks
MongoDB vs. Postgres Benchmarks
EDB
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
mumrah
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
ScyllaDB
 
Introduction to Apache Kafka
Introduction to Apache Kafka
Jeff Holoman
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
Ryan Blue
 
Apache Spark Architecture
Apache Spark Architecture
Alexey Grishchenko
 
MongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To Transactions
Mydbops
 
Apache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data Transport
Wes McKinney
 
Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0
Mydbops
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
Jiangjie Qin
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
Cloudera, Inc.
 
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Mydbops
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and Grafana
Arvind Kumar G.S
 
Spark streaming , Spark SQL
Spark streaming , Spark SQL
Yousun Jeong
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmark
Dongwon Kim
 
Introduction to memcached
Introduction to memcached
Jurriaan Persyn
 
Cassandra Introduction & Features
Cassandra Introduction & Features
DataStax Academy
 
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
RomanKhavronenko
 
Handle Large Messages In Apache Kafka
Handle Large Messages In Apache Kafka
Jiangjie Qin
 
Introduction to MongoDB
Introduction to MongoDB
Mike Dirolf
 
MongoDB vs. Postgres Benchmarks
MongoDB vs. Postgres Benchmarks
EDB
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
mumrah
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
ScyllaDB
 
Introduction to Apache Kafka
Introduction to Apache Kafka
Jeff Holoman
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
Ryan Blue
 
MongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To Transactions
Mydbops
 
Apache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data Transport
Wes McKinney
 
Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0
Mydbops
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
Jiangjie Qin
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
Cloudera, Inc.
 
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Mydbops
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and Grafana
Arvind Kumar G.S
 
Spark streaming , Spark SQL
Spark streaming , Spark SQL
Yousun Jeong
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmark
Dongwon Kim
 
Introduction to memcached
Introduction to memcached
Jurriaan Persyn
 

Similar to Sizing Your MongoDB Cluster (20)

Sizing MongoDB Clusters
Sizing MongoDB Clusters
MongoDB
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters
MongoDB
 
Webinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDB
MongoDB
 
MongoDB Best Practices
MongoDB Best Practices
Lewis Lin 🦊
 
Capacity Planning
Capacity Planning
MongoDB
 
MongoDB .local Munich 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB .local Munich 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB
 
TechEd AU 2014: Microsoft Azure DocumentDB Deep Dive
TechEd AU 2014: Microsoft Azure DocumentDB Deep Dive
Intergen
 
MongoDB .local Toronto 2019: Finding the Right Atlas Cluster Size: Does this ...
MongoDB .local Toronto 2019: Finding the Right Atlas Cluster Size: Does this ...
MongoDB
 
Time Series Databases for IoT (On-premises and Azure)
Time Series Databases for IoT (On-premises and Azure)
Ivo Andreev
 
Realtime Analytics on AWS
Realtime Analytics on AWS
Sungmin Kim
 
Dev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDB
MongoDB
 
Add Redis to Postgres to Make Your Microservices Go Boom!
Add Redis to Postgres to Make Your Microservices Go Boom!
Dave Nielsen
 
Mongo DB
Mongo DB
Tata Consultancy Services
 
MongoDB Evening Austin, TX 2017
MongoDB Evening Austin, TX 2017
MongoDB
 
MongoDB Aggregation Performance
MongoDB Aggregation Performance
MongoDB
 
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartori
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartori
walk2talk srl
 
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB
 
Solving your Backup Needs - Ben Cefalo mdbe18
Solving your Backup Needs - Ben Cefalo mdbe18
MongoDB
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
Mark Kromer
 
MongoDB: What, why, when
MongoDB: What, why, when
Eugenio Minardi
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters
MongoDB
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters
MongoDB
 
Webinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDB
MongoDB
 
MongoDB Best Practices
MongoDB Best Practices
Lewis Lin 🦊
 
Capacity Planning
Capacity Planning
MongoDB
 
MongoDB .local Munich 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB .local Munich 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB
 
TechEd AU 2014: Microsoft Azure DocumentDB Deep Dive
TechEd AU 2014: Microsoft Azure DocumentDB Deep Dive
Intergen
 
MongoDB .local Toronto 2019: Finding the Right Atlas Cluster Size: Does this ...
MongoDB .local Toronto 2019: Finding the Right Atlas Cluster Size: Does this ...
MongoDB
 
Time Series Databases for IoT (On-premises and Azure)
Time Series Databases for IoT (On-premises and Azure)
Ivo Andreev
 
Realtime Analytics on AWS
Realtime Analytics on AWS
Sungmin Kim
 
Dev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDB
MongoDB
 
Add Redis to Postgres to Make Your Microservices Go Boom!
Add Redis to Postgres to Make Your Microservices Go Boom!
Dave Nielsen
 
MongoDB Evening Austin, TX 2017
MongoDB Evening Austin, TX 2017
MongoDB
 
MongoDB Aggregation Performance
MongoDB Aggregation Performance
MongoDB
 
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartori
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartori
walk2talk srl
 
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB
 
Solving your Backup Needs - Ben Cefalo mdbe18
Solving your Backup Needs - Ben Cefalo mdbe18
MongoDB
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
Mark Kromer
 
MongoDB: What, why, when
MongoDB: What, why, when
Eugenio Minardi
 
Ad

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB
 
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB
 
Ad

Recently uploaded (20)

WebdriverIO & JavaScript: The Perfect Duo for Web Automation
WebdriverIO & JavaScript: The Perfect Duo for Web Automation
digitaljignect
 
Coordinated Disclosure for ML - What's Different and What's the Same.pdf
Coordinated Disclosure for ML - What's Different and What's the Same.pdf
Priyanka Aash
 
AI vs Human Writing: Can You Tell the Difference?
AI vs Human Writing: Can You Tell the Difference?
Shashi Sathyanarayana, Ph.D
 
GenAI Opportunities and Challenges - Where 370 Enterprises Are Focusing Now.pdf
GenAI Opportunities and Challenges - Where 370 Enterprises Are Focusing Now.pdf
Priyanka Aash
 
Security Tips for Enterprise Azure Solutions
Security Tips for Enterprise Azure Solutions
Michele Leroux Bustamante
 
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC
 
Cyber Defense Matrix Workshop - RSA Conference
Cyber Defense Matrix Workshop - RSA Conference
Priyanka Aash
 
PyCon SG 25 - Firecracker Made Easy with Python.pdf
PyCon SG 25 - Firecracker Made Easy with Python.pdf
Muhammad Yuga Nugraha
 
Tech-ASan: Two-stage check for Address Sanitizer - Yixuan Cao.pdf
Tech-ASan: Two-stage check for Address Sanitizer - Yixuan Cao.pdf
caoyixuan2019
 
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
Earley Information Science
 
UserCon Belgium: Honey, VMware increased my bill
UserCon Belgium: Honey, VMware increased my bill
stijn40
 
Curietech AI in action - Accelerate MuleSoft development
Curietech AI in action - Accelerate MuleSoft development
shyamraj55
 
10 Key Challenges for AI within the EU Data Protection Framework.pdf
10 Key Challenges for AI within the EU Data Protection Framework.pdf
Priyanka Aash
 
The Future of Product Management in AI ERA.pdf
The Future of Product Management in AI ERA.pdf
Alyona Owens
 
Raman Bhaumik - Passionate Tech Enthusiast
Raman Bhaumik - Passionate Tech Enthusiast
Raman Bhaumik
 
You are not excused! How to avoid security blind spots on the way to production
You are not excused! How to avoid security blind spots on the way to production
Michele Leroux Bustamante
 
Wenn alles versagt - IBM Tape schützt, was zählt! Und besonders mit dem neust...
Wenn alles versagt - IBM Tape schützt, was zählt! Und besonders mit dem neust...
Josef Weingand
 
"Scaling in space and time with Temporal", Andriy Lupa.pdf
"Scaling in space and time with Temporal", Andriy Lupa.pdf
Fwdays
 
Securing AI - There Is No Try, Only Do!.pdf
Securing AI - There Is No Try, Only Do!.pdf
Priyanka Aash
 
OWASP Barcelona 2025 Threat Model Library
OWASP Barcelona 2025 Threat Model Library
PetraVukmirovic
 
WebdriverIO & JavaScript: The Perfect Duo for Web Automation
WebdriverIO & JavaScript: The Perfect Duo for Web Automation
digitaljignect
 
Coordinated Disclosure for ML - What's Different and What's the Same.pdf
Coordinated Disclosure for ML - What's Different and What's the Same.pdf
Priyanka Aash
 
AI vs Human Writing: Can You Tell the Difference?
AI vs Human Writing: Can You Tell the Difference?
Shashi Sathyanarayana, Ph.D
 
GenAI Opportunities and Challenges - Where 370 Enterprises Are Focusing Now.pdf
GenAI Opportunities and Challenges - Where 370 Enterprises Are Focusing Now.pdf
Priyanka Aash
 
Security Tips for Enterprise Azure Solutions
Security Tips for Enterprise Azure Solutions
Michele Leroux Bustamante
 
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC
 
Cyber Defense Matrix Workshop - RSA Conference
Cyber Defense Matrix Workshop - RSA Conference
Priyanka Aash
 
PyCon SG 25 - Firecracker Made Easy with Python.pdf
PyCon SG 25 - Firecracker Made Easy with Python.pdf
Muhammad Yuga Nugraha
 
Tech-ASan: Two-stage check for Address Sanitizer - Yixuan Cao.pdf
Tech-ASan: Two-stage check for Address Sanitizer - Yixuan Cao.pdf
caoyixuan2019
 
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
Earley Information Science
 
UserCon Belgium: Honey, VMware increased my bill
UserCon Belgium: Honey, VMware increased my bill
stijn40
 
Curietech AI in action - Accelerate MuleSoft development
Curietech AI in action - Accelerate MuleSoft development
shyamraj55
 
10 Key Challenges for AI within the EU Data Protection Framework.pdf
10 Key Challenges for AI within the EU Data Protection Framework.pdf
Priyanka Aash
 
The Future of Product Management in AI ERA.pdf
The Future of Product Management in AI ERA.pdf
Alyona Owens
 
Raman Bhaumik - Passionate Tech Enthusiast
Raman Bhaumik - Passionate Tech Enthusiast
Raman Bhaumik
 
You are not excused! How to avoid security blind spots on the way to production
You are not excused! How to avoid security blind spots on the way to production
Michele Leroux Bustamante
 
Wenn alles versagt - IBM Tape schützt, was zählt! Und besonders mit dem neust...
Wenn alles versagt - IBM Tape schützt, was zählt! Und besonders mit dem neust...
Josef Weingand
 
"Scaling in space and time with Temporal", Andriy Lupa.pdf
"Scaling in space and time with Temporal", Andriy Lupa.pdf
Fwdays
 
Securing AI - There Is No Try, Only Do!.pdf
Securing AI - There Is No Try, Only Do!.pdf
Priyanka Aash
 
OWASP Barcelona 2025 Threat Model Library
OWASP Barcelona 2025 Threat Model Library
PetraVukmirovic
 

Sizing Your MongoDB Cluster

  • 1. FEBRUARY 15, 2018 | BELL HARBOR #MDBlocal Sizing MongoDB Clusters
  • 3. #MDBlocal MongoDB: 4 years NoSQL: 9 years Solution Architect: engineer assigned to sales Significant experience sizing MongoDB clusters About me
  • 5. #MDBlocal Do I need to shard? What size servers should I use? What will my monthly Atlas/AWS/Azure/Google costs be? When will I need to add a new shard or upgrade my servers? How much data can my servers support? How many queries can my servers support? Will we be able to meet our query latency requirements? Sizing
  • 6. #MDBlocal • Large coffee chain: PlanetDollar • Collect mobile app performance • Every tap, click, gesture will generate an event • 2 Year History • Perform analytics • Historical • Near real-time (executive dashboards) • Support usage • 3000 – 5000 events per second Your boss comes to you… I need a budget for the monthly Atlas costs?
  • 7. #MDBlocal • Build a prototype • Run performance tests using actual data and queries on hardware with specs similar to production servers The only accurate way to size a cluster • EVERY OTHER APPROACH IS A GUESS • Including the one I am presenting today
  • 8. #MDBlocal • Early in project, but • Need to order hardware • Estimate costs to determine “Go/No Go” decision • Schema design • Compare the hardware requirements for different schemas Sometimes, it is necessary to guess ☹
  • 9. #MDBlocal MongoDB Clusters Look Like This Config Config Config Application Driver Primary Secondary Secondary
  • 10. #MDBlocal • # of shards • Specifications of each server • CPU • Storage • Size • Performance: IOPS • Memory • Network Our Solution Will Consist Of
  • 11. #MDBlocal • Sizing Objective ✓ • IOPS, Query Processing, Working Set • Sizing Methodology with an Example Agenda
  • 13. #MDBlocal • # of shards • Specifications of each server • CPU • Storage • Size • Performance: IOPS • Memory • Network Our Solution Will Consist Of
  • 14. #MDBlocal • IOPS – input output units per second • Throughput • Random access • Most workloads “randomly” access documents IOPS collection
  • 15. #MDBlocal Storage Performance Type IOPS 7200 rpm SATA ~ 75 – 100 15000 rpm SAS ~ 175 – 210 RAID-10 (24 x 7200 RPM SAS) 2000 Amazon EBS 250 – 500 Amazon EBS Provisioned IOPS 10000 - 20000 SSD 50000 Flash Storage 100K – 400K (or more) http://en.wikipedia.org/wiki/IOPS
  • 16. #MDBlocal • How many IOPS do we need? • Want the real answer, run a test • How to estimate? Hardest Part of Sizing is IOPS
  • 17. #MDBlocal Processing a Query Select Index Load relevant index entries from disk Identify documents using index Retrieve documents from disk Filter documents Return Documents
  • 18. #MDBlocal Processing a Query IO Select Index Load relevant index entries from disk Identify documents using index Retrieve documents from disk Filter documents Return Documents
  • 19. #MDBlocal But MongoDB Has a Cache File System indexes collections CPU Memory indexes documents Disk access is only necessary if indexes or documents are not in cache Select Index Load relevant index entries from disk Identify documents using index Retrieve documents from disk Filter documents Return Documents
  • 20. #MDBlocal Working Set Working Set = indexes plus frequently accessed documents If RAM greater than working set then reduced IO Select Index Load relevant index entries from disk Identify documents using index Retrieve documents from disk Filter documents Return Documents File System indexes collections CPU Memory indexes documents
  • 21. #MDBlocal This is all great, but how do we estimate IOPS?
  • 22. #MDBlocal Assume • Working Set < RAM < Data Size • Memory contains indexes only MongoDB Simplified Model File System indexes collections CPU Memory indexes documents
  • 23. #MDBlocal Assume appropriate indexes To resolve find: • Navigate in-memory indexes • Retrieve document from disk 1 IOP per document returned Find Queries With Simplified Model File System indexes collections CPU Memory indexes documents
  • 24. #MDBlocal Assume appropriate indexes To resolve find: • Navigate in-memory indexes • Retrieve document from disk 1 IOP per document returned Find Queries With Simplified Model File System indexes collections CPU Memory indexes documents
  • 25. #MDBlocal To resolve insert: • Write document to disk • Update each index file IOPS = 1 + # of indexes Inserts With Simplified Model File System indexes collections CPU Memory indexes documents
  • 26. #MDBlocal To resolve delete: • Navigate in-memory indexes • Mark document deleted • Update each index file IOPS = 1 + # of indexes Deletes With Simplified Model File System indexes collections CPU Memory indexes documents
  • 27. #MDBlocal To resolve delete: • Navigate in-memory indexes • Mark document deleted • Insert new document version • Update each index file IOPS = 2 + # of indexes Updates With Simplified Model File System indexes collections CPU Memory indexes documents
  • 28. #MDBlocal • Working Set • Checkpoints • Document size relative to block size • Indexed Arrays • Journal, Log The Simplified Model is too simplistic
  • 29. #MDBlocal • WiredTiger write process: 1. Update document in RAM (cache) 2. Write to journal (disk) 3. Periodically, write dirty documents to disk (checkpoint) • 60 seconds or 2 GB (whichever comes first) Checkpoints Checkpoint 1 Checkpoint 2 Checkpoint 3 B C A A C A 3 writes 3 documents written 3 writes 2 documents written
  • 30. #MDBlocal • Estimate total requirements (using simplified model): • RAM • CPU • Disk Space • IOPS • Adjust based upon working set, checkpoints, etc. • Design (sharded) cluster that provides these totals How are we going to get there?
  • 33. #MDBlocal 1.Collection Size 2.Working Set 3.Queries -> IOPS 4.Adjust based upon working set, checkpoints, etc. 5.Calculate # of shards 6.Review, iterate, repeat Methodology (cont.) Build a spread sheet Multiple iterations may be required
  • 34. #MDBlocal 1.Assumptions 1.Data Size 1.Working Set • Index Size • Frequently Accessed Documents 1.Queries – IOPS 1.Shard Calculations Sizing Spreadsheet
  • 35. #MDBlocal 1.Assumptions 1.Data Size 1.Working Set • Index Size • Frequently Accessed Documents 2. Queries – IOPS 1. Shard Calculations Sizing Spreadsheet
  • 38. #MDBlocal Sizing Spreadsheet 1.Assumptions 1.Data Size 1.Working Set • Index Size • Frequently Accessed Documents 2. Queries – IOPS 1. Shard Calculations
  • 39. #MDBlocal • # of documents • Data size • Index size • WT compression Collection Analysis
  • 40. #MDBlocal Calculate The Number of Documents Application Description # of Documents in Collection There will be 20M documents in the collection by the end of 2017 20,000,000 We expect to insert 10K documents per day with 1 year retention period 365*10,000 = 3,655,000 We have 3000 devices each producing 1 event per minute and we need to keep a 90 day history 3000 * 60 * 24 * 90 = 388,800,000
  • 41. #MDBlocal Calculate The Number of Documents Application Description # of Documents in Collection There will be 20M documents in the collection by the end of 2017 20,000,000 We expect to insert 10K documents per day with 1 year retention period 365*10,000 = 3,655,000 We have 3000 devices each producing 1 event per minute and we need to keep a 90 day history 3000 * 60 * 24 * 90 = 388,800,000 PlanetDollar: 2 year history. Each day 5000 inserts per second for 5 hours and 3000 inserts per second for 19 hours 2*365*(5000*5*3600 + 3000*19*3600) = 215,496,000,000
  • 42. #MDBlocal • Data Size = # of documents * Average document size • This information is available in db.stats(), Compass, Ops Manager, Cloud Manager, Atlas, etc. Calculate the Data Size
  • 43. #MDBlocal • Write some code • Programmatically generate a large data set • 5-10% of expected size • Measure • Collection size • Index size • Compression What if there aren’t any documents?
  • 44. #MDBlocal • Use db.collection.stats() • Take data size, index size and extrapolate to production size • Calculate compression ratio db.collection.stats() { count: 10000 size: 70,388,956 avgObjSize: 7038 storageSize: 25341952 … totalIndexSize: 147456 } Determine collection and data size Parameter Formula Value # of documents 2.5B avgObjSize 7038 Collection Size =2.5B * 7038 1.760E13 Bytes WT Compression = 25341952/70388956 .36 Collection Storage =2.5B * 7038 * .36 6.33E12 Bytes Index Size Per Doc = 147456 / 10000 15 Bytes Collection Index Size =2.5B * 15 /1024^3 35 GB
  • 46. #MDBlocal Sizing Spreadsheet 1.Assumptions 1.Data Size 1.Working Set • Index Size • Frequently Accessed Documents 1.Queries – IOPS 1.Shard Calculations
  • 47. #MDBlocal WorkSet = Indexes plus the set of documents accessed frequently We know the index size from previous analysis Estimate the frequently accessed documents Given the queries What are the frequently accessed docs? Working Set File System collections indexes CPU Memory indexes documents
  • 48. #MDBlocal Query Analysis Dashboards – last minute of data Customer support - last hour of data Reports (run once per day) inspect last years worth of data Active Documents = 1 hours worth of data PlanetDollar Working Set
  • 50. #MDBlocal 1.Assumptions 1.Data Size 1.Working Set • Index Size • Frequently Accessed Documents 1.Queries – IOPS 1.Shard Calculations Sizing Spreadsheet
  • 51. #MDBlocal + # of documents returned per second + # of documents updated per second + # of indexes impacted by each update + # of inserts per second + # of indexes impacted by each insert + # of deletes per second (x2) + # of indexes impacted by each delete - Multiple updates occurring within checkpoint - % of find query results in cache Total IOPS IOPS Calculation
  • 52. #MDBlocal • 5000 inserts per second • 5000 deletes per second Dashboards (aggregations: 100 per minute) • Total events per minute across all users (current minute) • Total events per minute per region (current minute) • Total events per store per minute (current minute) Debugging Tool (ad hoc – 5 per second) • Find all events for a user in last 60 minutes (100 events returned, on average) PlanetDollar Queries
  • 53. #MDBlocal • Each insert: • Update collection • Update each index (3 indexes) • Each Delete: • Update collection • Update each index (3 indexes) • 5000 inserts/sec • 5000 deletes/sec IOPS for inserts and deletes 4 IOPS 5 IOPS (4 * 5000) + (5 * 5000) = 45000 IOPS
  • 54. #MDBlocal • Example: Total events per minute across all users (current minute) • How many documents will be read from disk? IOPS for PlanetDollar Aggregations 05000 per second * 60 seconds = 300,000 Most data in cache Some IOPS will likely be require
  • 55. #MDBlocal • Find all events for a user in last 60 minutes • 5 per second • 100 documents per query • # IOPS = 5 * 100 = 500 IOPS IOPS For Find
  • 57. #MDBlocal • CPU utilized for: • Compress/decompress • Encrypt/Decrypt • Aggregation queries • General query processing • In most cases, RAM requirements → large servers → many cores • Possible exception: aggregation queries • One core per query • # cores >> # of simultaneous aggregation queries How Many CPUs Do I Need?
  • 58. #MDBlocal 1.Assumptions 1.Data Size 1.Working Set • Index Size • Frequently Accessed Documents 1.Queries – IOPS 1.Shard Calculations Sizing Spreadsheet
  • 59. #MDBlocal • At this point you have: 1. Required storage capacity 2. Working Set Size 3. IOPS Estimate 4. Some idea about class of server (or VM) the customer plans to deploy • Determine number of required shards Shard Calculations
  • 60. #MDBlocal • Sum of disk space across shards greater than required storage size Disk Space: How Many Shards Do I Need? Example Data Size = 9 TB WiredTiger Compression Ratio: .33 Storage size = 3 TB Req Disk: 6 TB Server disk capacity = 2 TB 3 Shards Required Recommend providing 2X the compressed data size in disk
  • 61. #MDBlocal RAM: How Many Shards Do I Need? Example Working Set = 428 GB Server RAM = 128 GB 428/128 = 3.34 4 Shards Required
  • 62. #MDBlocal IOPS: How Many Shards Do I Need? Example Require: 50K IOPS AWS Instance: 20K IOPS 3 Shards Required
  • 64. #MDBlocal 1. Calculate: • Collection size • Index size 2. Estimate Working Set 3. Use simplified model to estimate IOPS 4. Revise (working set coverage, checkpoints, etc.) 5. Calculate shards Sizing Summary