Production MongoDB in the Cloud
From Essentials to Corner Cases
Who are we?
Mike Hobbs & Bridget Kromhout
Social Commerce & Brand Interest Graph Analytics
Why MongoDB?
● Scalable, high-performance, open source
● Dynamic schemas for unstructured data
● Query language close to SQL in power
● "Eventually consistent" is hard to program right
Our configuration
12-node cluster (4 shards, each a 3-member replica set)
Several other non-sharded replica sets
Desired webapp response time is < 10ms
Total data size: 110 GB
Total index size: 28 GB
Largest collection: 49 GB
Largest index: 8.1 GB
EC2: EBS, instance size, replication
MongoDB: right for only some data sets
Memory & iowait
Working set needs to fit in memory
● Indexes
● Frequently accessed records
Avoid swapping!!!
EBS latency in EC2 is an issue.
Fragmentation
Fragmentation steals from your most precious resource by
reserving memory that is not used.
Run a compaction when your storageSize significantly
exceeds your data size
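A rough Python sketch of that comparison, using the figures from the db.widgets.stats() output on this slide. The function names and the 2.0 cutoff are assumptions made for illustration, not a MongoDB recommendation.

```python
def fragmentation_ratio(stats):
    """Return storageSize / size for a collection stats document."""
    size = stats["size"]
    if size == 0:
        # Empty collection: any allocated storage counts as unbounded overhead
        return float("inf") if stats["storageSize"] else 1.0
    return stats["storageSize"] / size


def needs_compaction(stats, threshold=2.0):
    # threshold is a hypothetical cutoff; tune it for your own workload
    return fragmentation_ratio(stats) >= threshold


# Values copied from the db.widgets.stats() output on this slide
widgets_stats = {"size": 5097988, "storageSize": 22507520}
ratio = fragmentation_ratio(widgets_stats)
print(round(ratio, 2), needs_compaction(widgets_stats))
```

Here storageSize is more than four times the data size, so this collection would be flagged under the assumed cutoff.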
mongos> db.widgets.stats()
...
"size" : 5097988,
"storageSize" : 22507520,
Padding can reduce fragmentation and I/O
db.widgets.insert({widg_id: "72120", padding: "XXXX...XXX"})
db.widgets.update({widg_id: "72120"}, {
    $unset: {padding: ""},
    $set: {desc: "Grout remover", price: "13.39", instock: true}
})
Replica sets
"optime" : { "t" : 1365165841000 , "i" : 1 },
"optimeDate" : { "$date" : "Fri Apr 5 07:44:01 2013" },
[Diagram labels: test-3-1.yourdomain, test-3-2.yourdomain, test-3-3.yourdomain, test]
Elections
Primary always determined by an election.
2-member replSet without an arbiter: if the secondary goes offline, the primary will step down:
08:52:06 [rsMgr] can't see a majority of the set, relinquishing primary
08:52:06 [rsMgr] replSet relinquishing primary state
08:52:06 [rsMgr] replSet SECONDARY
08:52:12 [rsMgr] replSet can't see a majority, will not try to elect self
Priorities can rig elections.
Ensure availability of an odd number of voting members.
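The majority rule behind these points can be sketched in a few lines of Python. This is a simplified model for illustration, not MongoDB's election code; the function name is hypothetical.

```python
def has_majority(visible_votes, total_votes):
    """True when strictly more than half of all configured votes are visible."""
    return visible_votes > total_votes // 2


# 2-member set, secondary offline: the primary sees only its own vote
two_member_alone = has_majority(1, 2)

# Same failure with an arbiter added: the primary still sees 2 of 3 votes
with_arbiter = has_majority(2, 3)

print(two_member_alone, with_arbiter)
```

With an even number of voters, losing half of them leaves nobody with a majority, which is why an odd number of voting members is the safer configuration.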
Manual primary changes
No "become primary now" command.
Manual stepdowns with recusal timeout are best option.
test-1:PRIMARY> rs.stepDown(300)
Wed Apr 3 11:45:36 DBClientCursor::init call() failed
Wed Apr 3 11:45:36 query failed : admin.$cmd { replSetStepDown: 300.0 } to: 127.0.0.1:27017
Wed Apr 3 11:45:36 Error: error doing query: failed src/mongo/shell/collection.js:155
Wed Apr 3 11:45:36 trying reconnect to 127.0.0.1:27017
Wed Apr 3 11:45:36 reconnect 127.0.0.1:27017 ok
test-1:SECONDARY>
This triggers an election.
(Obviously, make sure your preferred candidate(s) can win.)
States: down (initializing), startup2, secondary, primary
replSet back to standalone? No.
Test server: replica set of 1, shard of 1. We removed --replSet, but the shard configuration then needed a manual update:
db.shards.update({host: "testreplset/test.domain.net"}, {$set: {host: "test.domain.net"}})
After that change, updatedExisting values were no longer returned by mongos, but were still visible when connected directly to mongod:
> db.schedule.update({_id:...}, {$set:{lock:true}}, false, true);
db.runCommand("getlasterror")
{
"updatedExisting" : true,
"n" : 1,
"connectionId" : 73,
"err" : null,
"ok" : 1
}
Solution: re-add --replSet to the mongod startup line and revert the shard configs. (Bug open with 10gen.)
Sharding
Can increase parallelization of CPU & I/O
Carefully choose a shard key (nontrivial to change)
Must run config servers & mongos
Doesn't ensure high availability
Doesn't help if you're already out of memory
256GB collection max for initial sharding
Rebalancing data across shards
Queries block while servers negotiate final hand-off.
Updating indexes after hand-off can be slow.
Best run off-peak
mongos> use config
switched to db config
mongos> db.settings.find()
{ "_id" : "balancer", "activeWindow" :
{ "start" : "23:00", "stop" : "6:00" }
}
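The window above runs from 23:00 to 6:00, so it wraps past midnight. The Python sketch below shows one way to test whether a given time falls inside such a window. It illustrates the wrap-around logic only and is not the balancer's implementation; the helper names and the half-open interval (start included, stop excluded) are assumptions.

```python
from datetime import time


def parse_hhmm(text):
    """Parse strings such as "23:00" or "6:00" into a datetime.time."""
    hours, minutes = text.split(":")
    return time(int(hours), int(minutes))


def in_window(now, start, stop):
    """True if now falls in [start, stop), allowing the window to wrap midnight."""
    if start <= stop:
        return start <= now < stop
    # Wrapping window: late evening OR early morning
    return now >= start or now < stop


window = {"start": "23:00", "stop": "6:00"}
start, stop = parse_hhmm(window["start"]), parse_hhmm(window["stop"])

print(in_window(time(23, 30), start, stop))  # late evening
print(in_window(time(12, 0), start, stop))   # midday
```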
Mongos & replSet primary changes
Application-level errors talking to mongos after an election:
pymongo.errors.AutoReconnect: could not connect to localhost:27020: [Errno 111] Connection refused
pymongo.errors.OperationFailure: database error: error querying server
Mongos errors talking to mongod on original primary:
Tue Apr 2 09:01:05 [conn3288] Socket say send() errno:110 Connection timed out 10.141.131.214:27017
Tue Apr 2 09:01:05 [conn3288] DBException in process: socket exception [SEND_ERROR] for 10.141.131.214:27017
Connection pool checked lazily; invalid connections can persist
for days, depending on load. Can clear manually:
mongos> db.adminCommand({connPoolSync:1});
{ "ok" : 1 }
mongos>
Failure handling
Applications must handle fail-over outages:
AutoReconnect & OperationFailure in pymongo
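import time
import pymongo.errors

def auto_reconnect(func, *args, **kwargs):
    """Executes func, retrying on AutoReconnect and OperationFailure."""
    for _ in range(100):
        try:
            return func(*args, **kwargs)
        except pymongo.errors.AutoReconnect:
            pass  # connection lost, e.g. during an election
        except pymongo.errors.OperationFailure:
            pass  # mongos may report this while fail-over settles
        time.sleep(0.1)
    # Gave up after roughly 10 seconds of retries
    raise TimeoutError()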
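A variation on the retry idea above, sketched with exponential backoff and a configurable set of exception types. This is not the authors' code: the retry helper, FakeAutoReconnect, and flaky_query are hypothetical names, and FakeAutoReconnect stands in for pymongo.errors.AutoReconnect so the example runs without a database.

```python
import time


def retry(func, retry_on, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call func(), retrying on the given exception types with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))


class FakeAutoReconnect(Exception):
    """Hypothetical stand-in for pymongo.errors.AutoReconnect."""


calls = {"count": 0}


def flaky_query():
    # Fails twice, then succeeds, imitating a short fail-over outage
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeAutoReconnect("primary stepped down")
    return {"ok": 1}


delays = []  # record the requested sleeps instead of actually waiting
result = retry(flaky_query, (FakeAutoReconnect,), sleep=delays.append)
print(result, delays)
```

In an application, retry_on would be something like (pymongo.errors.AutoReconnect,). Whether to retry OperationFailure as well is a judgment call, since not every such failure is transient.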
MMS (MongoDB Monitoring Service)
● free; hosted by 10gen
● need to run agent locally
● 10gen's commercial support relies on MMS
Profiling queries [1]
Finding bad queries that are actively running:
$ mongo | tee mongo.log
> db.currentOp()
...
bye
$ grep numYields mongo.log
"numYields" : 0,
"numYields" : 62247,
"numYields" : 0,
...
# Use your favorite viewer to find the op with 62247 yields
Helpful to get server back to a responsive state:
$ mongo
> db.killOp(10883898)
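The search for the op with the most yields can also be done in code. The Python sketch below assumes the db.currentOp() result is available as a dict with an "inprog" list whose entries carry "opid" and "numYields". The sample data is made up for illustration, and pairing opid 10883898 with 62247 yields is an assumption.

```python
def busiest_op(current_op):
    """Return the in-progress op with the highest numYields, or None."""
    ops = current_op.get("inprog", [])
    if not ops:
        return None
    return max(ops, key=lambda op: op.get("numYields", 0))


# Hypothetical sample shaped like a db.currentOp() result
sample = {
    "inprog": [
        {"opid": 10883897, "numYields": 0},
        {"opid": 10883898, "numYields": 62247},
        {"opid": 10883899, "numYields": 0},
    ]
}

worst = busiest_op(sample)
print(worst["opid"], worst["numYields"])
```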
Profiling queries [2]
Using nscanned to find queries that likely aren't
using indexes:
$ grep -P 'nscanned:\d\d' /var/log/mongodb.log
... or in real-time:
$ tail -f /var/log/mongodb.log | grep -P 'nscanned:\d\d'
MongoDB also provides the db.setProfilingLevel() command, which can log all queries to the system.profile collection.
> db.system.profile.find({nscanned:{$gte:10}})
system.profile does incur some performance
overhead, though.
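The log-based approach can be made more selective than grep by parsing the nscanned value and applying a numeric threshold. A minimal Python sketch follows. The sample log lines are hypothetical and simplified, and only the nscanned:<number> token is assumed to match the real log format.

```python
import re

NSCANNED = re.compile(r"nscanned:(\d+)")


def slow_scans(lines, min_scanned=10):
    """Yield (nscanned, line) for log lines whose nscanned is at least min_scanned."""
    for line in lines:
        match = NSCANNED.search(line)
        if match and int(match.group(1)) >= min_scanned:
            yield int(match.group(1)), line.rstrip("\n")


# Hypothetical, simplified log lines
sample_log = [
    "[conn12] query shop.widgets nscanned:3 nreturned:1 2ms",
    "[conn13] query shop.widgets nscanned:62247 nreturned:1 311ms",
    "[conn14] update shop.widgets nscanned:10 1ms",
]

hits = list(slow_scans(sample_log))
for scanned, line in hits:
    print(scanned, line)
```

Raising min_scanned filters out small scans that a two-digit grep pattern would still match.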
Nagios
● plugin uses pymongo
● set up service groups
Ideas for the future
● Better reconnect handling in applications
● Lose the EBS? Ephemeral disk faster; rely
on replication to keep data persistent.
● Intelligent use of mongo profiling (reduce
observer effect of setProfilingLevel)
● Use more MMS alerts
● Going to 2.4.x (fast counts, hashed
sharding)
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 

Production MongoDB in the Cloud

  • 1. Production MongoDB in the Cloud: From Essentials to Corner Cases
  • 2. Who are we? Mike Hobbs & Bridget Kromhout Social Commerce & Brand Interest Graph Analytics
  • 3. Why MongoDB? ● Scalable, high-performance, open source ● Dynamic schemas for unstructured data ● Query language close to SQL in power ● "Eventually consistent" is hard to program right
  • 4. Our configuration 12-node cluster (4 shards x 3 replica sets) Several other non-sharded replica sets Desired webapp response time is < 10ms Total data size: 110 GB Total index size: 28 GB Largest collection: 49 GB Largest index: 8.1 GB EC2: EBS, instance size, replication MongoDB: right for only some data sets
  • 5. Memory & iowait Working set needs to fit in memory ● Indexes ● Frequently accessed records Avoid swapping!!! EBS latency in EC2 is an issue.
  • 6. Fragmentation Fragmentation steals from your most precious resource by reserving memory that is not used. Run a compaction when your storageSize significantly exceeds your data size mongos> db.widgets.stats() ... "size" : 5097988, "storageSize" : 22507520, Padding can reduce fragmentation and I/O db.widgets.insert({widg_id: "72120", padding: "XXXX...XXX"}) db.widgets.update({widg_id: "72120"}, { $unset: {padding: ""}, $set: {desc: "Grout remover", price: "13.39", instock: true} })
  • 7. Replica sets "optime" : { "t" : 1365165841000 , "i" : 1 }, "optimeDate" : { "$date" : "Fri Apr 5 07:44:01 2013" }, test-3-1.yourdomain test-3-2.yourdomain test-3-3.yourdomain test-3-1.yourdomain test
  • 8. Elections 08:52:06 [rsMgr] can't see a majority of the set, relinquishing primary 08:52:06 [rsMgr] replSet relinquishing primary state 08:52:06 [rsMgr] replSet SECONDARY 08:52:12 [rsMgr] replSet can't see a majority, will not try to elect self Primary always determined by an election. 2-member replSet without an arbiter: if the secondary goes offline, the primary will step down: Priorities can rig elections. Ensure availability of an odd number of voting members.
  • 9. Manual primary changes No "become primary now" command. Manual stepdowns with recusal timeout are best option. test-1:PRIMARY> rs.stepDown(300) Wed Apr 3 11:45:36 DBClientCursor::init call() failed Wed Apr 3 11:45:36 query failed : admin.$cmd { replSetStepDown: 300.0 } to: 127.0.0.1:27017 Wed Apr 3 11:45:36 Error: error doing query: failed src/mongo/shell/collection.js:155 Wed Apr 3 11:45:36 trying reconnect to 127.0.0.1:27017 Wed Apr 3 11:45:36 reconnect 127.0.0.1:27017 ok test-1:SECONDARY> This triggers an election. (Obviously, make sure your preferred candidate(s) can win.) States: down (initializing), startup2, secondary, primary
  • 10. replSet back to standalone? No. Test server: replicaset of 1, shard of 1. removed --replSet but shard configuration needed manual update: db.shards.update({host:"testreplset/test.domain.net"}, {$set: {host:"test.domain.net"}}) UpdatedExisting values no longer returned by mongos, but visible when connected to mongod: > db.schedule.update({_id:...}, {$set:{lock:true}}, false, true); db.runCommand("getlasterror") { "updatedExisting" : true, "n" : 1, "connectionId" : 73, "err" : null, "ok" : 1 } Solution: re-adding --replSet to the mongod startup line and reverting shard configs. (Bug open with 10gen.)
  • 11. Sharding Can increase parallelization of CPU & I/O Carefully choose a shard key (nontrivial to change) Must run config servers & mongos Doesn't ensure high availability Doesn't help if you're already out of memory 256GB collection max for initial sharding
  • 12. Rebalancing data across shards Queries block while servers negotiate final hand-off. Updating indexes after hand-off can be slow. Best run off-peak. mongos> use config switched to db config mongos> db.settings.find() { "_id" : "balancer", "activeWindow" : { "start" : "23:00", "stop" : "6:00" } }
  • 13. Mongos & replSet primary changes Application-level errors talking to mongos after an election: pymongo.errors.AutoReconnect: could not connect to localhost:27020: [Errno 111] Connection refused pymongo.errors.OperationFailure: database error: error querying server Mongos errors talking to mongod on original primary: Tue Apr 2 09:01:05 [conn3288] Socket say send() errno:110 Connection timed out 10.141.131.214:27017 Tue Apr 2 09:01:05 [conn3288] DBException in process: socket exception [SEND_ERROR] for 10.141.131.214:27017 Connection pool checked lazily; invalid connections can persist for days, depending on load. Can clear manually: mongos> db.adminCommand({connPoolSync:1}); { "ok" : 1 } mongos>
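  • 14. Failure handling Applications must handle fail-over outages: AutoReconnect & OperationFailure in pymongo
        import time
        import pymongo.errors

        def auto_reconnect(func, *args, **kwargs):
            """Executes func, retrying on AutoReconnect or OperationFailure."""
            for _ in range(100):
                try:
                    return func(*args, **kwargs)
                except pymongo.errors.AutoReconnect:
                    pass  # connection lost, e.g. during an election
                except pymongo.errors.OperationFailure:
                    pass  # server-side error, e.g. "error querying server"
                time.sleep(0.1)  # brief pause before the next attempt
            raise TimeoutError()  # all 100 attempts failed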
  • 15. MMS (MongoDB Monitoring Service) ● free; hosted by 10gen ● need to run agent locally ● 10gen's commercial support relies on MMS
  • 16. Profiling queries [1] Finding bad queries that are actively running: $ mongo | tee mongo.log > db.currentOp() ... bye $ grep numYields mongo.log "numYields" : 0, "numYields" : 62247, "numYields" : 0, ... # Use your favorite viewer to find the op with 62247 yields Helpful to get server back to a responsive state: $ mongo > db.killOp(10883898)
  • 17. Profiling queries [2] Using nscanned to find queries that likely aren't using indexes: $ grep -P 'nscanned:\d\d' /var/log/mongodb.log ... or in real-time: $ tail -f /var/log/mongodb.log | grep -P 'nscanned:\d\d' MongoDB also provides the setProfilingLevel() command which can log all queries to system.profile collection. > db.system.profile.find({nscanned:{$gte:10}}) system.profile does incur some performance overhead, though.
  • 18. Nagios ● plugin uses pymongo ● set up service groups
  • 19. Ideas for the future ● Better reconnect handling in applications ● Lose the EBS? Ephemeral disk faster; rely on replication to keep data persistent. ● Intelligent use of mongo profiling (reduce observer effect of setProfilingLevel) ● Use more MMS alerts ● Going to 2.4.x (fast counts, hashed sharding)
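
The "better reconnect handling in applications" idea on slide 19 could take the form of a retry decorator with capped exponential backoff, rather than the fixed 0.1 s sleep shown on slide 14. The following is only a minimal sketch under stated assumptions: `TransientError` is a hypothetical stand-in for `pymongo.errors.AutoReconnect` so the example runs without a database, and `retry` with its parameters (`attempts`, `base_delay`, `max_delay`, `sleep`) are illustrative names, not part of pymongo or of the authors' code.

```python
import functools
import time


class TransientError(Exception):
    """Hypothetical stand-in for pymongo.errors.AutoReconnect."""


def retry(retriable=(TransientError,), attempts=5, base_delay=0.1,
          max_delay=2.0, sleep=time.sleep):
    """Retry the wrapped call on `retriable`, doubling the delay each time."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retriable:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    sleep(delay)
                    # Back off so a replica set mid-election is not hammered.
                    delay = min(delay * 2, max_delay)
        return wrapper
    return decorator


# Usage: a call that fails twice (as if an election were in progress),
# then succeeds. The sleep function is injected so delays can be recorded.
calls = []
delays = []


@retry(attempts=4, base_delay=0.1, sleep=delays.append)
def flaky_query():
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("could not connect")
    return "ok"


result = flaky_query()
```

In a real application the `retriable` tuple would presumably name the pymongo exceptions from slide 13. Re-raising the last error, instead of a generic timeout, keeps the original failure visible in logs.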