StudyBlue




Databases at Scale:
A MongoDB Case Study


August 23, 2012




StudyBlue, Inc.
Overview


  •      About Me

  •      About StudyBlue

  •      Why MongoDB?

  •      Leveraging MongoDB

  •      Key Issues

  •      Q&A




Who am I?


  •      Sean Laurent

  •      sean@studyblue.com

  •      Head of Operations at StudyBlue, Inc.




studyblue.com



About StudyBlue

  •     Online service for storing, studying, sharing
        and ultimately mastering course material


  •     Digital backpack for students




StudyBlue Usage

  •     Many simultaneous users


  •     Rapid growth


  •     Cyclical usage




Initial Use Case



Flashcard Scoring


  •      Track flashcard scoring over time

       •      Every single card

       •      Every single user

       •      Forever


  •      Provide aggregate statistics

       •      Flashcard deck

       •      Folder

       •      Overall


  •      Focus on content mastery
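
The roll-up this slide describes (per deck, per folder, overall) might be sketched as below. This is plain Python over in-memory records, not StudyBlue's implementation; the field names (`user`, `folder`, `deck`, `card`, `correct`) and the sample events are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical score events: one record per card answer, kept forever.
events = [
    {"user": "u1", "folder": "bio", "deck": "cells", "card": "c1", "correct": True},
    {"user": "u1", "folder": "bio", "deck": "cells", "card": "c2", "correct": False},
    {"user": "u1", "folder": "bio", "deck": "genes", "card": "c3", "correct": True},
    {"user": "u1", "folder": "chem", "deck": "acids", "card": "c4", "correct": True},
]

def rollup(events, *keys):
    """Return {group: fraction correct}, grouping by the given fields."""
    totals = defaultdict(lambda: [0, 0])  # group -> [correct, attempts]
    for e in events:
        group = tuple(e[k] for k in keys)
        totals[group][0] += int(e["correct"])
        totals[group][1] += 1
    return {g: right / seen for g, (right, seen) in totals.items()}

by_deck = rollup(events, "user", "folder", "deck")
by_folder = rollup(events, "user", "folder")
overall = rollup(events, "user")
```

The same function serves all three aggregation levels; only the grouping fields change.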



Scoring Results

The Problem


  •      Reasonably large number of cards

  •      Large number of users

  •      User base increasing rapidly

  •      Shift in usage: per-user activity increasing faster than the user count

       •      Time on site

       •      Decks per user

       •      Average deck size

       •      Study sessions per user




StudyBlue Database Problems

  •     Amazon EC2


  •     Large number of simultaneous users


  •     High write volume


  •     Single PostgreSQL database


  •     Large tables




Why MongoDB?



Alternatives


  •      Amazon SimpleDB

       •      Far too simple


  •      Cassandra

       •      Difficult to add nodes and rebalance

       •      Column families cannot be modified without a restart


  •      CouchDB

       •      Difficult to add nodes and rebalance


  •      Redis

       •      No native support for sharding/partitioning

       •      Master/slave only - no automatic failover

MongoDB for the Win


  •      Highly available

       •      Replica sets

       •      Automatic failover


  •     Horizontal scaling across shards

       •     Improved write performance


       •     Improved availability during failures


       •      Easy to add additional shards


  •     Easier maintenance


Implementation: Phase 1


Development

  •     100% Java


  •     Existing PostgreSQL database

       •     System of record


       •     Synchronization issues




SQL Integration & Synchronization


  •      PostgreSQL considered system of record

  •      Asynchronous, event-driven

  •      Web servers queue change events

  •      Scoring servers process events

       •      Query PostgreSQL

       •      Update MongoDB
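
The flow on this slide can be sketched as a small in-process pipeline. Both stores are stand-in dictionaries and the queue is Python's `queue.Queue`. The event shape and function names are hypothetical, and the sketch assumes the web tier writes to PostgreSQL before queuing the event.

```python
import queue

# Stand-ins for the real stores; both are plain dicts here (assumption).
postgres = {"card:1": {"deck": "cells", "score": 3}}  # system of record
mongo = {}                                             # derived scoring data

events = queue.Queue()

def web_server_change(card_id, new_score):
    """Web tier: write to the system of record, then queue a change event."""
    postgres[card_id]["score"] = new_score
    events.put({"type": "card_scored", "card_id": card_id})

def scoring_worker():
    """Scoring tier: drain events, re-read PostgreSQL, update MongoDB."""
    processed = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return processed
        row = postgres[event["card_id"]]      # query PostgreSQL
        mongo[event["card_id"]] = dict(row)   # update MongoDB
        processed += 1

web_server_change("card:1", 5)
handled = scoring_worker()
```

Because the worker re-reads PostgreSQL instead of trusting the event payload, a delayed or repeated event still copies the current row. MongoDB can lag behind, which is one source of the synchronization issues noted earlier.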




Architecture v1

MongoDB Schema


  •      Many shallow collections vs monolithic deep collection

  •      Leverage existing SQL knowledge

  •      Simplify SQL integration




Implementation: Phase 2


DevOps


  •      Amazon EC2

       •      Separate dev, test and production environments


  •      Scripting & automation

       •      Creation

       •      Cloning

       •      Configuration management with Chef




Even More Data


  •     Moved existing tables from PostgreSQL to MongoDB

       •     Four PostgreSQL tables with millions of rows combined into single collection


  •     New development uses MongoDB:

       •     Analytics data with 300+ million documents




SQL Integration Part 2


  •      MongoDB considered system of record

  •      Web servers interact with MongoDB directly

  •      More complex structures, fewer shallow collections




Key Issues



Summary

  •     NoSQL vs SQL


  •     Design challenges


  •     Amazon EC2/EBS


  •     Partitioning & sharding


  •     Replication Lag




NoSQL vs SQL

  •     NoSQL != SQL


  •     Document database != RDBMS


  •     No joins


  •     Requires new mindset


  •     Store related data together


  •     Duplicate data as necessary
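
A minimal sketch of "store related data together" and "duplicate data as necessary": three relational-style tables folded into one self-contained document. The table and field names are invented for illustration and are not StudyBlue's schema.

```python
# Relational shape: three hypothetical tables joined by id.
decks = [{"id": 1, "title": "Cell Biology", "owner_id": 7}]
users = [{"id": 7, "name": "Ada"}]
cards = [
    {"deck_id": 1, "front": "Mitochondria", "back": "Powerhouse of the cell"},
    {"deck_id": 1, "front": "Ribosome", "back": "Builds proteins"},
]

def to_document(deck):
    """Fold the joined rows into one self-contained document."""
    owner = next(u for u in users if u["id"] == deck["owner_id"])
    return {
        "_id": deck["id"],
        "title": deck["title"],
        # Duplicated on purpose so a read needs no second lookup.
        "owner": {"id": owner["id"], "name": owner["name"]},
        "cards": [
            {"front": c["front"], "back": c["back"]}
            for c in cards if c["deck_id"] == deck["id"]
        ],
    }

doc = to_document(decks[0])
```

The cost of the duplication is that renaming the owner means updating every document that embeds the name.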




Design Challenges

  •     Multiple tables to single collections with complex objects


  •     Avoid growing objects

       •     Padding


       •     In-place update vs move


  •     Challenges with array elements




Amazon EC2 & EBS

  •     Plan for failure

       •     “When” not “if”


  •     EBS performance

       •     Inconsistent


       •     Limited by bandwidth


       •     100 IOPS / volume


       •     RAID-0
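
The arithmetic behind striping can be sketched as follows. The 100 IOPS per volume figure comes from the slide; the linear scaling is a best-case assumption, since the slide also notes that EBS is inconsistent and limited by bandwidth.

```python
def raid0_iops_ceiling(volumes, iops_per_volume=100):
    """Best-case IOPS for a RAID-0 stripe: the per-volume figure times the
    number of volumes. Real throughput is lower when the instance's network
    bandwidth, not the disks, is the bottleneck."""
    return volumes * iops_per_volume

def volumes_needed(target_iops, iops_per_volume=100):
    """Smallest stripe width whose best-case ceiling meets the target."""
    return -(-target_iops // iops_per_volume)  # ceiling division
```

For example, `volumes_needed(750)` gives 8. A RAID-0 stripe has no redundancy, so losing any one volume loses the whole array. That is one more reason to plan for failure and rely on replica sets.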




Instance Sizing

  •     Memory is king


  •     Keep working set in RAM

       •     Indexes


       •     Working data


  •     Spread horizontally instead of vertically

       •     Increased write performance
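
A rough sizing check along these lines, assuming the working set is simply indexes plus hot data and that about 80% of RAM is usable for it. The headroom fraction and the sample sizes are illustrative assumptions, not figures from the talk.

```python
def fits_in_ram(index_gb, hot_data_gb, ram_gb, headroom=0.8):
    """True if indexes plus hot data fit in a fraction of RAM (assumed 80%)."""
    return index_gb + hot_data_gb <= ram_gb * headroom

def shards_needed(index_gb, hot_data_gb, ram_gb, headroom=0.8):
    """Smallest number of equal shards so each shard's share fits in RAM."""
    n = 1
    while not fits_in_ram(index_gb / n, hot_data_gb / n, ram_gb, headroom):
        n += 1
    return n
```

With 40 GB of indexes and 100 GB of hot data on 68 GB instances, `shards_needed(40, 100, 68)` returns 3. This is the "spread horizontally" option; the alternative is finding one machine with well over 140 GB of RAM.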




Data Routing with Shards




Partitioning in the Cloud


  •      Operations perspective

       •      Dynamic changes in machines

            •     Config servers track machines

            •     Each node in replica set knows other nodes

            •     Avoids restarting applications when Mongo servers change

       •      Easy scaling

            •     Local shard servers

            •     Config servers store redundant copies

                  •   Two-phase commit




Picking a shard key

  •     Shard key selection critical for proper distribution

       •     Spread writes across cluster


  •     Depends on usage

       •     Single document vs aggregation


  •     Examples all time-series data


  •     Cannot be changed
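
Why the key choice decides whether writes spread across the cluster can be shown with a toy model of range-based sharding. The split points, the two-shard layout, and both key functions are assumptions chosen for illustration; this is a simulation, not MongoDB's router.

```python
import bisect
import hashlib
from collections import Counter

def owning_shard(key, split_points, chunk_to_shard):
    """Range sharding: find the chunk whose key range contains `key`."""
    chunk = bisect.bisect_right(split_points, key)
    return chunk_to_shard[chunk]

# Four chunks over the key space 0..2**32, spread across two shards.
SPACE = 2**32
split_points = [SPACE // 4, SPACE // 2, 3 * SPACE // 4]
chunk_to_shard = ["shard0", "shard1", "shard0", "shard1"]

def timestamp_key(i):
    # Monotonically increasing: every new write lands past the last split.
    return 3 * SPACE // 4 + 1000 + i

def user_hash_key(i):
    # Hypothetical alternative: derive the key from a hash of the user id.
    digest = hashlib.md5(f"user-{i % 500}".encode()).digest()
    return int.from_bytes(digest[:4], "big")

def write_distribution(key_fn, writes=10_000):
    """Count how many writes each shard receives for a given key function."""
    return Counter(
        owning_shard(key_fn(i), split_points, chunk_to_shard)
        for i in range(writes)
    )

hot = write_distribution(timestamp_key)
spread = write_distribution(user_hash_key)
```

With the increasing key, every write goes to the shard that owns the top chunk. With the hashed key, both shards take a share. The trade-off is that range queries over time then touch every shard, which is the "single document vs aggregation" tension on the slide.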




Sharding - Gritty Details

  •     Chunks

       •     64 MB blocks of data


  •     Splits

       •     1 chunk turns into 2 chunks


  •     Rebalance

       •     Move chunks to different nodes


       •     Maintain even distribution of chunks
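
The chunk lifecycle above, reduced to a toy model: each shard is a list of chunk sizes in MB, a chunk splits in half once it passes 64 MB, and the balancer moves chunks until the counts are even. MongoDB's actual split-point search and balancer thresholds are more involved; the write pattern here is an assumption.

```python
CHUNK_LIMIT = 64  # MB, the chunk size quoted on the slide

def split(chunk):
    """Split one chunk into two halves around its midpoint."""
    half = chunk / 2
    return [half, chunk - half]

def insert(shard, size_mb):
    """Grow the shard's last chunk and split it when it exceeds the limit."""
    shard[-1] += size_mb
    if shard[-1] > CHUNK_LIMIT:
        shard[-1:] = split(shard[-1])

def rebalance(shards):
    """Move chunks from the fullest shard to the emptiest until the chunk
    counts differ by at most one. Returns the number of chunks moved."""
    moved = 0
    while True:
        big = max(shards, key=len)
        small = min(shards, key=len)
        if len(big) - len(small) <= 1:
            return moved
        small.append(big.pop(0))
        moved += 1

shards = [[0.0], [0.0]]
for _ in range(40):  # 40 writes of 10 MB all hit the first shard
    insert(shards[0], 10)
moves = rebalance(shards)
```

After the 40 writes the first shard holds 11 chunks and the second holds 1, so the balancer moves 5 chunks to reach 6 and 6. Every move copies data between machines, so rebalancing is not free.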




Rebalancing Challenges

  •     Splits have to find mid point of chunk


  •     Very I/O expensive for collections with small documents

       •     Decreased chunk size


       •     Made documents larger & more complex


  •     Can be a drain on system


  •     Needs to run frequently
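
One reading of why small documents make splits expensive: a full chunk of small documents holds far more of them, so finding the midpoint means examining more entries. The document sizes and the 16 MB chunk size below are hypothetical; only the 64 MB default comes from the talk.

```python
def docs_per_chunk(chunk_mb, avg_doc_bytes):
    """Approximate number of documents a full chunk holds."""
    return (chunk_mb * 1024 * 1024) // avg_doc_bytes

small_docs = docs_per_chunk(64, 100)     # default chunk, tiny documents
smaller_chunk = docs_per_chunk(16, 100)  # same documents, smaller chunk
larger_docs = docs_per_chunk(64, 2000)   # same chunk, richer documents
```

Both mitigations listed on the slide reduce the per-chunk document count. In this example it falls from about 671,000 to about 168,000 with the smaller chunk, or to about 34,000 with the larger documents.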




Replication Lag

  •     Eventual consistency


  •     No guarantees about lag


  •     Replica safe writes

       •     Data committed to at least 2 nodes


       •     Can cause problems with high replication lag


       •     Trade-off: write safety vs. write latency
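
How a replica-safe write interacts with lag can be sketched with a toy model in which each secondary applies a write a fixed number of seconds after the primary. The class, the lag values, and the one-second timeout are all illustrative assumptions, not driver behavior.

```python
class ReplicaSet:
    """Toy model: one primary plus secondaries that apply writes after a lag."""

    def __init__(self, secondary_lags):
        self.secondary_lags = sorted(secondary_lags)  # seconds behind primary

    def ack_time(self, w):
        """Seconds until `w` nodes (primary included) hold the write."""
        if w <= 1:
            return 0.0
        return self.secondary_lags[w - 2]

    def write(self, w=2, timeout=1.0):
        """Return True if the write is acknowledged within `timeout`."""
        return self.ack_time(w) <= timeout

healthy = ReplicaSet([0.05, 0.2])
lagging = ReplicaSet([4.0, 9.0])
```

With healthy secondaries a two-node write is acknowledged almost immediately. When the fastest secondary is 4 seconds behind, the same write waits 4 seconds or times out, while a primary-only write (`w=1`) still returns at once but carries less protection.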




Q&A



Contact us
Web: http://www.studyblue.com
Twitter: @StudyBlue
Email: sean@studyblue.com




MongoDB Case Study at NoSQL Now 2012

  • 1. StudyBlue Databases at Scale: A MongoDB Case Study August 23, 2012 StudyBlue, Inc.
  • 2. Overview • About Me • About StudyBlue • Why MongoDB? • Leveraging MongoDB • Key Issues • Q&A StudyBlue, Inc.
  • 3. Who am I? • Sean Laurent • sean@studyblue.com • Head of Operations at StudyBlue, Inc. StudyBlue, Inc.
  • 5. About StudyBlue • Online service for storing, studying, sharing and ultimately mastering course material • Digital backpack for students StudyBlue, Inc.
  • 6. StudyBlue Usage • Many simultaneous users • Rapid growth • Cyclical usage StudyBlue, Inc.
  • 8. Flashcard Scoring • Track flashcard scoring over time • Every single card • Every single user • Forever • Provide aggregate statistics • Flashcard deck • Folder • Overall • Focus on content mastery StudyBlue, Inc.
  • 10. The Problem • Reasonably large number of cards • Large number of users • User base increasing rapidly • Shift in usage - increasing faster than users • Time on site • Decks per user • Average deck size • Study sessions per user StudyBlue, Inc.
  • 11. StudyBlue Database Problems • Amazon EC2 • Large number of simultaneous users • High write volume • Single PostgreSQL database • Large tables StudyBlue, Inc.
  • 13. Alternatives • Amazon SimpleDB • Far too simple • Cassandra • Difficult to add nodes and rebalance • Column families cannot be modified without restart • CouchDB • Difficult to add nodes and rebalance • Redis • No native support for sharding/partitioning • Master/slave only - no automatic failover StudyBlue, Inc.
  • 14. MongoDB for the Win • Highly available • Replica sets • Automatic failover • Horizontal scaling across shards • Improved write performance • Improved availability during failures • Easy to add additional shards • Easier maintenance StudyBlue, Inc.
  • 16. Development • 100% Java • Existing PostgreSQL database • System of record • Synchronization issues StudyBlue, Inc.
  • 17. SQL Integration & Synchronization • PostgreSQL considered system of record • Asynchronous event driven • Web servers queue change events • Scoring servers process events • Query PostgreSQL • Update MongoDB StudyBlue, Inc.
  • 19. MongoDB Schema • Many shallow collections vs monolithic deep collection • Leverage existing SQL knowledge • Simplify SQL integration StudyBlue, Inc.
  • 21. DevOps • Amazon EC2 • Separate dev, test and production environments • Scripting & automation • Creation • Cloning • Configuration management with Chef StudyBlue, Inc.
  • 22. Even More Data • Moved existing tables from PostgreSQL to MongoDB • Four PostgreSQL tables with millions of rows combined into single collection • New development uses MongoDB: • Analytics data with 300+ million documents StudyBlue, Inc.
  • 23. SQL Integration Part 2 • MongoDB considered system of record • Web servers interact with MongoDB directly • More complex structures, fewer shallow collections StudyBlue, Inc.
  • 25. Summary • NoSQL vs SQL • Design challenges • Amazon EC2/EBS • Partitioning & sharding • Replication Lag StudyBlue, Inc.
  • 26. NoSQL vs SQL • NoSQL != SQL • Document database != RDBMS • No joins • Requires new mindset • Store related data together • Duplicate data as necessary StudyBlue, Inc.
  • 27. Design Challenges • Multiple tables to single collections with complex objects • Avoid growing objects • Padding • In-place update vs move • Challenges with array elements StudyBlue, Inc.
  • 28. Amazon EC2 & EBS • Plan for failure • “When” not “if” • EBS performance • Inconsistent • Limited by bandwidth • 100 IOPS / volume • RAID-0 StudyBlue, Inc.
  • 29. Instance Sizing • Memory is king • Keep working set in RAM • Indexes • Working data • Spread horizontally instead of vertically • Increased write performance StudyBlue, Inc.
  • 30. Data Routing with Shards StudyBlue, Inc.
  • 31. Partitioning in the Cloud • Operations perspective • Dynamic changes in machines • Config servers track machines • Each node in replica set knows other nodes • Avoids restarting applications when Mongo servers change • Easy scaling • Local shard servers • Config servers store redundant copies • Two-phase commit StudyBlue, Inc.
  • 32. Picking a shard key • Shard key selection critical for proper distribution • Spread writes across cluster • Depends on usage • Single document vs aggregation • Examples all time-series data • Cannot be changed StudyBlue, Inc.
  • 33. Sharding - Gritty Details • Chunks • 64 MB blocks of data • Splits • 1 chunk turns into 2 chunks • Rebalance • Move chunks to different nodes • Maintain even distribution of chunks StudyBlue, Inc.
  • 34. Rebalancing Challenges • Splits have to find mid point of chunk • Very I/O expensive for collections with small documents • Decreased chunk size • Made documents larger & more complex • Can be a drain on system • Needs to run frequently StudyBlue, Inc.
  • 35. Replication Lag • Eventual consistency • No guarantees about lag • Replica safe writes • Data committed to at least 2 nodes • Can cause problems with high replication lag • Security vs time StudyBlue, Inc.
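
The shard key advice on slides 32 and 33 (spread writes across the cluster, be careful with time-series data) can be illustrated with a small simulation. This is only a sketch in plain Python, not MongoDB code and not StudyBlue's implementation: the four-chunk layout, the 32-bit key space, the user-id hashing, and the helper names `make_boundaries`, `route`, `hashed_key` and `simulate` are all assumptions made for illustration. It contrasts a monotonically increasing timestamp-like key, where every new write falls into the last range-based chunk, with a hashed key that scatters the same writes.

```python
import bisect
import hashlib
from collections import Counter

NUM_CHUNKS = 4        # assumed: one range-based chunk per shard
KEY_SPACE = 2 ** 32   # assumed size of the shard-key range


def make_boundaries(num_chunks, key_space=KEY_SPACE):
    """Upper bounds that cut the key space into equally sized ranges."""
    step = key_space // num_chunks
    return [step * (i + 1) for i in range(num_chunks - 1)]


def route(key, boundaries):
    """Return the index of the chunk whose key range contains `key`."""
    return bisect.bisect_right(boundaries, key)


def hashed_key(user_id, key_space=KEY_SPACE):
    """Map a (hypothetical) user id onto the key space with a stable hash."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:4], "big") % key_space


def simulate(keys, boundaries):
    """Count how many writes land in each chunk."""
    return Counter(route(key, boundaries) for key in keys)


boundaries = make_boundaries(NUM_CHUNKS)

# Timestamp-like key: new writes always carry the largest values seen so far.
start = KEY_SPACE - 10_000
time_counts = simulate([start + i for i in range(1_000)], boundaries)

# Hashed key: the same number of writes, keyed by a hash of the user id.
hash_counts = simulate([hashed_key(uid) for uid in range(1_000)], boundaries)

print("timestamp key:", dict(time_counts))
print("hashed key:   ", dict(sorted(hash_counts.items())))
```

In this toy model the timestamp key sends all 1,000 writes to a single chunk, while the hashed key spreads them over all four. The trade-off named on slide 32 still applies: a key that scatters writes well can make aggregation over a time range touch every shard.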

Editor's Notes

  • #4: Developer at heart; 15 years of experience; responsible for selecting Mongo
  • #6: 15-person startup. Bottom-up attempt to improve student outcomes through disruptive change outside of the education system. Allows students to create and store lecture notes and flashcards and access them online and via mobile apps (iOS and Android).
  • #7: No public numbers (low millions); 4,000 simultaneous users at peak; 120+ countries; daily cycle slowly flattening
  • #11: 20 million cards at the time; over 60 million cards now; expect 100 million cards in the next 6 months
  • #12: EC2 limits vertical scaling; Postgres tuning extremely beneficial; tables > 70 million rows
  • #14: Cassandra and Redis have since improved; Amazon DynamoDB didn't exist at the time
  • #22: Launch replacement Mongo server in < 10 mins; clone entire production Mongo cluster in < 60 mins
  • #23: Not huge by Big Data standards (a couple of terabytes), but big by startup standards
  • #29: Provisioned IOPS
  • #30: Working set is ~20% for SB, mostly recently created data
  • #33: http://www.snailinaturtleneck.com/blog/2011/01/04/how-to-choose-a-shard-key-the-card-game/
  • #35: Ran nightly; backlog causes really high load
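
The point on the Rebalancing Challenges slide, that finding the midpoint of a chunk is expensive for collections with small documents, comes down to simple arithmetic: a full 64 MB chunk of small documents holds far more entries to walk through. The sketch below is a deliberately simplified model in plain Python and does not reproduce MongoDB's actual split logic; the document sizes (100 bytes and 10 KB) and the helpers `docs_per_chunk` and `split_chunk` are hypothetical.

```python
CHUNK_LIMIT = 64 * 1024 * 1024  # 64 MB chunk size quoted on the slides


def docs_per_chunk(doc_size, chunk_limit=CHUNK_LIMIT):
    """Number of documents of `doc_size` bytes that fit in one full chunk."""
    return chunk_limit // doc_size


def split_chunk(docs):
    """Split a chunk at its byte midpoint.

    `docs` is a list of (key, size_in_bytes) pairs sorted by key.
    Returns (left, right, visited), where `visited` is how many documents
    were walked before the midpoint was reached.
    """
    total = sum(size for _, size in docs)
    running = 0
    for visited, (_, size) in enumerate(docs, start=1):
        running += size
        if running * 2 >= total:
            return docs[:visited], docs[visited:], visited
    return docs, [], len(docs)


for doc_size in (100, 10 * 1024):  # assumed small vs. larger documents
    count = docs_per_chunk(doc_size)
    chunk = [(key, doc_size) for key in range(count)]
    left, right, visited = split_chunk(chunk)
    print(f"{doc_size:>6} B docs: {count:>7} per chunk, "
          f"{visited:>7} visited, split {len(left)}/{len(right)}")
```

Under these assumptions a chunk of 100-byte documents contains roughly a hundred times as many entries as a chunk of 10 KB documents, which is consistent with the two mitigations listed on the slide: decreasing the chunk size and making documents larger.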