Got Documents?
AN EXPLORATION OF DOCUMENT DATABASES IN SOFTWARE
ARCHITECTURE
About Me
www.maggiepint.com
maggiepint@gmail.com
@maggiepint
https://www.tempworks.com
Got documents Code Mash Revision
Flavors
MONGODB, COUCHDB, RAVENDB, AND MORE
MongoDB
•Dominant player in document databases
•Runs on nearly all platforms
•Strongly Consistent in default configuration
•Indexes are similar to traditional SQL indexes in nature
•Stores data in customized Binary JSON (BSON) format that allows typing
•Limited support for cross-collection querying in latest release
•Client APIs available in tons of languages
•Must use a third-party provider like Solr for advanced search capabilities
CouchDB
•Stores documents in plain JSON format
•Eventually consistent
•Indexes are map-reduce and defined in JavaScript
•Clients in many languages
•Runs on Linux, OS X, and Windows
•CouchDB-Lucene provides a Lucene integration for search
RavenDB
•Stores documents in plain JSON format
•Eventually consistent
•Indexes are built on Lucene. Lucene search is native to RavenDB.
•Server only runs on Windows
•.NET, Java, and HTTP Clients
•Limited support for cross-collection querying
Other Players
•Azure DocumentDB
• Very new product from Microsoft
•RethinkDB
• Open source project that integrates push notifications into the database
•Cloudant
• IBM proprietary implementation of CouchDB
•DynamoDB
• Mixed-model key-value and document database
Architectural
Considerations
How do document databases work?
•Stores related data in a single document
•Usually uses JSON format for documents
•Enables the storage of complex object graphs together, instead of normalizing data out into
tables
•Stores documents in collections of the same type
•Allows querying within collections
•Does not typically allow querying across collections
•Offers high availability at the cost of consistency
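To make "related data in a single document" concrete, here is a minimal sketch in Python. The order document, its field names, and the id format are all hypothetical; the point is only that a nested object graph is stored and read back whole, with no join.

```python
import json

# Hypothetical order aggregate stored as ONE document.
# In a relational schema the address and line items would live in separate tables.
order_document = {
    "_id": "orders/1001",  # hypothetical id format
    "customer": {"name": "Ada Lovelace", "email": "ada@example.com"},
    "shipping_address": {"city": "London", "country": "UK"},
    "lines": [
        {"sku": "BOOK-1", "quantity": 2, "unit_price": 12.50},
        {"sku": "PEN-7", "quantity": 1, "unit_price": 3.00},
    ],
}


def order_total(doc: dict) -> float:
    """Sum the line items; everything needed is already inside the document."""
    return sum(line["quantity"] * line["unit_price"] for line in doc["lines"])


serialized = json.dumps(order_document)  # roughly what would be sent to the database
restored = json.loads(serialized)        # roughly what a single read would return
total = order_total(restored)
```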
Consideration: Schema Free
PROS
Easy to add properties
Simple migrations
Tolerant of differing data
CONS
Have to account for properties being missing
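A small sketch of what "accounting for missing properties" can look like in application code. The `nickname` field and the fallback text are assumptions made up for illustration: older documents were written before the property existed, so every reader has to tolerate its absence.

```python
def display_name(user_doc: dict) -> str:
    """Prefer the newer 'nickname' property, falling back for older documents."""
    nickname = user_doc.get("nickname")  # absent on documents written before the change
    if nickname:
        return nickname
    return user_doc.get("name", "(unknown)")


old_doc = {"name": "Grace"}                              # written before 'nickname' existed
new_doc = {"name": "Grace", "nickname": "Amazing Grace"}  # written after

old_label = display_name(old_doc)
new_label = display_name(new_doc)
```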
ACID
Atomicity
◦ Each transaction is all or nothing
Consistency
◦ Any transaction brings the database from one valid state to another
Isolation
◦ System ensures that transactions executed concurrently bring the database to the same state as if they
had been executed serially
Durability
◦ Once a transaction is committed, it remains so even in the event of power loss, etc.
ACID in Document Databases
•Traditional transaction support is not available in any document database
•Document databases do support something like transactions within the scope of a document
•This makes document databases generally inappropriate for a wide variety of applications
Consideration: Non-ACID
PROS
Performance Gain
CONS
No way to guarantee that operations across
documents succeed or fail together
No isolation when sharded
Various implementation specific issues
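The first con can be illustrated with a toy in-memory store. This is not any real database client: `InMemoryStore`, its `fail_on` switch, and the account documents are invented for the sketch. Each single-document write is atomic, but nothing ties two writes together, so a failure between them leaves the data half-updated.

```python
class InMemoryStore:
    """Toy stand-in for a document store: each put is atomic on its own."""

    def __init__(self):
        self.docs = {}
        self.fail_on = set()  # document ids whose writes should fail (for the demo)

    def put(self, doc_id, doc):
        if doc_id in self.fail_on:
            raise OSError(f"write to {doc_id} failed")
        self.docs[doc_id] = dict(doc)


def transfer(store, from_id, to_id, amount):
    """Update two documents with two independent writes."""
    source = dict(store.docs[from_id])
    target = dict(store.docs[to_id])
    source["balance"] -= amount
    target["balance"] += amount
    store.put(from_id, source)  # succeeds
    store.put(to_id, target)    # may fail, and the first write is NOT rolled back


store = InMemoryStore()
store.put("accounts/a", {"balance": 100})
store.put("accounts/b", {"balance": 0})

store.fail_on.add("accounts/b")  # simulate a failure on the second write
try:
    transfer(store, "accounts/a", "accounts/b", 30)
except OSError:
    pass

# 30 units have disappeared: the debit landed, the credit did not.
total = sum(doc["balance"] for doc in store.docs.values())
```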
Case Study: Survey
System
Requirements
•An administration area is used to define ‘Surveys’.
• Surveys have Questions
• Questions have answers
•Surveys can be administered in sets called workflows
•When a survey changes, this change can only apply to surveys moving forward
• Because of this, each user must receive a survey ‘instance’ to track the version of the survey he/she got
A Traditional SQL Schema
•With various other requirements not described here, this schema came out to 83 tables
•For one of our heaviest usage clients, the average user would have 119 answers in the ‘Saved
Answer’ table
•With over 200,000 users after two years of use, the ‘Saved Answer’ table had 24,014,330 rows
•This table was both read and write heavy, so it was extremely difficult to define effective SQL
indexes
•The hardware cost for these SQL servers was astronomical
•This sucked
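A quick sanity check of the figures above, assuming the 119-answer average holds across the whole user base (the slide states it only for one heavy-usage client). Under that assumption the row count implies a user count just over 200,000, which is consistent with the numbers quoted.

```python
saved_answer_rows = 24_014_330  # rows in the 'Saved Answer' table, from the slide
answers_per_user = 119          # average answers per user, from the slide

# Implied number of users if every user had the average number of answers.
implied_users = saved_answer_rows / answers_per_user
implied_users_rounded = round(implied_users)
```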
Designing Documents
•An aggregate is a collection of objects that can be treated as one
•An aggregate root is the object that contains all other objects inside of it
•When designing document schema, find your aggregates and create documents around them
•If you have an entity, it should be persisted as its own document because you will likely have to
store references to it
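A sketch of the aggregate idea using Python dataclasses. The class and field names (`Workflow`, `SurveyInstance`, `Answer`) are hypothetical and only loosely modeled on the survey case study: the aggregate root contains its child objects, while other entities are held by reference (an id string) instead of being embedded.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Answer:
    question_id: str
    value: str


@dataclass
class SurveyInstance:
    survey_id: str  # reference to an entity stored as its own document
    version: int
    answers: List[Answer] = field(default_factory=list)


@dataclass
class Workflow:  # aggregate root: everything below is persisted with it
    id: str
    user_id: str  # reference to another entity
    surveys: List[SurveyInstance] = field(default_factory=list)


workflow = Workflow(
    id="workflows/42",
    user_id="users/7",
    surveys=[SurveyInstance("surveys/3", 2, [Answer("q1", "yes")])],
)

# The whole object graph becomes one nested dict, ready to store as a document.
document = asdict(workflow)
```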
Survey System Design
•A combination SQL and Document DB design was used
•Survey Templates (one type of entity) were put into the SQL Database
•When a survey was assigned to a user as part of a workflow (another entity, and also an
aggregate), its data at that time was put into the document database
•The user’s responses were saved as part of the workflow document
•Reading a user’s application data became as simple as making one request for her workflow
document
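One way to picture this design: copy the survey template into the workflow document at assignment time, so later template edits do not touch surveys already handed out. The dictionaries and the `assign_survey` helper below are assumptions for illustration, not the actual system's code.

```python
import copy

# Hypothetical template, standing in for what lives in the SQL database.
survey_template = {
    "id": "surveys/3",
    "version": 1,
    "questions": [{"id": "q1", "text": "Are you available weekends?"}],
}


def assign_survey(workflow_doc: dict, template: dict) -> None:
    """Snapshot the template into the workflow document at assignment time."""
    instance = copy.deepcopy(template)  # deep copy so later edits do not leak in
    instance["answers"] = {}
    workflow_doc["surveys"].append(instance)


workflow_doc = {"id": "workflows/42", "user_id": "users/7", "surveys": []}
assign_survey(workflow_doc, survey_template)

# The template changes later; the assigned instance keeps the version it was given.
survey_template["version"] = 2
survey_template["questions"][0]["text"] = "Can you work weekends?"

# Responses are saved inside the same workflow document.
workflow_doc["surveys"][0]["answers"]["q1"] = "yes"
```

Reading the user's data is then a single lookup of `workflow_doc`.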
Consideration: Models Aggregates Well
PROS
Improves performance by reducing lookups
Allows for easy persistence of object oriented
designs
CONS
none
Sharding
•Sharding is the practice of distributing data across multiple servers
•All major document database providers support sharding natively
•Document databases are ideal for sharding because document data is self-contained (less need
to worry about a query having to run on two servers)
•Sharding is usually accomplished by selecting a shard key for a collection, and allowing the
collection to be distributed to different nodes based on that key
•Tenant Id and geographic regions are typical choices for shard keys
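A minimal sketch of routing by shard key, assuming a simple hash-based scheme. Real products may use range-based or other strategies, and the node names here are made up. The property worth noticing is that every document with the same tenant id lands on the same node.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical node names


def shard_for(shard_key: str, shards=SHARDS) -> str:
    """Map a shard key to a node using a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


docs = [
    {"id": 1, "tenant_id": "acme"},
    {"id": 2, "tenant_id": "acme"},
    {"id": 3, "tenant_id": "globex"},
]

# Documents for the same tenant are always placed together.
placement = {doc["id"]: shard_for(doc["tenant_id"]) for doc in docs}
```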
Replication
•All major document database providers support replication
•In most replication setups, a primary node takes all write operations, and a secondary node
asynchronously replicates these write operations
•In the event of a failure of the primary, the secondary begins to take write operations
•MongoDB can be configured to allow reads from secondaries as a performance optimization,
resulting in eventual instead of strong consistency
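The consistency trade-off of reading from a secondary can be shown with a toy model. `Primary`, `Secondary`, and the operation log below are simplified inventions, not any vendor's replication protocol: writes land on the primary first and reach the secondary only when it next applies the log.

```python
from collections import deque


class Primary:
    """Takes all writes and records them in an operation log."""

    def __init__(self):
        self.data = {}
        self.oplog = deque()

    def write(self, key, value):
        self.data[key] = value
        self.oplog.append((key, value))


class Secondary:
    """Applies the primary's log some time later (asynchronously in a real system)."""

    def __init__(self):
        self.data = {}

    def apply_pending(self, primary):
        while primary.oplog:
            key, value = primary.oplog.popleft()
            self.data[key] = value


primary, secondary = Primary(), Secondary()
primary.write("users/7", {"name": "Maggie"})

stale = secondary.data.get("users/7")  # read before replication catches up
secondary.apply_pending(primary)
fresh = secondary.data.get("users/7")  # read after replication
```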
Consideration: Scaling Out
PROS
Allows hardware to be scaled horizontally
Ensures very high availability
CONS
Consistency is sacrificed
Survey System: End Result
•Each user is associated with about 20 documents
•Documents are distributed across multiple databases using sharding
•Master/Master replication is used to ensure extremely high availability
•There have been no database performance issues in the year and a half the app has been in
production
•Because there is no schema migration concern, deploying updates has been drastically
simplified
•Hardware cost is reasonable (but not cheap)
Indexes
•All document databases support some form of indexing to improve query performance
•Some document databases do not allow querying without an index
•In general, you shouldn’t query without an index anyway
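As a rough illustration of why an index helps, the sketch below builds an in-memory lookup from a key function, similar in spirit to the map step of a map-reduce index but not any database's actual API. The user documents and the `build_index` helper are hypothetical.

```python
from collections import defaultdict

users = [
    {"id": "users/1", "city": "Minneapolis"},
    {"id": "users/2", "city": "Sandusky"},
    {"id": "users/3", "city": "Minneapolis"},
]


def build_index(docs, key_fn):
    """Precompute key -> document ids so queries avoid scanning every document."""
    index = defaultdict(list)
    for doc in docs:
        index[key_fn(doc)].append(doc["id"])
    return index


by_city = build_index(users, lambda doc: doc["city"])

# Query via the index: one dictionary lookup.
indexed_result = by_city["Minneapolis"]

# Query without an index: a full scan of the collection.
scanned_result = [doc["id"] for doc in users if doc["city"] == "Minneapolis"]
```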
Consideration: Indexes
PROS
Improve performance of queries
CONS
Queries cannot reasonably be issued without
an index so indexes must frequently be
defined and deployed
Consideration: Eventual Consistency
PROS
Optimizes performance by allowing data
transfer to be a background process
CONS
Requires entire team to be aware of eventual
consistency implications
Case Study 2: CRM
CRM Requirements
•Track customers and basic information about them
•Track contacts and basic information about them
•Track sales deals and where they are in the pipeline
•Track orders generated from sales deals
•Track user tasks
Customers and Their Deals
•Customers and Deals are both entities, which is to say that they have distinct identity
•For this reason, Deals and Customers should be two separate collections
•There is no native support for cross-collection querying in most Document Databases
• The cross-collection querying support in RavenDB doesn’t perform well
Consideration: One document per
interaction
PROS
Improves performance
Encourages modeling aggregates well
CONS
Not actually achievable in most cases
Searching Deals by Customer Name
•The deal document must contain a denormalized customer object with the customer’s ID and
name
•We have a choice to make with this denormalization
• Allow the denormalization to just be wrong in the event the customer name is changed
• Maintain the denormalization when the customer name is changed
Denormalization Considerations
•Is stale data acceptable? This is the best option in all cases where it is possible.
•If stale data is unacceptable, how many documents are likely to need update when a change is
made? How often are changes going to be made?
•Using an event bus to move denormalization updates to a background process can be very
beneficial if failure of an update isn’t critical for the user to know
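A sketch of the denormalization choices, using a plain queue as a stand-in for an event bus. The collections, the event tuple, and the helper names are all assumptions. Each deal carries a copy of the customer's id and name; a rename publishes an event, and a background step fixes the copies later, so searches by the new name are stale until that step runs.

```python
from collections import deque

customers = {"customers/1": {"id": "customers/1", "name": "Acme"}}
deals = {
    "deals/10": {"id": "deals/10", "stage": "proposal",
                 "customer": {"id": "customers/1", "name": "Acme"}},
    "deals/11": {"id": "deals/11", "stage": "won",
                 "customer": {"id": "customers/1", "name": "Acme"}},
}
events = deque()  # stand-in for an event bus


def rename_customer(customer_id, new_name):
    """Update the customer document and publish an event; deals are NOT touched here."""
    customers[customer_id]["name"] = new_name
    events.append(("customer_renamed", customer_id, new_name))


def process_events():
    """Background worker: bring the denormalized copies up to date."""
    while events:
        _kind, customer_id, new_name = events.popleft()
        for deal in deals.values():
            if deal["customer"]["id"] == customer_id:
                deal["customer"]["name"] = new_name


def deals_by_customer_name(name):
    """Search deals using only the denormalized name stored on each deal."""
    return sorted(d["id"] for d in deals.values() if d["customer"]["name"] == name)


rename_customer("customers/1", "Acme Corp")
stale_result = deals_by_customer_name("Acme Corp")  # worker has not run yet
process_events()
fresh_result = deals_by_customer_name("Acme Corp")  # copies are now maintained
```

Skipping `process_events` entirely corresponds to the first option: accepting that the denormalized name is simply wrong after a rename.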
Consideration: Models Relationships
Poorly
PROS
None
CONS
Stale (out of date) data must be accepted in
the system
Large amounts of boilerplate code must be
written to maintain denormalizations
In certain circumstances a queuing/eventing
system is unavoidable
Consideration: Administration
PROS
Generally less involved than SQL
CONS
Server performance must be monitored
Hardware must be maintained
Index processes must be tuned
Settings must be tweaked
Consideration Recap
•Schema Free
•Non-ACID
•Models Aggregates Well
•Scales out well
•All queries must be indexed
•Eventual Consistency
•One document per interaction
•Models relationships poorly
•Requires administration
…nerds like us are allowed to be unironically enthusiastic about stuff… Nerds are allowed to
love stuff, like jump-up-and-down-in-the-chair-can’t-control-yourself love it.
-John Green

The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 

Got documents Code Mash Revision

  • 1. Got Documents? AN EXPLORATION OF DOCUMENT DATABASES IN SOFTWARE ARCHITECTURE
  • 5. MongoDB •Dominant player in document databases •Runs on nearly all platforms •Strongly consistent in default configuration •Indexes are similar to traditional SQL indexes in nature •Stores data in customized Binary JSON (BSON) format that allows typing •Limited support for cross-collection querying in latest release •Client APIs available in tons of languages •Must use a third-party provider like Solr for advanced search capabilities
  • 6. CouchDB •Stores documents in plain JSON format •Eventually consistent •Indexes are map-reduce and defined in JavaScript •Clients in many languages •Runs on Linux, OS X and Windows •CouchDB-Lucene provides a Lucene integration for search
  • 7. RavenDB •Stores documents in plain JSON format •Eventually consistent •Indexes are built on Lucene. Lucene search is native to RavenDB. •Server only runs on Windows •.NET, Java, and HTTP Clients •Limited support for cross-collection querying
  • 8. Other Players •Azure DocumentDB • Very new product from Microsoft •ReactDB • Open source project that integrates push notifications into the database •Cloudant • IBM proprietary implementation of CouchDB •DynamoDB • Mixed-model key-value and document database
  • 10. How do document databases work? •Stores related data in a single document •Usually uses JSON format for documents •Enables the storage of complex object graphs together, instead of normalizing data out into tables •Stores documents in collections of the same type •Allows querying within collections •Does not typically allow querying across collections •Offers high availability at the cost of consistency
  • 11. Consideration: Schema Free PROS •Easy to add properties •Simple migrations •Tolerant of differing data CONS •Have to account for properties being missing
  • 12. ACID •Atomicity ◦ Each transaction is all or nothing •Consistency ◦ Any transaction brings the database from one valid state to another •Isolation ◦ The system ensures that transactions executed concurrently bring the database to the same state as if they had been executed serially •Durability ◦ Once a transaction is committed, it remains so even in the event of power loss, etc.
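The "properties being missing" drawback can be illustrated with a small sketch. It assumes documents are plain Python dictionaries and that a `nickname` property was added after some documents were already saved; both the property name and the helper `display_name` are hypothetical, not part of any particular database client.

```python
from typing import Any


def display_name(doc: dict[str, Any]) -> str:
    """Return a printable name, tolerating documents that were written
    before the (hypothetical) 'nickname' property existed."""
    # dict.get returns None instead of raising KeyError when the key is absent.
    nickname = doc.get("nickname")
    return nickname if nickname else doc["name"]


old_doc = {"name": "Ada Lovelace"}                     # saved before 'nickname' was added
new_doc = {"name": "Ada Lovelace", "nickname": "Ada"}  # saved after

print(display_name(old_doc))
print(display_name(new_doc))
```

Every reader of the collection needs this kind of defensive access, which is the cost paid for not running a migration.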
  • 13. ACID in Document Databases •Traditional transaction support is not available in any document database •Document databases do support something like transactions within the scope of a document •This makes document databases generally inappropriate for a wide variety of applications
  • 14. Consideration: Non-ACID PROS •Performance gain CONS •No way to guarantee that operations across documents succeed or fail together •No isolation when sharded •Various implementation-specific issues
  • 16. Requirements •An administration area is used to define ‘Surveys’. • Surveys have Questions • Questions have answers •Surveys can be administered in sets called workflows •When a survey changes, the change can apply only to surveys going forward • Because of this, each user must receive a survey ‘instance’ to track the version of the survey he or she received
  • 17. A Traditional SQL Schema •With various other requirements not described here, this schema came out to 83 tables •For one of our heaviest usage clients, the average user would have 119 answers in the ‘Saved Answer’ table •With over 200,000 users after two years of use, the ‘Saved Answer’ table had 24,014,330 rows •This table was both read and write heavy, so it was extremely difficult to define effective SQL indexes •The hardware cost for these SQL servers was astronomical •This sucked
  • 18. Designing Documents •An aggregate is a collection of objects that can be treated as one •An aggregate root is the object that contains all other objects inside of it •When designing document schema, find your aggregates and create documents around them •If you have an entity, it should be persisted as its own document because you will likely have to store references to it
  • 19. Survey System Design •A combined SQL and document database design was used •Survey Templates (one type of entity) were put into the SQL database •When a survey was assigned to a user as part of a workflow (another entity, and also an aggregate), its data at that time was put into the document database •The user’s responses were saved as part of the workflow document •Reading a user’s application data became as simple as making one request for her workflow document
  • 20. Consideration: Models Aggregates Well PROS •Improves performance by reducing lookups •Allows for easy persistence of object-oriented designs CONS •None
  • 21. Sharding •Sharding is the practice of distributing data across multiple servers •All major document database providers support sharding natively •Document databases are ideal for sharding because document data is self-contained (less need to worry about a query having to run on two servers) •Sharding is usually accomplished by selecting a shard key for a collection, and allowing the collection to be distributed to different nodes based on that key •Tenant ID and geographic region are typical choices for shard keys
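To make the aggregate idea concrete, here is a rough sketch of what a workflow document for the survey system might look like. The field names, IDs, and the helper `answered_count` are all invented for illustration; the slides do not show the real document shape.

```python
import json

# Hypothetical workflow document: the workflow is the aggregate root, and the
# survey instances, questions, and saved answers all live inside it.
workflow_doc = {
    "_id": "workflows/1001",
    "userId": "users/42",
    "surveys": [
        {
            "templateId": "surveyTemplates/7",
            "templateVersion": 3,  # snapshot of the template at assignment time
            "questions": [
                {"text": "Years of experience?", "answer": "5"},
                {"text": "Willing to relocate?", "answer": None},
            ],
        }
    ],
}


def answered_count(doc: dict) -> int:
    """Count answered questions using only the single workflow document."""
    return sum(
        1
        for survey in doc["surveys"]
        for question in survey["questions"]
        if question["answer"] is not None
    )


print(json.dumps(workflow_doc, indent=2))
print(answered_count(workflow_doc))
```

Because the answers travel with the workflow, reading a user's data is one document fetch rather than a join across several tables.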
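Routing by shard key can be sketched as follows. This is a deliberately simplified hash-and-modulo scheme with made-up node names; real document databases use their own range-based or hash-based partitioning with rebalancing, so treat it only as an illustration of the idea.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical node names


def shard_for(tenant_id: str, shards: list[str] = SHARDS) -> str:
    """Map a shard key (here, a tenant ID) to one node.

    A stable hash is used rather than Python's built-in hash(), which is
    randomized per process for strings.
    """
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


for tenant in ("tenant-1", "tenant-2", "tenant-3"):
    print(tenant, "->", shard_for(tenant))
```

All documents for one tenant land on the same node, so a query scoped to a tenant only has to touch one server.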
  • 22. Replication •All major document database providers support replication •In most replication setups, a primary node takes all write operations, and a secondary node asynchronously replicates these write operations •In the event of a failure of the primary, the secondary begins to take write operations •MongoDB can be configured to allow reads from secondaries as a performance optimization, resulting in eventual instead of strong consistency
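The stale read that asynchronous replication allows can be shown with a toy in-memory model. The `Primary` and `Secondary` classes below are hypothetical stand-ins, not any database's API, and replication is triggered by hand where a real system would run it in the background.

```python
from collections import deque


class Primary:
    """Takes all writes and records them for later replication."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.log: deque[tuple[str, str]] = deque()  # writes not yet replicated

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        self.log.append((key, value))


class Secondary:
    """Applies the primary's writes some time after they happen."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}

    def replicate_from(self, primary: Primary) -> None:
        # Drain the pending writes in the order they were made.
        while primary.log:
            key, value = primary.log.popleft()
            self.data[key] = value


primary, secondary = Primary(), Secondary()
primary.write("customers/1", "Acme")
stale = secondary.data.get("customers/1")  # read before replication catches up
secondary.replicate_from(primary)
fresh = secondary.data.get("customers/1")  # read after replication
print(stale, fresh)
```

A client reading from the secondary between the write and the replication step sees nothing, which is the window that eventual consistency asks the application to tolerate.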
  • 23. Consideration: Scaling Out PROS •Allows hardware to be scaled horizontally •Ensures very high availability CONS •Consistency is sacrificed
  • 24. Survey System: End Result •Each user is associated with about 20 documents •Documents are distributed across multiple databases using sharding •Master/Master replication is used to ensure extremely high availability •There have been no database performance issues in the year and a half the app has been in production •Because there is no schema migration concern, deploying updates has been drastically simplified •Hardware cost is reasonable (but not cheap)
  • 26. Indexes •All document databases support some form of indexing to improve query performance •Some document databases do not allow querying without an index •In general, you shouldn’t query without an index anyway
  • 27. Consideration: Indexes PROS •Improve performance of queries CONS •Queries cannot reasonably be issued without an index, so indexes must frequently be defined and deployed
  • 29. Consideration: Eventual Consistency PROS •Optimizes performance by allowing data transfer to be a background process CONS •Requires entire team to be aware of eventual consistency implications
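As a rough illustration of why an index helps, the sketch below builds a lookup from a field value to document IDs over an in-memory list. The documents, the `stage` field, and `build_index` are invented for the example; actual document databases define and maintain indexes through their own mechanisms.

```python
from collections import defaultdict

deals = [
    {"id": "deals/1", "stage": "proposal"},
    {"id": "deals/2", "stage": "closed"},
    {"id": "deals/3", "stage": "proposal"},
    {"id": "deals/4"},  # schema-free: this document has no 'stage' property
]


def build_index(docs: list[dict], field: str) -> dict[str, list[str]]:
    """Map each value of `field` to the IDs of the documents that carry it."""
    index: dict[str, list[str]] = defaultdict(list)
    for doc in docs:
        if field in doc:  # documents without the field are simply not indexed
            index[doc[field]].append(doc["id"])
    return dict(index)


index = build_index(deals, "stage")
print(index["proposal"])
```

The query then becomes a dictionary lookup instead of a scan over every document, at the cost of defining the index up front and keeping it current.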
  • 31. CRM Requirements •Track customers and basic information about them •Track contacts and basic information about them •Track sales deals and where they are in the pipeline •Track orders generated from sales deals •Track user tasks
  • 32. Customers and Their Deals •Customers and Deals are both entities, which is to say that they have distinct identity •For this reason, Deals and Customers should be two separate collections •There is no native support for cross-collection querying in most document databases • The cross-collection querying support in RavenDB doesn’t perform well
  • 33. Consideration: One document per interaction PROS •Improves performance •Encourages modeling aggregates well CONS •Not actually achievable in most cases
  • 34. Searching Deals by Customer Name •The deal document must contain a denormalized customer object with the customer’s ID and name •We have a choice to make with this denormalization • Allow the denormalization to just be wrong in the event the customer name is changed • Maintain the denormalization when the customer name is changed
  • 35. Denormalization Considerations •Is stale data acceptable? This is the best option in all cases where it is possible. •If stale data is unacceptable, how many documents are likely to need update when a change is made? How often are changes going to be made? •Using an event bus to move denormalization updates to a background process can be very beneficial if failure of an update isn’t critical for the user to know
  • 36. Consideration: Models Relationships Poorly PROS •None CONS •Stale (out of date) data must be accepted in the system •Large amounts of boilerplate code must be written to maintain denormalizations •In certain circumstances a queuing/eventing system is unavoidable
  • 38. Consideration: Administration PROS •Generally less involved than SQL CONS •Server performance must be monitored •Hardware must be maintained •Index processes must be tuned •Settings must be tweaked
  • 39. Consideration Recap •Schema Free •Non-ACID •Models Aggregates Well •Scales out well •All queries must be indexed •Eventual Consistency •One document per interaction •Models relationships poorly •Requires administration
  • 40. …nerds like us are allowed to be unironically enthusiastic about stuff… Nerds are allowed to love stuff, like jump-up-and-down-in-the-chair-can’t-control-yourself love it. -John Green
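The boilerplate involved in maintaining a denormalization might look something like this. The sketch assumes in-memory dictionaries in place of the Customers and Deals collections, and `rename_customer` is a hypothetical helper; in a real system the loop would be a query plus a batch of document updates, possibly run from a background process.

```python
customers = {
    "customers/1": {"name": "Acme"},
    "customers/2": {"name": "Globex"},
}
deals = {
    "deals/1": {"title": "Renewal", "customer": {"id": "customers/1", "name": "Acme"}},
    "deals/2": {"title": "Upsell", "customer": {"id": "customers/1", "name": "Acme"}},
    "deals/3": {"title": "Pilot", "customer": {"id": "customers/2", "name": "Globex"}},
}


def rename_customer(customer_id: str, new_name: str) -> int:
    """Rename a customer and refresh the denormalized copy in each deal.

    Returns the number of deal documents touched. The two steps are not
    atomic: a failure partway through leaves some deals with the old name.
    """
    customers[customer_id]["name"] = new_name
    touched = 0
    for deal in deals.values():
        if deal["customer"]["id"] == customer_id:
            deal["customer"]["name"] = new_name
            touched += 1
    return touched


touched = rename_customer("customers/1", "Acme Corp")
print(touched, deals["deals/1"]["customer"]["name"])
```

The number of documents touched grows with the number of deals per customer, which is why the frequency of changes and the tolerance for stale data drive the decision.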

Editor's Notes