The document provides an overview of schema design basics for document databases, including modeling goals, common data patterns like one-to-many and many-to-many relationships, and techniques for modeling tree structures.
2. A brief history of Data Modeling
- ISAM
- COBOL
- Network
- Hierarchical
- Relational
  - 1970: E.F. Codd introduces 1st Normal Form (1NF)
  - 1971: E.F. Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)
  - 1974: Codd & Boyce define Boyce/Codd Normal Form (BCNF)
  - 2002: Date, Darwen, Lorentzos define 6th Normal Form (6NF)
- Object
4. Modeling goals
Goals:
- Avoid anomalies when inserting, updating or deleting
- Minimize redesign when extending the schema
- Make the model informative to users
- Avoid bias towards a particular style of query
* source: Wikipedia
7. Some terms before we proceed
RDBMS            Document DBs
Table            Collection
View / Row(s)    JSON Document
Index            Index
Join             Embedding & Linking across documents
Partition        Shard
Partition Key    Shard Key
8. Recap
Design documents that simply map to your application
post = { author : "roger",
         date : new Date(),
         text : "I love J.Biebs...",
         tags : ["rockstar", "puppy-love"]}
11. Query operators
Conditional operators:
$ne, $in, $nin, $mod, $all, $size, $exists, $type, ..
$lt, $lte, $gt, $gte
// find posts with any tags
>db.posts.find({ tags : {$exists: true}})
Regular expressions:
// posts where author starts with r
>db.posts.find({ author : /^r/i })
Counting:
// posts written by roger
>db.posts.find({ author : "roger"}).count()
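The three queries above can be mimicked in plain JavaScript to see what each one matches. This is only a sketch: the `posts` array is a hypothetical in-memory stand-in for the collection, and the filters approximate the operators rather than reproduce the server's behavior.

```javascript
// In-memory stand-in for the posts collection (sample data is hypothetical).
const posts = [
  { author: "roger", text: "I love J.Biebs...", tags: ["rockstar", "puppy-love"] },
  { author: "Rita", text: "Untagged post" },
  { author: "mike", text: "Hello", tags: [] },
];

// { tags: { $exists: true } }: the field is present, even if the array is empty.
const withTags = posts.filter((p) => "tags" in p);

// { author: /^r/i }: author starts with "r", case-insensitive.
const startsWithR = posts.filter((p) => /^r/i.test(p.author));

// find({ author: "roger" }).count(): number of exact matches.
const rogerCount = posts.filter((p) => p.author === "roger").length;

console.log(withTags.length, startsWithR.map((p) => p.author), rogerCount);
```

Note that $exists only tests for the presence of the field, so the post with an empty tags array still matches.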
12. Extending the Schema
new_comment = { author : "Gretchen",
                date : new Date(),
                text : "Biebs is Toll!!!!"}
new_info = { '$push' : { comments : new_comment},
             '$inc' : { comments_count : 1}}
>db.posts.update({ _id : "..." }, new_info)
13. Extending the Schema
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "roger",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text : "I love J.Biebs...",
  tags : [ "rockstar", "puppy-love" ],
  comments_count : 1,
  comments : [
    { author : "Gretchen",
      date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
      text : "Biebs is Toll!!!!" }
  ]}
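To make the effect of the $push and $inc update concrete, here is a rough in-memory sketch. `applyUpdate` is a hypothetical helper, not a driver or shell function, and it covers only these two operators under the assumption that a missing field is created on first use.

```javascript
// Hypothetical helper: applies $push and $inc to a plain object in place.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (!(field in doc)) doc[field] = []; // assumed: $push creates the array if absent
    doc[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // assumed: a missing counter starts from 0
  }
  return doc;
}

// The post has no comments or comments_count field before the first update.
const post = { author: "roger", text: "I love J.Biebs...", tags: ["rockstar", "puppy-love"] };
const newComment = { author: "Gretchen", date: new Date(), text: "Biebs is Toll!!!!" };

applyUpdate(post, { $push: { comments: newComment }, $inc: { comments_count: 1 } });
console.log(post.comments.length, post.comments_count);
```

The existing post needed no migration: the new fields appear the first time the update is applied.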
14. Extending the Schema
// create index on nested documents:
>db.posts.ensureIndex({"comments.author": 1})
>db.posts.find({"comments.author": "Gretchen"})
// find last 5 posts:
>db.posts.find().sort({ date : -1}).limit(5)
// most commented post:
>db.posts.find().sort({ comments_count : -1}).limit(1)
When sorting, check if you need an index
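A plain JavaScript approximation of these three queries, assuming a hypothetical in-memory `posts` array. It shows what the queries return, and the full scan plus sort done here is roughly the work an index would let the server avoid.

```javascript
// In-memory stand-in for the posts collection (sample data is hypothetical).
const posts = [
  { _id: 1, date: new Date("2010-07-20"), comments_count: 0, comments: [] },
  { _id: 2, date: new Date("2010-07-24"), comments_count: 2,
    comments: [{ author: "Gretchen" }, { author: "mike" }] },
  { _id: 3, date: new Date("2010-07-22"), comments_count: 1,
    comments: [{ author: "roger" }] },
];

// { "comments.author": "Gretchen" }: matches if any embedded comment has that author.
const byCommenter = posts.filter((p) => p.comments.some((c) => c.author === "Gretchen"));

// sort({ date: -1 }).limit(5): newest first, at most five results.
const latest = [...posts].sort((a, b) => b.date - a.date).slice(0, 5);

// sort({ comments_count: -1 }).limit(1): the most commented post.
const mostCommented = [...posts].sort((a, b) => b.comments_count - a.comments_count).slice(0, 1);

console.log(byCommenter.map((p) => p._id), latest.map((p) => p._id), mostCommented[0]._id);
```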
15. Single Table Inheritance
>db.shapes.find()
{ _id : ObjectId("..."), type : "circle", area : 3.14, radius : 1}
{ _id : ObjectId("..."), type : "square", area : 4, d : 2}
{ _id : ObjectId("..."), type : "rect", area : 10, length : 5, width : 2}
// find shapes where radius > 0
>db.shapes.find({ radius : { $gt : 0}})
// create index
>db.shapes.ensureIndex({ radius : 1})
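A small sketch of how an application might consume such a mixed collection. The `shapes` array mirrors the documents above, and `computeArea` is a hypothetical helper that dispatches on the type field.

```javascript
// One collection, three document shapes, distinguished by the type field.
const shapes = [
  { type: "circle", area: 3.14, radius: 1 },
  { type: "square", area: 4, d: 2 },
  { type: "rect", area: 10, length: 5, width: 2 },
];

// { radius: { $gt: 0 } }: documents without a radius field simply do not match.
const withRadius = shapes.filter((s) => typeof s.radius === "number" && s.radius > 0);

// Hypothetical helper: interpret the type-specific fields.
function computeArea(shape) {
  switch (shape.type) {
    case "circle": return Math.PI * shape.radius ** 2;
    case "square": return shape.d ** 2;
    case "rect": return shape.length * shape.width;
    default: throw new Error(`unknown shape type: ${shape.type}`);
  }
}

console.log(withRadius.map((s) => s.type), shapes.map(computeArea));
```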
18. One to Many
- Embedded Array / Array Keys
  - $slice operator to return subset of array
  - hard to find latest comments across all documents
- Embedded tree
  - Single document
  - Natural
- Normalized (2 collections)
  - most flexible
  - more queries
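The trade-off between the embedded and normalized options can be sketched with in-memory data. All names and sample values here are hypothetical, and `slice` only approximates what a $slice projection returns.

```javascript
// Embedded: comments live inside each post.
const embeddedPosts = [
  { _id: 1, comments: [{ text: "a", date: 1 }, { text: "c", date: 3 }] },
  { _id: 2, comments: [{ text: "b", date: 2 }, { text: "d", date: 4 }] },
];

// Rough equivalent of a $slice projection: the last n comments of one post.
const lastComments = (post, n) => post.comments.slice(-n);

// Latest comments across all posts: every embedded array must be unpacked first.
const latestEmbedded = embeddedPosts
  .flatMap((p) => p.comments)
  .sort((x, y) => y.date - x.date)
  .slice(0, 2);

// Normalized: comments are their own collection and link back by post_id.
const comments = [
  { post_id: 1, text: "a", date: 1 },
  { post_id: 2, text: "b", date: 2 },
  { post_id: 1, text: "c", date: 3 },
  { post_id: 2, text: "d", date: 4 },
];

// Latest comments across all posts is now a single sorted query...
const latestNormalized = [...comments].sort((x, y) => y.date - x.date).slice(0, 2);

// ...but showing one post with its comments takes a second lookup.
const commentsFor = (postId) => comments.filter((c) => c.post_id === postId);

console.log(latestEmbedded.map((c) => c.text), latestNormalized.map((c) => c.text));
```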
19. Many - Many
Example:
- Product can be in many categories
- Category can have many products
Products
- product_id
Category
- category_id
Prod_Categories
- id
- product_id
- category_id
23. Many - Many
products:
{ _id : ObjectId("4c4ca23933fb5941681b912e"),
  name : "Sumatra Dark Roast",
  category_ids : [ ObjectId("4c4ca25433fb5941681b912f"),
                   ObjectId("4c4ca25433fb5941681b92af")]}
categories:
{ _id : ObjectId("4c4ca25433fb5941681b912f"),
  name : "Indonesia",
  product_ids : [ ObjectId("4c4ca23933fb5941681b912e"),
                  ObjectId("4c4ca30433fb5941681b9130"),
                  ObjectId("4c4ca30433fb5941681b913a")]}
// All categories for a given product
>db.categories.find({ product_ids : ObjectId("4c4ca23933fb5941681b912e")})
// All products for a given category
>db.products.find({ category_ids : ObjectId("4c4ca25433fb5941681b912f")})
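One consequence of storing ids on both sides is that every link needs two writes. A minimal sketch, assuming short string ids in place of ObjectIds and a hypothetical `link` helper:

```javascript
// In-memory stand-ins for the two collections (ids are hypothetical strings).
const products = [{ _id: "p1", name: "Sumatra Dark Roast", category_ids: [] }];
const categories = [{ _id: "c1", name: "Indonesia", product_ids: [] }];

// Hypothetical helper: both sides must be written, or the two arrays drift apart.
function link(product, category) {
  if (!product.category_ids.includes(category._id)) product.category_ids.push(category._id);
  if (!category.product_ids.includes(product._id)) category.product_ids.push(product._id);
}

link(products[0], categories[0]);

// find({ product_ids: id }) matches when the array contains the id.
const categoriesOf = (productId) => categories.filter((c) => c.product_ids.includes(productId));
const productsIn = (categoryId) => products.filter((p) => p.category_ids.includes(categoryId));

console.log(categoriesOf("p1").map((c) => c.name), productsIn("c1").map((p) => p.name));
```

Each direction is then a single query, at the cost of keeping the two arrays in sync.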
26. Alternative
products:
{ _id : ObjectId("4c4ca23933fb5941681b912e"),
  name : "Sumatra Dark Roast",
  category_ids : [ ObjectId("4c4ca25433fb5941681b912f"),
                   ObjectId("4c4ca25433fb5941681b92af")]}
categories:
{ _id : ObjectId("4c4ca25433fb5941681b912f"),
  name : "Indonesia"}
// All products for a given category
>db.products.find({ category_ids : ObjectId("4c4ca25433fb5941681b912f")})
// All categories for a given product
product = db.products.findOne({ _id : some_id })
>db.categories.find({ _id : {$in : product.category_ids}})
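With ids stored only on the product side, finding a product's categories becomes a two-step lookup. The sketch below uses hypothetical string ids and in-memory arrays in place of the collections.

```javascript
// Only products carry the references; categories stay small.
const products = [{ _id: "p1", name: "Sumatra Dark Roast", category_ids: ["c1", "c2"] }];
const categories = [
  { _id: "c1", name: "Indonesia" },
  { _id: "c2", name: "Dark Roast" },
  { _id: "c3", name: "Kenya" },
];

// Step 1: fetch the single product document.
const product = products.find((p) => p._id === "p1");

// Step 2: { _id: { $in: product.category_ids } }.
const productCategories = categories.filter((c) => product.category_ids.includes(c._id));

// All products for a given category is still a single query.
const productsIn = (categoryId) => products.filter((p) => p.category_ids.includes(categoryId));

console.log(productCategories.map((c) => c.name), productsIn("c1").map((p) => p.name));
```

Links are written in one place only, in exchange for the extra read in one direction.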
27. Trees
Full Tree in Document
{ comments : [
    { author : "rpb", text : "...",
      replies : [
        { author : "Fred", text : "...",
          replies : []}
      ]}
  ]}
Pros: Single Document, Performance, Intuitive
Cons: Hard to search, 4MB limit
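The "hard to search" drawback can be illustrated with a recursive walk done in application code. `findByAuthor` is a hypothetical helper that returns the index path to each matching comment.

```javascript
// The full comment tree lives in one document.
const post = {
  comments: [
    { author: "rpb", text: "...", replies: [
      { author: "Fred", text: "...", replies: [] },
    ] },
  ],
};

// Depth-first walk: matches can sit at any depth, so the application recurses.
function findByAuthor(nodes, author, path = []) {
  const hits = [];
  nodes.forEach((node, i) => {
    const here = [...path, i];
    if (node.author === author) hits.push(here);
    hits.push(...findByAuthor(node.replies, author, here));
  });
  return hits;
}

console.log(JSON.stringify(findByAuthor(post.comments, "Fred")));
```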
28. Trees - continued
Parent Links
- Each node is stored as a document
- Contains the id of the parent
Child Links
- Each node contains the ids of the children
- Can support graphs (multiple parents / children)
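Both linking styles can be sketched with in-memory arrays. The node ids, the `parent` and `children` field names, and the helper functions are all hypothetical choices for illustration.

```javascript
// Parent links: one document per node, each holding its parent's id.
const nodes = [
  { _id: "root", parent: null },
  { _id: "a", parent: "root" },
  { _id: "b", parent: "root" },
  { _id: "a1", parent: "a" },
];
const byId = new Map(nodes.map((n) => [n._id, n]));

// Direct children are one equality query on the parent field.
const childrenOf = (id) => nodes.filter((n) => n.parent === id).map((n) => n._id);

// Ancestors take one lookup per level of the tree.
function ancestorsOf(id) {
  const chain = [];
  let current = byId.get(id);
  while (current && current.parent !== null) {
    chain.push(current.parent);
    current = byId.get(current.parent);
  }
  return chain;
}

// Child links: each node lists its children, so one node can sit under several parents.
const graph = [
  { _id: "x", children: ["z"] },
  { _id: "y", children: ["z"] },
  { _id: "z", children: [] },
];
const parentsOf = (id) => graph.filter((n) => n.children.includes(id)).map((n) => n._id);

console.log(childrenOf("root"), ancestorsOf("a1"), parentsOf("z"));
```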