Schema Design Basics
Roger Bodamer
roger@10gen.com
@rogerb

A brief history of Data Modeling
- ISAM
- COBOL
- Network
- Hierarchical
- Relational
  - 1970: E. F. Codd introduces 1st Normal Form (1NF)
  - 1971: E. F. Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)
  - 1974: Codd & Boyce define Boyce/Codd Normal Form (BCNF)
  - 2002: Date, Darwen, Lorentzos define 6th Normal Form (6NF)
- Object

So why model data?
Modeling goals
Goals:
- Avoid anomalies when inserting, updating or deleting
- Minimize redesign when extending the schema
- Make the model informative to users
- Avoid bias towards a particular style of query
Source: Wikipedia

Relational made normalized data look like this
Document databases make normalized data look like this
Some terms before we proceed

    RDBMS            Document DBs
    Table            Collection
    View / Row(s)    JSON Document
    Index            Index
    Join             Embedding & Linking across documents
    Partition        Shard
    Partition Key    Shard Key

Recap
Design documents that simply map to your application

    post = { author : "roger",
             date : new Date(),
             text : "I love J.Biebs...",
             tags : [ "rockstar", "puppy-love" ] }

Query operators
Conditional operators:
    $ne, $in, $nin, $mod, $all, $size, $exists, $type, ..
    $lt, $lte, $gt, $gte

    // find posts with any tags
    > db.posts.find({ tags : { $exists : true } })

Regular expressions:
    // posts where author starts with r
    > db.posts.find({ author : /^r/i })

Counting:
    // posts written by roger
    > db.posts.find({ author : "roger" }).count()

Extending the Schema

    new_comment = { author : "Gretchen",
                    date : new Date(),
                    text : "Biebs is Toll!!!!" }

    new_info = { "$push" : { comments : new_comment },
                 "$inc" : { comments_count : 1 } }

    > db.posts.update({ _id : "..." }, new_info)

Extending the Schema

    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "roger",
      date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
      text : "I love J.Biebs...",
      tags : [ "rockstar", "puppy-love" ],
      comments_count : 1,
      comments : [
        { author : "Gretchen",
          date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
          text : "Biebs is Toll!!!!" }
      ] }

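The effect of the `$push` / `$inc` update on a post can be sketched in plain JavaScript. This is an in-memory illustration only, not how MongoDB implements updates: `applyUpdate` is a hypothetical helper that handles just these two operators, and the sample post omits the dates for brevity.

```javascript
// Hypothetical helper (not a MongoDB API): applies a tiny subset of
// update operators, $push and $inc only, to a plain object in memory.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // work on a copy
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (!Array.isArray(result[field])) result[field] = []; // first push creates the array
    result[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount; // a missing counter starts at 0
  }
  return result;
}

const post = { author: "roger", text: "I love J.Biebs...", tags: ["rockstar", "puppy-love"] };
const newInfo = {
  $push: { comments: { author: "Gretchen", text: "Biebs is Toll!!!!" } },
  $inc: { comments_count: 1 },
};
const updated = applyUpdate(post, newInfo);
```

The point of the sketch is that the post did not need a `comments` or `comments_count` field beforehand: the schema is extended by the first update that touches those fields.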
Extending the Schema

    // create index on nested documents:
    > db.posts.ensureIndex({ "comments.author" : 1 })
    > db.posts.find({ "comments.author" : "Gretchen" })

    // find last 5 posts:
    > db.posts.find().sort({ date : -1 }).limit(5)

    // most commented post:
    > db.posts.find().sort({ comments_count : -1 }).limit(1)

When sorting, check if you need an index

Single Table Inheritance

    > db.shapes.find()
    { _id : ObjectId("..."), type : "circle", area : 3.14, radius : 1 }
    { _id : ObjectId("..."), type : "square", area : 4, d : 2 }
    { _id : ObjectId("..."), type : "rect", area : 10, length : 5, width : 2 }

    // find shapes where radius > 0
    > db.shapes.find({ radius : { $gt : 0 } })

    // create index
    > db.shapes.ensureIndex({ radius : 1 })

One to Many
- Embedded Array / Array Keys
  - slice operator to return subset of array
  - hard to find latest comments across all documents
- Embedded tree
  - Single document
  - Natural
- Normalized (2 collections)
  - most flexible
  - more queries

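The trade-off of the embedded array can be sketched with plain JavaScript arrays. This is an in-memory illustration under assumed sample data, not MongoDB code: `lastComments` and `latestAcrossPosts` are hypothetical helpers, and `date` is a plain number to keep the example short.

```javascript
// Made-up sample data: posts with embedded comment arrays.
const posts = [
  { _id: 1, comments: [{ text: "a", date: 1 }, { text: "b", date: 4 }] },
  { _id: 2, comments: [{ text: "c", date: 3 }] },
  { _id: 3, comments: [] },
];

// Easy: the last n comments of ONE post, which is what slicing the
// embedded array gives you (n must be >= 1).
function lastComments(post, n) {
  return post.comments.slice(-n);
}

// Harder: the latest n comments across ALL posts. Every embedded array
// has to be unpacked, merged, and sorted before the top n can be taken.
function latestAcrossPosts(allPosts, n) {
  return allPosts
    .flatMap((p) => p.comments.map((c) => ({ postId: p._id, ...c })))
    .sort((x, y) => y.date - x.date)
    .slice(0, n);
}
```

With a normalized layout (comments in their own collection) the second question becomes a single sorted query, at the cost of an extra query whenever a post is displayed with its comments.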
Many - Many
Example:
- Product can be in many categories
- Category can have many products

    Products:        product_id
    Category:        category_id
    Prod_Categories: id, product_id, category_id

Many - Many

    products:
    { _id : ObjectId("4c4ca23933fb5941681b912e"),
      name : "Sumatra Dark Roast",
      category_ids : [ ObjectId("4c4ca25433fb5941681b912f"),
                       ObjectId("4c4ca25433fb5941681b92af") ] }

    categories:
    { _id : ObjectId("4c4ca25433fb5941681b912f"),
      name : "Indonesia",
      product_ids : [ ObjectId("4c4ca23933fb5941681b912e"),
                      ObjectId("4c4ca30433fb5941681b9130"),
                      ObjectId("4c4ca30433fb5941681b913a") ] }

    // All categories for a given product
    > db.categories.find({ product_ids : ObjectId("4c4ca23933fb5941681b912e") })

    // All products for a given category
    > db.products.find({ category_ids : ObjectId("4c4ca25433fb5941681b912f") })

Alternative

    products:
    { _id : ObjectId("4c4ca23933fb5941681b912e"),
      name : "Sumatra Dark Roast",
      category_ids : [ ObjectId("4c4ca25433fb5941681b912f"),
                       ObjectId("4c4ca25433fb5941681b92af") ] }

    categories:
    { _id : ObjectId("4c4ca25433fb5941681b912f"),
      name : "Indonesia" }

    // All products for a given category
    > db.products.find({ category_ids : ObjectId("4c4ca25433fb5941681b912f") })

    // All categories for a given product (two queries)
    > product = db.products.findOne({ _id : some_id })
    > db.categories.find({ _id : { $in : product.category_ids } })

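The difference between the two directions of the lookup, when only products carry the id list, can be sketched in memory. This is an illustration with made-up sample data and short string ids instead of ObjectIds; `productsInCategory` and `categoriesOfProduct` are hypothetical helpers, not MongoDB calls.

```javascript
// Made-up sample data: only products store the relationship.
const products = [
  { _id: "p1", name: "Sumatra Dark Roast", category_ids: ["c1", "c2"] },
  { _id: "p2", name: "Kenya Light Roast", category_ids: ["c3"] },
];
const categories = [
  { _id: "c1", name: "Indonesia" },
  { _id: "c2", name: "Dark Roast" },
  { _id: "c3", name: "Africa" },
];

// One lookup: products whose category_ids array contains the id.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// Two lookups: fetch the product first, then fetch the categories
// whose _id appears in its id list (the $in step).
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}
```

Storing the ids on one side only avoids keeping two arrays in sync on every change; the price is the extra round trip in `categoriesOfProduct`.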
Trees
Full Tree in Document

    { comments : [
        { author : "rpb", text : "...",
          replies : [
            { author : "Fred", text : "...", replies : [] }
          ] }
      ] }

Pros: Single Document, Performance, Intuitive
Cons: Hard to search, 4MB document size limit (raised to 16MB in later MongoDB releases)

Trees - continued
Parent Links
- Each node is stored as a document
- Contains the id of the parent
Child Links
- Each node contains the ids of the children
- Can support graphs (multiple parents / children)

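A parent-links tree can be sketched in memory as follows. This is an illustration only: the `nodes` map stands in for a collection keyed by `_id`, and `ancestorsOf` and `childrenOf` are hypothetical helpers. The sketch assumes the root stores `parent: null` and that the links contain no cycles.

```javascript
// Each node is its own record and stores only the id of its parent.
const nodes = new Map([
  ["a", { _id: "a", parent: null }],
  ["b", { _id: "b", parent: "a" }],
  ["c", { _id: "c", parent: "b" }],
  ["e", { _id: "e", parent: "a" }],
]);

// Finding all ancestors takes one lookup per level of the tree.
function ancestorsOf(id) {
  const path = [];
  let current = nodes.get(id);
  while (current && current.parent !== null) {
    path.unshift(current.parent); // build the path root-first
    current = nodes.get(current.parent);
  }
  return path;
}

// Direct children are a single filter on the parent field.
function childrenOf(id) {
  return [...nodes.values()].filter((n) => n.parent === id).map((n) => n._id);
}
```

Parent links keep writes cheap (moving a node changes one field), while questions about whole subtrees or full ancestor paths need repeated lookups.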
Array of Ancestors
- Store Ancestors of a node

    { _id : "a" }
    { _id : "b", ancestors : [ "a" ], parent : "a" }
    { _id : "c", ancestors : [ "a", "b" ], parent : "b" }
    { _id : "d", ancestors : [ "a", "b" ], parent : "b" }
    { _id : "e", ancestors : [ "a" ], parent : "a" }
    { _id : "f", ancestors : [ "a", "e" ], parent : "e" }
    { _id : "g", ancestors : [ "a", "b", "d" ], parent : "d" }

    // find all descendants of b:
    > db.tree2.find({ ancestors : "b" })

    // find all ancestors of f:
    > ancestors = db.tree2.findOne({ _id : "f" }).ancestors
    > db.tree2.find({ _id : { $in : ancestors } })

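How the `ancestors` array is maintained and queried can be sketched in memory. This is an illustration, not MongoDB code: `insertNode` and `descendantsOf` are hypothetical helpers, and the root is given an empty `ancestors` array as an assumption (the deck's root document has no such field).

```javascript
// The tree from the slide, minus node "g", which is inserted below.
const tree = [
  { _id: "a", ancestors: [], parent: null },
  { _id: "b", ancestors: ["a"], parent: "a" },
  { _id: "c", ancestors: ["a", "b"], parent: "b" },
  { _id: "d", ancestors: ["a", "b"], parent: "b" },
  { _id: "e", ancestors: ["a"], parent: "a" },
  { _id: "f", ancestors: ["a", "e"], parent: "e" },
];

// A new node copies its parent's ancestors and appends the parent itself.
function insertNode(id, parentId) {
  const parent = tree.find((n) => n._id === parentId);
  if (!parent) throw new Error("unknown parent: " + parentId);
  const node = { _id: id, ancestors: [...parent.ancestors, parent._id], parent: parentId };
  tree.push(node);
  return node;
}

// All descendants: every node whose ancestors array contains the id.
function descendantsOf(id) {
  return tree.filter((n) => n.ancestors.includes(id)).map((n) => n._id);
}

const g = insertNode("g", "d");
```

Compared with plain parent links, subtree queries become a single array match, while moving a node means rewriting the `ancestors` array of every node beneath it.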
Variable Keys
How to index?

    { "_id" : "uuid1",
      "field1" : {
        "ctx1" : { "ctx3" : 5, … },
        "ctx8" : { "ctx3" : 5, … } } }

    db.MyCollection.find({ "field1.ctx1.ctx3" : { $exists : true } })

Rewrite (move the variable keys into values, so the field names are fixed and can be indexed):

    { "_id" : "uuid1",
      "field1" : [
        { key : "ctx1", value : { k : "ctx3", v : 5, … } },
        { key : "ctx8", value : { k : "ctx3", v : 5, … } } ] }

    db.MyCollection.ensureIndex({ "field1.key" : 1, "field1.value.k" : 1 })

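The rewrite from variable keys to fixed field names can be sketched as a small transformation. This is an illustration under assumptions: `toKeyValueArray` is a hypothetical helper, the elided fields ("…") are left out, and the inner level is stored as a list of `{ k, v }` pairs, which is one possible reading of the slide.

```javascript
// Turn { someKey: someValue, ... } into [{ key, value }, ...] so that
// the field names are fixed and only the values vary.
function toKeyValueArray(obj) {
  return Object.entries(obj).map(([key, value]) => ({ key, value }));
}

const original = {
  _id: "uuid1",
  field1: { ctx1: { ctx3: 5 }, ctx8: { ctx3: 5 } },
};

const rewritten = {
  _id: original._id,
  field1: toKeyValueArray(original.field1).map(({ key, value }) => ({
    key,
    // Assumption: the inner object is flattened the same way, as k/v pairs.
    value: toKeyValueArray(value).map(({ key: k, value: v }) => ({ k, v })),
  })),
};
```

After the rewrite every document has the same paths (`field1.key`, `field1.value.k`), so one index definition covers all keys instead of needing one index per key name.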
findAndModify
Queue example

    // Example: find highest priority job and mark it as in progress
    job = db.jobs.findAndModify({
        query : { inprogress : false },
        sort : { priority : -1 },
        update : { $set : { inprogress : true, started : new Date() } },
        new : true })

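The queue logic (filter, pick the highest priority, mark, return the updated job) can be sketched in memory. This is an illustration only: `claimNextJob` is a hypothetical helper, and it runs in a single thread, so it does not demonstrate the property that makes `findAndModify` suitable for queues, namely that the find and the update happen as one atomic step on the server.

```javascript
// Made-up sample queue.
const jobs = [
  { _id: 1, priority: 2, inprogress: false },
  { _id: 2, priority: 9, inprogress: true },
  { _id: 3, priority: 5, inprogress: false },
];

// Pick the highest-priority job that is not in progress, mark it, and
// return the modified job (like new : true). Returns null if none is free.
function claimNextJob(queue, now = new Date()) {
  const candidates = queue.filter((j) => !j.inprogress);
  if (candidates.length === 0) return null;
  const job = candidates.reduce((best, j) => (j.priority > best.priority ? j : best));
  job.inprogress = true;
  job.started = now;
  return job;
}
```

If two workers ran the filter and the update as separate steps against a shared store, both could claim the same job; doing both in one command is what prevents that.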
Learn More
Kyle's presentation + video:
    http://www.slideshare.net/kbanker/mongodb-schema-design
    http://www.blip.tv/file/3704083
Dwight's presentation:
    http://www.slideshare.net/mongosf/schema-design-with-mongodb-dwight-merriman
Documentation:
    Trees:        http://www.mongodb.org/display/DOCS/Trees+in+MongoDB
    Queues:       http://www.mongodb.org/display/DOCS/findandmodify+Command
    Aggregation:  http://www.mongodb.org/display/DOCS/Aggregation
    Capped Col.:  http://www.mongodb.org/display/DOCS/Capped+Collections
    Geo:          http://www.mongodb.org/display/DOCS/Geospatial+Indexing
    GridFS:       http://www.mongodb.org/display/DOCS/GridFS+Specification

Thank You :-)
Download MongoDB
http://www.mongodb.org
and let us know what you think
@mongodb

DBRef
DBRef { $ref : collection, $id : id_value }
- Think URL
- YDSMV: your driver support may vary

Sample Schema:
    nr = { note_refs : [ { "$ref" : "notes", "$id" : 5 }, ... ] }

Dereferencing:
    nr.note_refs.forEach(function(r) {
        printjson(db[r.$ref].findOne({ _id : r.$id }));
    })

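Dereferencing amounts to "look in the collection named by `$ref` for the document whose `_id` equals `$id`". A minimal in-memory sketch, with a plain object standing in for the database and made-up note documents; `dereference` is a hypothetical helper, not a driver function:

```javascript
// Stand-in for a database: collection name -> array of documents.
const db = {
  notes: [
    { _id: 5, text: "buy coffee" },
    { _id: 6, text: "call Fred" },
  ],
};

const nr = {
  note_refs: [
    { $ref: "notes", $id: 5 },
    { $ref: "notes", $id: 6 },
  ],
};

// Follow one reference; returns null if the collection or document is missing.
function dereference(database, ref) {
  const collection = database[ref.$ref] || [];
  return collection.find((doc) => doc._id === ref.$id) || null;
}

const notes = nr.note_refs.map((r) => dereference(db, r));
```

Each reference costs one lookup, so a document with many references costs many lookups unless they are batched by collection.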
BSON
- MongoDB stores data in BSON internally
- Lightweight, Traversable, Efficient encoding
- Typed: boolean, integer, float, date, string, binary, array...

Editor's Notes

  • #37: blog post twitter