Azure DocumentDB:
Deep Dive into
Advanced Features
Aravind Ramachandran
Program Manager
Azure DocumentDB
@arkramac
Andrew Liu
Program Manager
Azure DocumentDB
@aliuy8
A Quick Recap…
3 V's of data: Endless possibilities
Learning
Gaming
Retail
Telematics
Mobile Apps
IoT
The 2x2s of database tradeoffs
DocumentDB: Capabilities
Guaranteed low latency
• <10 ms reads / <15 ms writes at P99
• Requests are served from the local region
• Write-optimized, latch-free database engine designed for SSDs and low-latency access
• Synchronous and automatic document indexing at sustained ingestion rates
Elastic and limitless global scale
• Independently scale throughput and storage, locally and globally
• Transparent partition management and routing
Multiple consistency levels
• Multiple well-defined consistency levels
• Intuitive programming model for relaxed consistency models
• Clear PACELC tradeoffs and 99.99% availability SLAs
SQL and JavaScript – schema free
• Automatic tree-path-based indexing
• No schemas or secondary indices required upfront
• SQL and JavaScript language-integrated queries
• Hash, range, and spatial indexes
• Multi-document, JavaScript language-integrated transactions
DocumentDB resource model
Resources
• Identified by their logical and stable URI
• Represented as JSON documents
• Partitioned to span machines, clusters, and regions
1
Resource model
• Stateless interaction (HTTP and TCP)
• Hierarchical overlay atop partitioning model
2
Partitioning Model
• Grid partitioning: horizontal based on hash/range, vertical across regions
• Each partition is made highly available via a replica set
3
[Diagram: replica sets, partitions, and partition sets across US-East, US-West, and N Europe; local distribution within a region, global distribution across regions]
Accessing DocumentDB
[Diagram: client SDKs such as Java, .NET, Ruby, …]
Let’s talk about…
• Modeling JSON Documents
• Collections and Scaling
• Query and Indexing
• Global Distribution
• Tips and Best Practices
Everything you need to know to build blazing fast, planet-scale applications!
Let's talk about JSON documents
"With great power comes great responsibility"
- Uncle Ben
How do approaches differ?
Data normalization
Come as you are
Modeling Data: The Relational Way
[Diagram: tables Person, Address, ContactDetail, ContactDetailType, and PersonContactDetailLnk (PersonId, ContactDetailId), each keyed by Id]
Modeling Data: The Document Way
{
  "id": "0ec1ab0c-de08-4e42-a429-...",
  "addresses": [
    { "street": "1 Redmond Way",
      "city": "Redmond", "state": "WA",
      "zip": 98052 }
  ],
  "contactDetails": [
    {"type": "home", "detail": "555-1212"},
    {"type": "email", "detail": "me@ms.com"}
  ],
  ...
}
[Diagram: a Person document (Id) that embeds Addresses (Address, …) and ContactDetails (ContactDetail, …)]
To embed, or to reference, that is the question
{
"id": "1",
"firstName": "Thomas",
"lastName": "Andersen",
"addresses": [
{
"line1": "100 Some Street",
"line2": "Unit 1",
"city": "Seattle",
"state": "WA",
"zip": 98012 }
],
"contactDetails": [
{"email": "thomas@andersen.com"},
{"phone": "+1 555 555-5555", "extension": 5555}
]
}
Try to model your entity as a self-contained document
Generally, use embedded data models when:
• There are "contains" relationships between entities
• There are one-to-few relationships between entities
• Embedded data changes infrequently
• Embedded data won't grow without bounds
• Embedded data is integral to data in a document
Data modeling with denormalization
Denormalizing typically provides better read performance
In general, use normalized data models when:
• Write performance is more important than read performance
• Representing one-to-many relationships
• Representing many-to-many relationships
• Related data changes frequently
Referencing provides more flexibility than embedding, but needs more round trips to read data
Data modeling with referencing
{
  "id": "xyz",
  "username": "user xyz"
}
{
  "id": "address_xyz",
  "userid": "xyz",
  "address" : {
    …
  }
}
{
  "id": "contact_xyz",
  "userid": "xyz",
  "email" : "user@user.com",
  "phone" : "555 5555"
}
Normalizing typically provides better write performance
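The read-side cost of the two models can be made concrete with a small sketch. It is not DocumentDB client code: the in-memory `store`, the `readDocument` helper, and the `embedded_xyz` document are hypothetical stand-ins, used only to count how many point reads each model needs to fetch a user's email.

```javascript
// Hypothetical in-memory "collection": a Map from document id to document.
const store = new Map();
let readCount = 0;

// Every lookup goes through this helper so that round trips can be counted.
function readDocument(id) {
  readCount += 1;
  return store.get(id);
}

// Embedded model: one self-contained document.
store.set("embedded_xyz", {
  id: "embedded_xyz",
  username: "user xyz",
  contact: { email: "user@user.com", phone: "555 5555" },
});

// Referenced model: related data lives in separate documents linked by userid.
store.set("xyz", { id: "xyz", username: "user xyz" });
store.set("contact_xyz", {
  id: "contact_xyz",
  userid: "xyz",
  email: "user@user.com",
  phone: "555 5555",
});

function loadEmbeddedUser(id) {
  readCount = 0;
  const user = readDocument(id);
  return { email: user.contact.email, reads: readCount };
}

function loadReferencedUser(id) {
  readCount = 0;
  const user = readDocument(id);
  const contact = readDocument("contact_" + user.id);
  return { email: contact.email, reads: readCount };
}

console.log(loadEmbeddedUser("embedded_xyz")); // one read
console.log(loadReferencedUser("xyz"));        // two reads
```

The flip side is on writes: with references, changing the phone number rewrites only the small contact document rather than the whole user document.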
No magic bullet
Hybrid approach: model at the property level (as opposed to the record level)
Optimize your data model for your workload (as opposed to blindly following types)
Modeling impacts RUs due to document size
Hybrid models
{
"id": "1",
"firstName": "Thomas",
"lastName": "Andersen",
"countOfBooks": 3,
"books": [1, 2, 3],
"images": [
{"thumbnail": "http://....png"},
{"profile": "http://....png"}
]
}
{
"id": 1,
"name": "DocumentDB 101",
"authors": [
{"id": 1, "name": "Thomas Andersen", "thumbnail": "http://....png"},
{"id": 2, "name": "William Wakefield", "thumbnail": "http://....png"}
]
}
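A rough sketch of the hybrid model above: the book document references its authors by id but also copies the few properties needed on every read. The helper names `buildBook` and `renameAuthor` are hypothetical, and the write count is only meant to show the trade-off, not how DocumentDB charges for writes.

```javascript
// Author documents (the elided thumbnail URLs are kept as in the slides).
const authors = [
  { id: 1, name: "Thomas Andersen", thumbnail: "http://....png" },
  { id: 2, name: "William Wakefield", thumbnail: "http://....png" },
];

// Build a book document that embeds a small, read-optimized copy of each author.
function buildBook(id, name, authorDocs) {
  return {
    id: id,
    name: name,
    authors: authorDocs.map((a) => ({ id: a.id, name: a.name, thumbnail: a.thumbnail })),
  };
}

// Cost of denormalizing: renaming an author rewrites the author document
// and every book document that carries a copy of the name.
function renameAuthor(authorId, newName, authorDocs, bookDocs) {
  let writes = 0;
  for (const author of authorDocs) {
    if (author.id === authorId) {
      author.name = newName;
      writes += 1;
    }
  }
  for (const book of bookDocs) {
    const copies = book.authors.filter((a) => a.id === authorId);
    if (copies.length > 0) {
      copies.forEach((a) => { a.name = newName; });
      writes += 1;
    }
  }
  return writes;
}

const book = buildBook(1, "DocumentDB 101", authors);
console.log(book.authors.map((a) => a.name)); // names are readable without a second lookup
console.log(renameAuthor(2, "W. Wakefield", authors, [book])); // author document + one book
```

Copying only slowly changing properties such as a name or thumbnail keeps this fan-out rare, which is the point of modeling at the property level.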
Collections + Elastic Scale
Elastic scale
Measuring Throughput (Request Units)
Replica gets a fixed budget of request units
Request Unit/sec (RU) is the normalized currency for % IOPS, % CPU, and % memory
Operations on documents consume request units (RUs)
Requests get rate limited if they exceed the SLA
Customers pay for reserved request units by the hour
[Diagram: incoming requests to a replica, from quiescent through no throttling to rate limit, between min RU/sec and max RU/sec]
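The idea of a fixed per-second budget can be sketched as a simple counter that resets every second. This is an illustration only: the class name, the RU costs, and the fixed one-second window are assumptions, not a description of how the service actually meters requests.

```javascript
// Hypothetical per-second request-unit budget for one replica.
class RequestUnitBudget {
  constructor(reservedRUsPerSecond) {
    this.reserved = reservedRUsPerSecond;
    this.windowStart = 0; // start of the current one-second window, in ms
    this.consumed = 0;    // RUs consumed inside the current window
  }

  // Returns true if the operation is admitted, false if it is rate limited.
  tryConsume(costInRUs, nowMs) {
    if (nowMs - this.windowStart >= 1000) {
      // A new second has started: reset the budget.
      this.windowStart = nowMs - (nowMs % 1000);
      this.consumed = 0;
    }
    if (this.consumed + costInRUs > this.reserved) {
      return false;
    }
    this.consumed += costInRUs;
    return true;
  }
}

const budget = new RequestUnitBudget(10); // 10 RU/sec reserved (made-up number)
console.log(budget.tryConsume(5, 0));     // admitted
console.log(budget.tryConsume(5, 100));   // admitted, budget now exhausted
console.log(budget.tryConsume(1, 200));   // rate limited
console.log(budget.tryConsume(1, 1000));  // admitted again in the next second
```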
What are partitions?
[Diagram: a collection spread across Partition 1, Partition 2, …, Partition i, …, Partition n]
[Diagram: with Partition Key = city, documents for cities such as London, Paris, New York, Houston, Chicago, New Delhi, Mumbai, Boston, and Berlin are distributed across the partitions]
Partitioning patterns
Writes should scale across partition keys
Reads should minimize cross-partition lookups
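Both patterns can be illustrated with a toy hash partitioner. The FNV-1a-style hash and the modulo placement below are assumptions chosen for the sketch; the slides do not say which hash DocumentDB uses, and all function names are hypothetical.

```javascript
// FNV-1a-style 32-bit hash over UTF-16 code units (illustrative choice only).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Map a partition key value to one of partitionCount partitions.
function partitionFor(partitionKey, partitionCount) {
  return fnv1a(partitionKey) % partitionCount;
}

// Distribute documents across partitions by the value of keyName.
function buildPartitions(docs, keyName, partitionCount) {
  const partitions = Array.from({ length: partitionCount }, () => []);
  for (const doc of docs) {
    partitions[partitionFor(doc[keyName], partitionCount)].push(doc);
  }
  return partitions;
}

// A filter on the partition key is routed to exactly one partition.
function queryByKey(partitions, keyName, keyValue) {
  const target = partitionFor(keyValue, partitions.length);
  const results = partitions[target].filter((d) => d[keyName] === keyValue);
  return { scanned: 1, results: results };
}

// A filter on any other property has to fan out to every partition.
function queryByProperty(partitions, name, value) {
  const results = partitions.flatMap((p) => p.filter((d) => d[name] === value));
  return { scanned: partitions.length, results: results };
}

const docs = [
  { city: "London", country: "UK" },
  { city: "Paris", country: "France" },
  { city: "Berlin", country: "Germany" },
  { city: "Mumbai", country: "India" },
];
const partitions = buildPartitions(docs, "city", 4);
console.log(queryByKey(partitions, "city", "Paris").scanned);           // single partition
console.log(queryByProperty(partitions, "country", "France").scanned);  // all partitions
```

A key with many distinct values spreads writes across partitions, while filtering on that same key keeps reads on a single partition.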
Recipe for Choosing Partition Key
Let's talk about Query and Indexing
Query and Indexing
Demo
DocumentDB: SQL and JavaScript queries
{ "locations":
    [ { "country": "Germany", "city": "Berlin" },
      { "country": "France", "city": "Paris" }
    ],
  "headquarter": "Belgium",
  "exports": [{ "city": "Moscow" }, { "city": "Athens" }]
}
[Diagram: the document as a tree. The root branches into locations, headquarter, and exports; array indexes (0, 1) and property names (country, city) are interior nodes, and the values (Germany, Berlin, France, Paris, Belgium, Moscow, Athens) are leaves]
{ "locations": [{ "country": "Germany", "city": "Bonn", "revenue": 200 }],
  "headquarter": "Italy",
  "exports": [{ "city": "Berlin", "dealers": [{ "name": "Hans" }] }, { "city": "Athens" }]
}
[Diagram: the second document as a tree. The root branches into locations, headquarter, and exports; interior nodes are array indexes (0, 1) and property names (country, city, revenue, dealers, name), and the leaves are Germany, Bonn, 200, Italy, Berlin, Athens, and Hans]
{
  "results":
  [
    {
      "locations":
      [
        {"country":"Germany","city":"Berlin"},
        {"country":"France","city":"Paris"}
      ]
    }
  ]
}
[Diagram: the query result as a tree rooted at results, containing the locations array (Germany/Berlin, France/Paris) of the matching document]
SQL
SELECT C.locations
FROM company C
WHERE C.headquarter = "Belgium"
JavaScript
function businessLogic() {
    var country = "Belgium";
    __.filter(function(x) { return x.headquarter === country; });
}
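To see what the SQL query above returns, here is a minimal in-memory equivalent run over the two example documents. It does not use any DocumentDB API; `selectLocationsByHeadquarter` is a hypothetical helper that applies the same filter and projection with plain array methods.

```javascript
// The two example company documents from the slides.
const companies = [
  {
    locations: [
      { country: "Germany", city: "Berlin" },
      { country: "France", city: "Paris" },
    ],
    headquarter: "Belgium",
    exports: [{ city: "Moscow" }, { city: "Athens" }],
  },
  {
    locations: [{ country: "Germany", city: "Bonn", revenue: 200 }],
    headquarter: "Italy",
    exports: [{ city: "Berlin", dealers: [{ name: "Hans" }] }, { city: "Athens" }],
  },
];

// In-memory analogue of:
//   SELECT C.locations FROM company C WHERE C.headquarter = "Belgium"
function selectLocationsByHeadquarter(docs, headquarter) {
  return {
    results: docs
      .filter((c) => c.headquarter === headquarter) // WHERE clause
      .map((c) => ({ locations: c.locations })),    // SELECT projection
  };
}

console.log(JSON.stringify(selectLocationsByHeadquarter(companies, "Belgium"), null, 2));
```

Only the first document matches, so the output has the same shape as the results document shown on the slide.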
Indexing under the hood
• Logically, the index is a union of all the document trees
• Structure is contributed by the interior nodes; instance values are the leaves
• Structural information and instance values are normalized into a unifying concept of JSON-Path
[Diagram label: Common structure]
Terms | Postings List
$/location/0/ | 1, 2
location/0/country/ | 1, 2
location/0/city/ | 1, 2
0/country/Germany | 1, 2
1/country/France | 2
… | …
0/city/Moscow | 2
0/dealers/0 | 2
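A simplified sketch of the terms-to-postings idea: each document is flattened into path terms, and every term maps to the list of document ids that contain it. The sketch indexes only full root-to-leaf paths, whereas the table above shows overlapping path segments, so it is an approximation of the concept rather than the engine's actual term format. The function names are hypothetical.

```javascript
// Flatten a JSON value into "path/value" terms. Objects and arrays are
// interior nodes (array indexes become path segments); scalars are leaves.
function termsOf(value, prefix = "$") {
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([key, child]) => termsOf(child, prefix + "/" + key));
  }
  return [prefix + "/" + String(value)];
}

// Build an inverted index: term -> postings list of document ids.
function buildIndex(docsById) {
  const index = new Map();
  for (const [id, doc] of Object.entries(docsById)) {
    for (const term of termsOf(doc)) {
      if (!index.has(term)) {
        index.set(term, []);
      }
      const postings = index.get(term);
      if (!postings.includes(id)) {
        postings.push(id);
      }
    }
  }
  return index;
}

const index = buildIndex({
  1: {
    locations: [
      { country: "Germany", city: "Berlin" },
      { country: "France", city: "Paris" },
    ],
    headquarter: "Belgium",
  },
  2: {
    locations: [{ country: "Germany", city: "Bonn", revenue: 200 }],
    headquarter: "Italy",
  },
});

// An equality predicate becomes a single term lookup.
console.log(index.get("$/locations/0/country/Germany")); // both documents
console.log(index.get("$/headquarter/Belgium"));         // first document only
```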
[Diagram: index path shapes used for range and ORDER BY queries, wildcard queries, and spatial queries (coordinates)]
Dynamic encoding of postings lists (E-WAH/differential)
Queries that use the index
• Equality: =
• Range: <, >, <=, >=
• ORDER BY
• String operators: STARTSWITH
• Spatial operators: ST_WITHIN and ST_DISTANCE
• Array operators: ARRAY_CONTAINS
• Schema operators: IS_DEFINED, IS_NUMBER, IS_STRING, …
Indexing Policies
Configuration | Level | Options
Automatic | Per collection | True (default) or False; override with each document write
Indexing Mode | Per collection | Consistent, Lazy, and None; None for KV workloads
Included and excluded paths | Per path | Individual path or recursive includes (? and *)
Indexing Type | Per path | Supports Hash, Range, and Spatial
Indexing Precision | Per path | Supports 1 to 100 per path (and max); trades off storage, query RUs, and write RUs
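As a rough illustration of how these options fit together, the object below follows the shape of the indexing policy JSON as documented for the service around the time of this talk. Treat the property names and the precision values as assumptions to verify against the current documentation; the excluded path `/largeBlob/*` is purely hypothetical.

```javascript
// Sketch of a collection-level indexing policy (shape assumed, see note above).
const indexingPolicy = {
  indexingMode: "consistent", // alternatives per the table: "lazy" or "none"
  automatic: true,            // index every document unless a write overrides it
  includedPaths: [
    {
      path: "/*",             // "*" includes the whole subtree recursively
      indexes: [
        { kind: "Range", dataType: "Number", precision: -1 }, // -1 taken to mean maximum precision
        { kind: "Hash", dataType: "String", precision: 3 },
      ],
    },
  ],
  excludedPaths: [
    { path: "/largeBlob/*" }, // hypothetical property that is never queried
  ],
};

// List which index kinds apply to an included path.
function indexKindsFor(policy, path) {
  const entry = policy.includedPaths.find((p) => p.path === path);
  return entry ? entry.indexes.map((i) => i.kind) : [];
}

console.log(indexKindsFor(indexingPolicy, "/*")); // range and hash indexes
```

Lower precision and excluded paths reduce index storage and write RUs, at the price of more expensive or unsupported queries on those paths.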
Let’s talk about Planet-Scale
Guaranteed low latency
“I want my data wherever my users are.”
Guaranteed high availability
Globally. With policy based failover.
99.99%
Multi-region DocumentDB databases
[Diagram: a DocumentDB collection whose partition sets are distributed globally across US-East, US-West, and India, with replica sets distributed locally inside each partition. One region holds the primary replica sets and the other two hold secondary replica sets; each region is provisioned with 2M RUs]
Total RUs = Provisioned RUs x Number of regions
In this example: 2M RUs x 3 regions = 6M RUs
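The formula above as a one-line calculation, using the numbers from the slide (the function name is hypothetical):

```javascript
// Total RUs = provisioned RUs per region x number of regions.
function totalRequestUnits(provisionedRUs, regionCount) {
  return provisionedRUs * regionCount;
}

console.log(totalRequestUnits(2000000, 3)); // 6000000, i.e. 6M RUs
```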
Programmable data consistency
“It's hard to write distributed apps.”
Strong consistency, high latency
Eventual consistency, low latency
Consistency Levels
• PACELC theorem and the associated tradeoffs
• Strong, Bounded Staleness, Session, and Eventual
Left to right: weaker consistency, better read scalability, lower write latency
[Diagram: for each consistency level, a client talking to a primary (P) and two secondaries (S)]
• Bounded Staleness: consistent prefix reads; reads lag behind writes by K prefixes or T interval
• Session: monotonic reads, monotonic writes, and read-your-writes guarantee
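The two middle levels can be sketched as simple checks on how far a replica lags. This is a conceptual model, not the service's protocol: the version counters, the `Session` class, and the reading of "K prefixes or T interval" as two limits that must both hold are assumptions made for the illustration.

```javascript
// Bounded staleness: a replica may serve a read only while its lag stays
// within K versions and within T milliseconds (both treated as limits here).
function withinStalenessBound(lag, bound) {
  return lag.versions <= bound.maxVersions && lag.elapsedMs <= bound.maxIntervalMs;
}

// Session consistency: the client remembers the highest version it has
// written or read, and only accepts replicas that have caught up to it.
class Session {
  constructor() {
    this.token = 0;
  }

  recordWrite(version) {
    this.token = Math.max(this.token, version);
  }

  // Read-your-writes and monotonic reads: never go backwards.
  canReadFrom(replicaVersion) {
    return replicaVersion >= this.token;
  }

  recordRead(replicaVersion) {
    this.token = Math.max(this.token, replicaVersion);
  }
}

const bound = { maxVersions: 100, maxIntervalMs: 5000 };
console.log(withinStalenessBound({ versions: 20, elapsedMs: 1000 }, bound));  // true
console.log(withinStalenessBound({ versions: 150, elapsedMs: 1000 }, bound)); // false

const session = new Session();
session.recordWrite(42);
console.log(session.canReadFrom(41)); // false: replica has not seen the write yet
console.log(session.canReadFrom(42)); // true
```

Other clients without that session token could still read the lagging replica, which is why session consistency is cheaper than strong consistency.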
Global Distribution
Demo
DocumentDB Recent Updates
• Automatic Expiration via Time-To-Live (TTL)
• Expanded Geo-Spatial support for Polygons and Lines
• Preview Support for
• Local Emulator
• IP Filtering
• Self-Service Backup + Restore
• Protocol Support for MongoDB
Q&A and more resources…
Session Evaluations
Your feedback is important and valuable.
3 ways to access:
• Go to passSummit.com
• Download the GuideBook App and search: PASS Summit 2016
• Follow the QR code link displayed on session signage throughout the conference venue and in the program guide
Submit by 5pm Friday November 6th to WIN prizes
Thank You
Learn more from
Azure DocumentDB
askdocdb@microsoft.com or follow @DocumentDB
LLM finetuning for multiple choice google bert
ChadapornK
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 

[PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

  • 1. Azure DocumentDB: Deep Dive into Advanced Features Aravind Ramachandran Program Manager Azure DocumentDB @arkramac Andrew Liu Program Manager Azure DocumentDB @aliuy8
  • 3. 3 V’s of data: Endless possibilities. Learning, Gaming, Retail, Telematics, Mobile Apps, IoT
  • 4. The 2x2s of database tradeoffs
  • 5. DocumentDB: Capabilities Guaranteed low latency • <10ms reads/<15ms writes @ P99. • Requests are served from local region • Write optimized, latch-free database engine designed for SSDs and low latency access. • Synchronous and automatic document indexing at sustained ingestion rates Elastic and limitless global scale • Independently scale throughput and storage - locally and globally • Transparent partition management and routing Multiple consistency levels • Multiple well defined consistency levels • Intuitive programming model for relaxed consistency models • Clear PACELC tradeoffs and 99.99% availability SLAs SQL and JavaScript – schema free • Automatic tree path based indexing • No schemas or secondary indices required upfront • SQL and JavaScript language integrated queries • Hash, range, and spatial • Multi-document, JavaScript language integrated transactions
  • 6. DocumentDB resource model Resources • identified by their logical and stable URI • Represented as JSON documents • Partitioned and span across machines, clusters and regions 1 Resource model • Stateless interaction (HTTP and TCP) • Hierarchical overlay atop partitioning model 2 Partitioning Model • Grid Partitioning – horizontal based on hash/range and vertical across regions • Each partition made highly available via a replica set 3 Replica-set US-East US-West N Europe Partitions Partition set Local distribution Global distribution
  • 7. Accessing DocumentDB Java .NET Java .NET Ruby …
  • 8. Let’s talk about… • Modeling JSON Documents • Collections and Scaling • Query and Indexing • Global Distribution • Tips and Best Practices Everything you need to know to build Blazing fast, planet-scale applications!
  • 9. Let’s talk about JSON documents "With great power comes great responsibility" - Uncle Ben
  • 10. How do approaches differ?
  • 11. Data normalization How do approaches differ?
  • 12. Come as you are Data normalization How do approaches differ?
  • 14. Person Id Addresses { "id": "0ec1ab0c-de08-4e42-a429-...", "addresses": [ { "street": "1 Redmond Way", "city": "Redmond", "state": "WA", "zip": 98052 } ], "contactDetails": [ {"type": "home", "detail": "555-1212"}, {"type": "email", "detail": "[email protected]"} ], ... } Address … Address … ContactDetails ContactDetail … Modeling Data: The Document Way
  • 15. To embed, or to reference, that is the question
  • 16. { "id": "1", "firstName": "Thomas", "lastName": "Andersen", "addresses": [ { "line1": "100 Some Street", "line2": "Unit 1", "city": "Seattle", "state": "WA", "zip": 98012 } ], "contactDetails": [ {"email": "[email protected]"}, {"phone": "+1 555 555-5555", "extension": 5555} ] } Try to model your entity as a self-contained document. Generally, use embedded data models when: there are "contains" relationships between entities; there are one-to-few relationships between entities; embedded data changes infrequently; embedded data won’t grow without bounds; embedded data is integral to data in a document. Data modeling with denormalization: denormalizing typically provides better read performance
  • 17. In general, use normalized data models when: write performance is more important than read performance; representing one-to-many relationships; representing many-to-many relationships; related data changes frequently. Provides more flexibility than embedding. More round trips to read data. Data modeling with referencing { "id": "xyz", "username": "user xyz" } { "id": "address_xyz", "userid": "xyz", "address" : { … } } { "id": "contact_xyz", "userid": "xyz", "email" : "[email protected]", "phone" : "555 5555" } Normalizing typically provides better write performance
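
To make the read-cost tradeoff between the embedded and referenced models concrete, here is a minimal Python sketch. It uses a plain in-memory dictionary as a stand-in for a collection (it is not the DocumentDB SDK), and the `read` helper, the `round_trips` counter, and the trimmed sample documents are hypothetical, introduced only for illustration.

```python
# In-memory stand-in for a collection: id -> document.
# This is NOT the DocumentDB SDK; it only counts lookups.
store = {
    # Embedded model: one self-contained document.
    "1": {
        "id": "1",
        "firstName": "Thomas",
        "addresses": [{"city": "Seattle", "state": "WA"}],
    },
    # Referenced model: related data lives in separate documents.
    "xyz": {"id": "xyz", "username": "user xyz"},
    "address_xyz": {"id": "address_xyz", "userid": "xyz", "address": {"city": "Seattle"}},
    "contact_xyz": {"id": "contact_xyz", "userid": "xyz", "phone": "555 5555"},
}

round_trips = 0


def read(doc_id):
    """Fetch one document by id and count it as one round trip."""
    global round_trips
    round_trips += 1
    return store[doc_id]


def load_embedded(person_id):
    # Everything needed is inside a single document.
    return read(person_id)


def load_referenced(user_id):
    # The user, address, and contact documents are read separately.
    user = read(user_id)
    address = read("address_" + user_id)
    contact = read("contact_" + user_id)
    return {"user": user, "address": address, "contact": contact}


round_trips = 0
load_embedded("1")
embedded_trips = round_trips

round_trips = 0
load_referenced("xyz")
referenced_trips = round_trips

print(embedded_trips, referenced_trips)  # prints "1 3"
```

The flip side, not shown here, is that updating an address in the referenced model touches only one small document, which is the write-performance argument made on the slide.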
  • 18. No magic bullet. Hybrid Approach: Model on a property-level (as opposed to record-level). Optimize your data model for your workload… (as opposed to blindly following types). Modeling impacts RU due to document size. Hybrid models { "id": "1", "firstName": "Thomas", "lastName": "Andersen", "countOfBooks": 3, "books": [1, 2, 3], "images": [ {"thumbnail": "http://....png"}, {"profile": "http://....png"} ] } { "id": 1, "name": "DocumentDB 101", "authors": [ {"id": 1, "name": "Thomas Andersen", "thumbnail": "http://....png"}, {"id": 2, "name": "William Wakefield", "thumbnail": "http://....png"} ] }
  • 21. Measuring Throughput (Request Units). Replica gets a fixed budget of request units. Request Unit/sec (RU) is the normalized currency (% IOPS, % CPU, % Memory). Operations consume request units (RUs). Requests get rate limited if they exceed the SLA. Customers pay for reserved request units by the hour. Diagram labels: Documents, Incoming requests, Replica, Min RU/sec, Max RU/sec, Quiescent, Rate limit, No throttling
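
The idea of a fixed per-second RU budget with rate limiting can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `RuBudget` class, the one-second window reset, and the RU charges per request are all hypothetical and are not the service's actual accounting algorithm.

```python
class RuBudget:
    """Toy per-second request-unit budget (hypothetical, not the service's algorithm)."""

    def __init__(self, ru_per_second):
        self.ru_per_second = ru_per_second
        self.current_second = None
        self.used = 0.0

    def try_consume(self, now_seconds, charge):
        """Return True if the request fits in this second's budget, else False."""
        second = int(now_seconds)
        if second != self.current_second:
            # A new one-second window starts with a full budget.
            self.current_second = second
            self.used = 0.0
        if self.used + charge > self.ru_per_second:
            return False  # the request would be rate limited
        self.used += charge
        return True


budget = RuBudget(ru_per_second=10)
# (timestamp in seconds, RU charge) pairs; the charges are made-up numbers.
requests = [(0.1, 4), (0.5, 5), (0.9, 3), (1.2, 3)]
results = [budget.try_consume(t, ru) for t, ru in requests]
print(results)  # prints [True, True, False, True]
```

The third request is rejected because it would push the first second's usage to 12 RUs against a budget of 10; the fourth succeeds because it falls in the next window.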
  • 22. What are partitions? A Collection is divided into Partition 1, Partition 2, … Partition i, … Partition n
  • 23. What are partitions? Partition Key = city. Cities such as London, Paris, New York, Houston, Chicago, New Delhi, Mumbai, Boston, and Berlin are spread across Partition 1, Partition 2, … Partition i, … Partition n
  • 24. Partitioning patterns: Writes should scale across Partition Keys (diagram: Partition 1, Partition 2, … Partition i, … Partition n)
  • 25. Partitioning patterns: Writes should scale across Partition Keys (diagram: Partition 1, Partition 2, … Partition i, … Partition n)
  • 26. Partitioning patterns: Reads should minimize cross-partition lookups (diagram: Partition 1, Partition 2, … Partition i, … Partition n)
  • 27. Recipe for Choosing Partition Key
  • 28. Let's talk about Query and Indexing
  • 30. DocumentDB: SQL and JavaScript queries. Document 1: { "locations": [ { "country": "Germany", "city": "Berlin" }, { "country": "France", "city": "Paris" } ], "headquarter": "Belgium", "exports": [{ "city": "Moscow" }, { "city": "Athens" }] } Document 2: { "locations": [{ "country": "Germany", "city": "Bonn", "revenue": 200 } ], "headquarter": "Italy", "exports": [ { "city": "Berlin", "dealers": [{"name": "Hans"}] }, { "city": "Athens" } ] } SQL: SELECT C.locations FROM company C WHERE C.headquarter = "Belgium" JavaScript: function businessLogic() { var country = "Belgium"; __.filter(function(x){return x.headquarter===country;});} Result: { "results": [ { "locations": [ {"country":"Germany","city":"Berlin"}, {"country":"France","city":"Paris"} ] } ] } (Diagrams: each document and the result drawn as a tree of paths, e.g. locations/0/country/Germany.)
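
As a rough illustration of what the SQL and JavaScript queries on this slide compute, the Python sketch below applies the same filter and projection to the two sample documents held in a list. It only mirrors the query semantics; `select_locations` is a hypothetical helper, not a DocumentDB API.

```python
# The two sample documents from the slide.
companies = [
    {
        "locations": [
            {"country": "Germany", "city": "Berlin"},
            {"country": "France", "city": "Paris"},
        ],
        "headquarter": "Belgium",
        "exports": [{"city": "Moscow"}, {"city": "Athens"}],
    },
    {
        "locations": [{"country": "Germany", "city": "Bonn", "revenue": 200}],
        "headquarter": "Italy",
        "exports": [
            {"city": "Berlin", "dealers": [{"name": "Hans"}]},
            {"city": "Athens"},
        ],
    },
]


def select_locations(docs, headquarter):
    """Equivalent of: SELECT C.locations FROM company C WHERE C.headquarter = <value>."""
    return {
        "results": [
            {"locations": doc["locations"]}
            for doc in docs
            if doc.get("headquarter") == headquarter
        ]
    }


result = select_locations(companies, "Belgium")
print(result)
```

Only the first document has `headquarter` equal to "Belgium", so the result holds a single entry with the Berlin and Paris locations, matching the result tree on the slide.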
  • 31. Indexing under the hood • Logically the index is a union of all the document trees • Structure contributed by the interior nodes, instance values are the leaves • Common structure • Structural information and instance values are normalized into a unifying concept of JSON-Path • Terms and postings lists: $/location/0/ → 1, 2; location/0/country/ → 1, 2; location/0/city/ → 1, 2; 0/country/Germany → 1, 2; 1/country/France → 2; …; 0/city/Moscow → 2; 0/dealers/0 → 2 • Supports Range & ORDERBY queries, Wildcard queries, and Spatial queries (diagrams: path fragments such as location/0/country/Germany and coordinates) • Dynamic Encoding of Postings List (E-WAH/differential)
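
The "union of document trees" idea can be sketched as an inverted index from path terms to postings lists. The Python below is a simplification under stated assumptions: it emits only full root-to-leaf paths (the slide also shows partial-path terms), it stores postings as plain sorted lists rather than an encoded form, and the document ids 1 and 2 are assigned here for illustration.

```python
from collections import defaultdict


def path_terms(value, prefix="$"):
    """Yield one root-to-leaf term per leaf: interior nodes give structure, leaves carry values."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from path_terms(child, f"{prefix}/{key}")
    elif isinstance(value, list):
        for position, child in enumerate(value):
            yield from path_terms(child, f"{prefix}/{position}")
    else:
        yield f"{prefix}/{value}"


def build_index(docs):
    """Map each path term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in path_terms(doc):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {
    1: {
        "locations": [
            {"country": "Germany", "city": "Berlin"},
            {"country": "France", "city": "Paris"},
        ],
        "headquarter": "Belgium",
        "exports": [{"city": "Moscow"}, {"city": "Athens"}],
    },
    2: {
        "locations": [{"country": "Germany", "city": "Bonn", "revenue": 200}],
        "headquarter": "Italy",
        "exports": [
            {"city": "Berlin", "dealers": [{"name": "Hans"}]},
            {"city": "Athens"},
        ],
    },
}

index = build_index(docs)

# An equality predicate becomes a single term lookup.
print(index["$/headquarter/Belgium"])
print(index["$/locations/0/country/Germany"])
```

Because both documents share the term `$/locations/0/country/Germany`, its postings list holds both ids, while `$/headquarter/Belgium` points only at the first document.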
  • 32. Queries that use the index • Equality: = • Range: <, >, <=, >= • ORDER BY • String operators: STARTSWITH • Spatial operators: ST_WITHIN and ST_DISTANCE • Array operators: ARRAY_CONTAINS • Schema operators: IS_DEFINED, IS_NUMBER, IS_STRING, …
  • 33. Indexing Policies. Configuration, level, and options: Automatic (per collection): True (default) or False, override with each document write. Indexing Mode (per collection): Consistent, Lazy, and None; None for KV workloads. Included and excluded paths (per path): individual path or recursive includes (? and *). Indexing Type (per path): supports Hash, Range, and Spatial. Indexing Precision (per path): supports 1 – 100 per path (and max); trade off storage, query RUs and write RUs
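
The options on this slide map naturally onto a JSON policy document. The Python sketch below builds one as a dictionary and serializes it. The field names (`indexingMode`, `includedPaths`, `kind`, `precision`, and so on) reflect the policy shape as I understand it and should be treated as assumptions to verify against the service documentation; the paths `/headquarter/?` and `/exports/*` are hypothetical examples based on the sample documents from slide 30.

```python
import json

# Field names are assumptions about the policy shape; verify before use.
indexing_policy = {
    "indexingMode": "consistent",  # per collection: consistent, lazy, or none
    "automatic": True,             # per collection: index every document by default
    "includedPaths": [
        {
            "path": "/*",          # "*" = this path and everything below it
            "indexes": [
                # precision -1 is assumed here to mean "maximum precision"
                {"kind": "Range", "dataType": "Number", "precision": -1},
                {"kind": "Hash", "dataType": "String", "precision": 3},
            ],
        },
        {
            "path": "/headquarter/?",  # "?" = the scalar value at this path
            "indexes": [
                {"kind": "Range", "dataType": "String", "precision": -1},
            ],
        },
    ],
    "excludedPaths": [
        {"path": "/exports/*"},    # hypothetical subtree left unindexed to save write RUs
    ],
}

policy_json = json.dumps(indexing_policy, indent=2)
print(policy_json)
```

The design choice to illustrate is the tradeoff named on the slide: range indexes and higher precision help range and ORDER BY queries but cost more storage and write RUs, while excluding paths that are never queried reduces both.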
  • 34. Let’s talk about Planet-Scale
  • 35. Guaranteed low latency “I want my data wherever my users are.”
  • 36. Guaranteed high availability Globally. With policy based failover. 99.99%
  • 37. Multi-region DocumentDB databases DocumentDB Collection Replica-set US-East US-West India Partitions Partition set Global distribution Local distribution Primary Replica-sets 2M RUs Secondary Replica-sets 2M RUs 2M RUs Secondary Replica-sets A DocumentDB collection 2M RUs Total RUs = Provisioned RUs x Number of regions In this example: 2M RUs x 3 regions = 6M RUs
  • 38. Programmable data consistency “It’s hard to write distributed apps.” Strong consistency, High latency Eventual consistency, Low latency
  • 39. Consistency Levels • PACELC Theorem and the associated tradeoffs
  • 40. Consistency Levels • Strong, Eventual, Bounded Staleness, and Session • Diagram order: Strong, Bounded Staleness, Session, Eventual. LEFT TO RIGHT: Weaker Consistency, Better Read scalability, Lower write latency (diagram labels per level: Client, P, S, S) • Consistent Prefix reads. • Reads lag behind writes by K prefixes or T interval • Monotonic reads, writes and Read your writes guarantee
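
One way to reason about the four levels is to ask whether a replica that lags the latest write may serve a given read. The Python below is a toy model under stated assumptions, not the service's protocol: it assumes each write gets an increasing sequence number (called `lsn` here), that a session remembers the number of its own last write, and that bounded staleness is expressed only as a maximum lag of K writes (the T interval form is omitted). All names are hypothetical.

```python
def can_serve_read(level, replica_lsn, latest_lsn, session_lsn=0, max_lag=0):
    """Decide whether a replica at replica_lsn may serve a read (toy model)."""
    if level == "strong":
        # The read must reflect every committed write.
        return replica_lsn == latest_lsn
    if level == "bounded_staleness":
        # The read may lag behind writes by at most max_lag (K) versions.
        return latest_lsn - replica_lsn <= max_lag
    if level == "session":
        # Read your writes: the replica must have caught up to this session's last write.
        return replica_lsn >= session_lsn
    if level == "eventual":
        # Any replica will do; it converges over time.
        return True
    raise ValueError(f"unknown consistency level: {level}")


# A replica that is 3 writes behind; this session last wrote at sequence number 6.
latest_lsn, replica_lsn, session_lsn = 10, 7, 6

answers = {
    level: can_serve_read(level, replica_lsn, latest_lsn, session_lsn, max_lag=2)
    for level in ("strong", "bounded_staleness", "session", "eventual")
}
print(answers)
```

In this scenario the lagging replica is rejected for strong reads and for bounded staleness with K = 2, but is acceptable for session reads (it already holds this session's write) and for eventual reads, which illustrates why the weaker levels scale reads better.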
  • 42. DocumentDB Recent Updates • Automatic Expiration via Time-To-Live (TTL) • Expanded Geo-Spatial support for Polygons and Lines • Preview Support for • Local Emulator • IP Filtering • Self-Service Backup + Restore • Protocol Support for MongoDB
  • 43. Q&A and more resources…
  • 44. Session Evaluations ways to access Go to passSummit.com Download the GuideBook App and search: PASS Summit 2016 Follow the QR code link displayed on session signage throughout the conference venue and in the program guide Submit by 5pm Friday November 6th to WIN prizes Your feedback is important and valuable. 3
  • 45. Thank You Learn more from Azure DocumentDB [email protected] or follow @DocumentDB

Editor's Notes

  • #4: Today our lives are informed and influenced by lots of data. Data is the new currency. From how we find a ride to what we watch and how we shop, data drives all these experiences. Where data was once structured for operational and analytical needs, today the LOB applications and services we consume use a wide variety of data. This variety dictates that operational databases become schema agnostic and support schema-free data storage. Beyond variety, the volume of data being generated, and the number of applications deriving intelligence from it, is increasing. The velocity of data is also increasing, which means applications expect high throughput for data ingestion and low latency for data retrieval. Databases that can provide intelligence to modern applications from high-velocity data of different varieties at very high volume become what we call intellibases. These design patterns create endless possibilities in all forms of business applications that improve products and services for consumers. Online retailing, gaming, mobile experiences, vehicle telematics and IoT are some of the key applications that take advantage of the 3 V’s of data.
  • #5: Let’s talk about database systems… because that’s what we’re here for. More specifically, let’s talk about common tradeoffs in database systems. E.g. do you want to be rich, or do you want to be happy? Well… ideally, I’d like to be both. Elastically Scalable: You want it to be not only highly scalable, but also elastically scalable. You don’t want to just say I want 100 machines… and be stuck with 100 machines. You want to be able to grow and shrink. Distribution: You don’t want to span just a single cluster. You want this elasticity to span multiple regions and datacenters. This is hard…. You can then look at your favorite database… E.g. does MongoDB support geo distribution? Does it support elastic scale? SQL? Oracle? Low latency vs durability: You want low latency, but you also want high durability. You don’t want low latency where your database says “got it” but doesn’t actually commit or persist the data. Unfortunately, there are already too many databases that do that… and they lose data left and right. You want transactions, because they give you a friendly programming model… but you also want high scale. There are databases that give you very rich transactions. But what happens when you try to scale this across many, many machines globally? There are also databases that give you no transactions, or transactions over single items… but that’s too extreme. You want to have low-cost schema/index management, but you also want rich queries. There are many databases out there today that give you flexible schemas, but they can’t auto-index everything and give you rich queryability over that data. Programmability vs Availability: Many of you may recognize this as the CAP theorem. You want to be able to trade off availability with consistency to provide a friendly programmability model. TCO vs Performance Isolation: Most databases are designed to run on-premises (aside from Google Bigtable, AWS DynamoDB, and Azure DocumentDB). They weren’t designed to run in the cloud, and they require dedicated VMs on AWS or Azure to deliver performance… which isn’t very cost effective. If you try to host these at high density to achieve a lower TCO, you end up with noisy neighbor effects. In order to guarantee predictable performance in a cost-effective multi-tenant model, you need to build a strong resource governance model directly into the database to isolate tenants.
  • #6: PACELC: if there is a partition (P), how does the system trade off availability and consistency (A and C); else (E), when the system is running normally in the absence of partitions, how does the system trade off latency (L) and consistency (C)? Reference: Consistency Tradeoffs in Modern Distributed Database System Design
  • #23: What are partitions? Each partition: has a fixed amount of SSD storage; is highly available; hosts the documents for one or many partition key values. Collection = logical container of physical partitions. Practically unlimited scale for storage and/or throughput.
  • #24: Understanding partition keys: the partition key decides the placement of documents; any document property can be promoted to partition key; Document => Hash(Partition key) => Physical partition; example partition key – on collection body. Partitioning characteristics: each partition key is given 10GB space; supports online auto-partitioning; transactions scoped to partition key. CRUD on partitioned collections: Read/Write via examples (x-ms-documentdb-partitionKey); Query via examples (x-ms-documentdb-query-enablecrosspartition); Parallel Query execution; Stored Procedures.
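
The placement rule in the note above (Document => Hash(Partition key) => Physical partition) can be sketched as follows. The notes do not say which hash function or placement scheme the service uses, so the MD5 digest and the modulo step below are stand-ins chosen only to make the example deterministic, and `partition_for` is a hypothetical helper.

```python
import hashlib


def partition_for(partition_key_value, partition_count):
    """Map a partition key value to a partition number (MD5 and modulo are stand-ins)."""
    digest = hashlib.md5(str(partition_key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# Documents partitioned by the "city" property, as in the slide's example.
documents = [
    {"id": "a", "city": "London"},
    {"id": "b", "city": "Paris"},
    {"id": "c", "city": "London"},
    {"id": "d", "city": "Mumbai"},
]

placement = {doc["id"]: partition_for(doc["city"], partition_count=4) for doc in documents}
print(placement)
```

Documents "a" and "c" share the partition key value "London", so they always land on the same partition. That is what lets a transaction be scoped to a partition key, and it is also why a query that does not filter on the key has to fan out across partitions.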