#MongoDBDays




Indexing and Query Optimization
Chad Tindel
Senior Solution Architect, 10gen
Agenda
• What are indexes?
• Why do I need them?
• Working with indexes in MongoDB
• Optimize your queries
• Avoiding common mistakes
What are indexes?
What are indexes?
Imagine you're looking for a recipe in a cookbook
ordered by recipe name. Looking up a recipe by
name is quick and easy.
What are indexes?
• How would you find a recipe using chicken?
• How about a 250-350 calorie recipe using
 chicken?
Consult the index!
1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7
Linked List

1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7
Finding 7 in Linked List

          4
      2       6
    1   3   5   7
Finding 7 in Tree
Indexes in MongoDB are B-trees
Queries, inserts and deletes:
       O(log(n)) time
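To make the O(log(n)) claim concrete, here is a minimal sketch in plain JavaScript. It is not MongoDB's B-tree code: it only counts how many values are examined when finding 7 by walking the list versus halving a sorted array, which mirrors descending a balanced tree. The names `linearSteps` and `binarySteps` are illustrative.

```javascript
// Count how many elements are examined when scanning front to back.
function linearSteps(sorted, target) {
  let steps = 0;
  for (const value of sorted) {
    steps += 1;
    if (value === target) break;
  }
  return steps;
}

// Count how many elements are examined when halving the search range,
// which is what descending a balanced search tree amounts to.
function binarySteps(sorted, target) {
  let lo = 0;
  let hi = sorted.length - 1;
  let steps = 0;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    steps += 1;
    if (sorted[mid] === target) break;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return steps;
}

const keys = [1, 2, 3, 4, 5, 6, 7];
console.log(linearSteps(keys, 7)); // 7 values examined
console.log(binarySteps(keys, 7)); // 3 values examined (4, then 6, then 7)
```

With 7 keys the gap is small, but the list cost grows linearly while the tree cost grows with the logarithm of the number of keys.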
Indexes are the single biggest tunable performance factor in MongoDB
Absent or suboptimal indexes are the most common avoidable MongoDB performance problem.
Why do I need indexes?
A brief story
Working with Indexes in MongoDB
How do I create indexes?
// Create an index if one does not exist
db.recipes.createIndex({ main_ingredient: 1 })

// Legacy helper, deprecated since MongoDB 3.0 in favor of createIndex.
// The client remembers the index and raises no errors.
db.recipes.ensureIndex({ main_ingredient: 1 })

* 1 means ascending, -1 descending
What can be indexed?
// Multiple fields (compound key indexes)
db.recipes.ensureIndex({
   main_ingredient: 1,
   calories: -1
})

// Arrays of values (multikey indexes)
{
   name: 'Chicken Noodle Soup',
   ingredients : ['chicken', 'noodles']
}

db.recipes.ensureIndex({ ingredients: 1 })
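As a rough mental model of a multikey index, the sketch below (plain JavaScript, not MongoDB internals) stores one index entry per array element, each pointing back to the documents that contain it. The helper `buildMultikeyIndex` and the second sample recipe are assumptions made for illustration.

```javascript
// Build a toy "multikey index": one entry per array element,
// each mapping to the names of the documents that contain it.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
    for (const value of values) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc.name);
    }
  }
  return index;
}

const recipes = [
  { name: 'Chicken Noodle Soup', ingredients: ['chicken', 'noodles'] },
  { name: 'Chicken Salad', ingredients: ['chicken', 'lettuce'] }, // hypothetical sample
];

const byIngredient = buildMultikeyIndex(recipes, 'ingredients');
console.log(byIngredient.get('chicken')); // [ 'Chicken Noodle Soup', 'Chicken Salad' ]
console.log(byIngredient.get('noodles')); // [ 'Chicken Noodle Soup' ]
```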
What can be indexed?
// Subdocuments
{
   name : 'Apple Pie',
   contributor: {
     name: 'Joe American',
     id: 'joea123'
   }
}

db.recipes.ensureIndex({ 'contributor.id': 1 })

db.recipes.ensureIndex({ 'contributor': 1 })
How do I manage indexes?
// List a collection's indexes
db.recipes.getIndexes()
db.recipes.getIndexKeys()


// Drop a specific index
db.recipes.dropIndex({ ingredients: 1 })


// Drop all indexes and recreate them
db.recipes.reIndex()


// Every collection has a default (unique) index on _id
Background Index Builds
// Index creation is a blocking operation that can take a long time
// Background creation yields to other operations
db.recipes.ensureIndex(
    { ingredients: 1 },
    { background: true }
)
Options
• Uniqueness constraints (unique, dropDups)
• Sparse Indexes
• Geospatial (2d) Indexes
• TTL Collections (expireAfterSeconds)
Uniqueness Constraints
// Only one recipe can have a given value for name
db.recipes.ensureIndex( { name: 1 }, { unique: true } )


// Force a unique index on a collection with duplicate recipe names
// by dropping the duplicates
db.recipes.ensureIndex(
    { name: 1 },
    { unique: true, dropDups: true }
)


* dropDups is probably never what you want: it keeps the first document with a given value and deletes the rest
Sparse Indexes
// Only documents with field calories will be indexed
db.recipes.ensureIndex(
    { calories: -1 },
    { sparse: true }
)
// Allow multiple documents to not have calories field
db.recipes.ensureIndex(
    { name: 1 , calories: -1 },
    { unique: true, sparse: true }
)
* Without sparse, missing fields are stored as null in the index
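The difference between a regular and a sparse index can be sketched in plain JavaScript. This is a toy model, not MongoDB's implementation: `indexEntries` and the sample documents are hypothetical, and the point is only that a sparse index skips documents lacking the field while a regular index records them under null.

```javascript
// Return the entries a toy single-field index would hold.
function indexEntries(docs, field, { sparse = false } = {}) {
  const entries = [];
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (!hasField && sparse) continue; // sparse: no entry for this document
    entries.push({ key: hasField ? doc[field] : null, name: doc.name });
  }
  return entries;
}

const sampleDocs = [
  { name: 'Apple Pie', calories: 320 },
  { name: 'Mystery Stew' }, // no calories field
  { name: 'Plain Toast' },  // no calories field
];

const regular = indexEntries(sampleDocs, 'calories');
const sparse = indexEntries(sampleDocs, 'calories', { sparse: true });
console.log(regular.length); // 3 entries, two of them keyed by null
console.log(sparse.length);  // 1 entry
```

The two null keys in the regular index are why a unique constraint on an optional field usually needs sparse as well.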
Geospatial Indexes
// Add coordinates, longitude first and then latitude
{
     name: '10gen Palo Alto',
     loc: [ -122.158574, 37.449157 ]
}
// Index the coordinates
db.locations.ensureIndex( { loc : '2d' } )


// Query for locations 'near' a particular coordinate
db.locations.find({
     loc: { $near: [ -122.3, 37.4 ] }
})
TTL Collections
// Documents must have a BSON UTC Date field
{ 'submitted_date' : ISODate('2012-10-12T05:24:07.211Z'), … }


// Documents are removed after 'expireAfterSeconds' seconds
db.recipes.ensureIndex(
    { submitted_date: 1 },
    { expireAfterSeconds: 3600 }
)
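The expiry rule can be sketched as a small predicate in plain JavaScript. This is an illustration under stated assumptions, not the server's code: `isExpired` is a hypothetical helper, and on a real deployment removal is done by a background task that runs periodically, so deletion can lag the expiry time.

```javascript
// True when the document's date field is at least expireAfterSeconds old.
function isExpired(doc, field, expireAfterSeconds, now = new Date()) {
  const value = doc[field];
  if (!(value instanceof Date)) return false; // missing or non-date values never expire
  return now.getTime() - value.getTime() >= expireAfterSeconds * 1000;
}

const recipe = { submitted_date: new Date('2012-10-12T05:24:07.211Z') };

console.log(isExpired(recipe, 'submitted_date', 3600, new Date('2012-10-12T05:30:00Z'))); // false
console.log(isExpired(recipe, 'submitted_date', 3600, new Date('2012-10-12T06:30:00Z'))); // true
console.log(isExpired({ submitted_date: 'yesterday' }, 'submitted_date', 3600));          // false
```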
Limitations
• Collections cannot have more than 64 indexes.

• Index keys cannot be larger than 1024 bytes (1 KB).

• The name of an index, including the namespace, must be shorter than
  128 characters.
• Queries can only use 1 index*

• Indexes have storage requirements, and impact the
  performance of writes.
• In-memory sort (no index) is limited to 32 MB of return data.

* With the exception of $or queries
Optimize Your Queries
Profiling Slow Ops
db.setProfilingLevel( n, slowms )   // slowms is in milliseconds, default 100


n=0 profiler off
n=1 record operations longer than slowms
n=2 record all operations


db.system.profile.find()




* The profile collection is a capped collection, and fixed in size
The Explain Plan (Pre Index)
db.recipes.find(
    { calories: { $lt : 40 } }
).explain( )
{
    "cursor" : "BasicCursor",
    "n" : 42,
    "nscannedObjects" : 12345,
    "nscanned" : 12345,
    ...
    "millis" : 356,
    ...
}
* explain() doesn’t use cached plans; it re-evaluates the candidate plans and resets the cache
The Explain Plan (Post Index)
db.recipes.find(
    { calories: { $lt : 40 } }
).explain( )
{
    "cursor" : "BtreeCursor calories_-1",
    "n" : 42,
    "nscannedObjects" : 42,
    "nscanned" : 42,
    ...
    "millis" : 0,
    ...
}
* explain() doesn’t use cached plans; it re-evaluates the candidate plans and resets the cache
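One way to read the two plans side by side is to compare how many items were examined with how many were returned. The sketch below is plain JavaScript over objects shaped like the explain output on these slides; `usedIndex` and `scanEfficiency` are hypothetical helpers, not part of the mongo shell.

```javascript
// BasicCursor in this explain format means a full collection scan.
function usedIndex(plan) {
  return !plan.cursor.startsWith('BasicCursor');
}

// Ratio of returned documents to examined items; 1 is ideal.
function scanEfficiency(plan) {
  return plan.nscanned === 0 ? 1 : plan.n / plan.nscanned;
}

const preIndex = { cursor: 'BasicCursor', n: 42, nscanned: 12345 };
const postIndex = { cursor: 'BtreeCursor calories_-1', n: 42, nscanned: 42 };

console.log(usedIndex(preIndex), scanEfficiency(preIndex).toFixed(4));  // false, about 0.0034
console.log(usedIndex(postIndex), scanEfficiency(postIndex));           // true, 1
```

A ratio far below 1 suggests the query is examining many items it does not return, which is usually the signal to look for a better index.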
The Query Optimizer
• For each "type" of query, MongoDB
  periodically tries all useful indexes
• Aborts the rest as soon as one plan wins
• The winning plan is temporarily cached for
  each “type” of query
Manually Select Index to Use
// Tell the database what index to use
db.recipes.find(
  { calories: { $lt: 1000 } }
).hint({ _id: 1 })


// Tell the database to NOT use an index
db.recipes.find(
  { calories: { $lt: 1000 } }
).hint({ $natural: 1 })
Use Indexes to Sort Query Results
// Given the following index
db.collection.ensureIndex({ a:1, b:1 , c:1, d:1 })

// The following query and sort operations can use the index
db.collection.find( ).sort({ a:1 })
db.collection.find( ).sort({ a:1, b:1 })

db.collection.find({ a:4 }).sort({ a:1, b:1 })
db.collection.find({ b:5 }).sort({ a:1, b:1 })
Indexes that won’t work for sorting query results
// Given the following index
db.collection.ensureIndex({ a:1, b:1, c:1, d:1 })


// These cannot sort using the index
db.collection.find( ).sort({ b: 1 })
db.collection.find({ b: 5 }).sort({ b: 1 })
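The examples on the last two slides follow one rule of thumb: every index field before the first sorted field must be matched by equality, and the sorted fields must then follow the index order without gaps. The plain JavaScript sketch below encodes that simplified rule. It ignores sort direction and other planner details, and `canSortWithIndex` is a hypothetical helper rather than anything MongoDB exposes.

```javascript
// indexKeys:      field names in index order, e.g. ['a', 'b', 'c', 'd']
// equalityFields: fields the query matches with an exact value
// sortFields:     field names in sort order
function canSortWithIndex(indexKeys, equalityFields, sortFields) {
  if (sortFields.length === 0) return true;
  const start = indexKeys.indexOf(sortFields[0]);
  if (start === -1) return false;
  // Every index field before the first sorted field must be an equality match.
  const prefixOk = indexKeys.slice(0, start).every((key) => equalityFields.includes(key));
  // The sorted fields must follow the index order without gaps.
  const orderOk = sortFields.every((field, i) => indexKeys[start + i] === field);
  return prefixOk && orderOk;
}

const abcd = ['a', 'b', 'c', 'd'];
console.log(canSortWithIndex(abcd, [], ['a']));         // true
console.log(canSortWithIndex(abcd, ['a'], ['a', 'b'])); // true
console.log(canSortWithIndex(abcd, ['b'], ['a', 'b'])); // true
console.log(canSortWithIndex(abcd, [], ['b']));         // false
console.log(canSortWithIndex(abcd, ['b'], ['b']));      // false
console.log(canSortWithIndex(abcd, ['a'], ['b']));      // true
```

The last call shows the fix for the failing cases: adding an equality match on `a` makes a sort on `b` usable.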
Index Covered Queries
// MongoDB can return data from just the index
db.recipes.ensureIndex({ main_ingredient: 1, name: 1 })

// Return only the name field
db.recipes.find(
   { main_ingredient: 'chicken' },
   { _id: 0, name: 1 }
)

// indexOnly will be true in the explain plan
db.recipes.find(
    { main_ingredient: 'chicken' },
    { _id: 0, name: 1 }
).explain()
{
    "indexOnly": true,
}
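Whether a query can be covered comes down to a field check: everything the query filters on and everything it returns must live in the index, and `_id` must be excluded unless it is indexed too. The plain JavaScript sketch below captures only that check; `isCovered` is a hypothetical helper, and it ignores other conditions (such as array fields) that can prevent covering in practice.

```javascript
// indexKeys:   field names in the index
// queryFields: field names used in the filter
// projection:  projection document, e.g. { _id: 0, name: 1 }
function isCovered(indexKeys, queryFields, projection) {
  const projected = Object.keys(projection).filter((field) => projection[field]);
  // Without at least one included field, whole documents are returned.
  if (projected.length === 0) return false;
  // _id is returned by default unless it is explicitly excluded.
  const idExcluded = projection._id === 0 || projection._id === false;
  if (!idExcluded && !indexKeys.includes('_id')) return false;
  return [...queryFields, ...projected].every((field) => indexKeys.includes(field));
}

const recipeIndex = ['main_ingredient', 'name'];
console.log(isCovered(recipeIndex, ['main_ingredient'], { _id: 0, name: 1 }));     // true
console.log(isCovered(recipeIndex, ['main_ingredient'], { name: 1 }));             // false: _id is returned
console.log(isCovered(recipeIndex, ['main_ingredient'], { _id: 0, calories: 1 })); // false: calories not indexed
```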
Absent or suboptimal indexes are the most common avoidable MongoDB performance problem.
Avoiding Common Mistakes
Trying to Use Multiple Indexes
// MongoDB can only use one index for a query
db.collection.ensureIndex({ a: 1 })
db.collection.ensureIndex({ b: 1 })


// Only one of the above indexes is used
db.collection.find({ a: 3, b: 4 })
Compound Key Mistakes
// Compound key indexes are very effective
db.collection.ensureIndex({ a: 1, b: 1, c: 1 })


// But only if the queried fields form a prefix of the index fields


// This query can't effectively use the index
db.collection.find({ c: 2 })


// …but this query can
db.collection.find({ a: 3, b: 5 })
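The prefix rule can be checked mechanically: walk the index fields in order and stop at the first one the query does not constrain. The plain JavaScript sketch below does exactly that; `usablePrefixLength` is a hypothetical helper for reasoning about index design, not a MongoDB API.

```javascript
// Number of leading index fields that the query constrains.
// 0 means the query cannot use the index effectively.
function usablePrefixLength(indexKeys, queryFields) {
  let length = 0;
  for (const key of indexKeys) {
    if (!queryFields.includes(key)) break;
    length += 1;
  }
  return length;
}

const abc = ['a', 'b', 'c'];
console.log(usablePrefixLength(abc, ['c']));      // 0
console.log(usablePrefixLength(abc, ['a', 'b'])); // 2
console.log(usablePrefixLength(abc, ['a', 'c'])); // 1: only the leading a narrows the index range
```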
Low Selectivity Indexes
db.collection.distinct('status')
[ 'new', 'processed' ]


db.collection.ensureIndex({ status: 1 })


// Low selectivity indexes provide little benefit
db.collection.find({ status: 'new' })


// Better
db.collection.ensureIndex({ status: 1, created_at: -1 })
db.collection.find(
  { status: 'new' }
).sort({ created_at: -1 })
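A quick way to gauge selectivity before creating an index is to compare the number of distinct values with the number of documents. The plain JavaScript sketch below works on an in-memory sample; `selectivity` is a hypothetical helper, and the documents and numeric `created_at` values are made up for illustration.

```javascript
// Fraction of distinct values among all documents.
// Values close to 1 are highly selective; values close to 0 are not.
function selectivity(docs, field) {
  if (docs.length === 0) return 0;
  return new Set(docs.map((doc) => doc[field])).size / docs.length;
}

const sample = [
  { status: 'new', created_at: 1 },
  { status: 'new', created_at: 2 },
  { status: 'processed', created_at: 3 },
  { status: 'processed', created_at: 4 },
];

console.log(selectivity(sample, 'status'));     // 0.5, and it falls toward 0 as the collection grows
console.log(selectivity(sample, 'created_at')); // 1
```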
Regular Expressions
db.users.ensureIndex({ username: 1 })


// Left anchored regex queries can use the index
db.users.find({ username: /^joe smith/ })


// But not generic regexes
db.users.find({username: /smith/ })


// Or case insensitive queries
db.users.find({ username: /Joe/i })
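The three cases above can be told apart by inspecting the regular expression itself. The plain JavaScript sketch below is a rough heuristic only: `canUseIndexPrefix` is a hypothetical helper, and it does not handle patterns such as alternation, where a leading `^` does not guarantee a fixed prefix.

```javascript
// A regex can be served by an index range scan when it is
// anchored at the start and case sensitive.
function canUseIndexPrefix(regex) {
  const anchored = regex.source.startsWith('^');
  const caseSensitive = !regex.ignoreCase;
  return anchored && caseSensitive;
}

console.log(canUseIndexPrefix(/^joe smith/)); // true
console.log(canUseIndexPrefix(/smith/));      // false: not anchored
console.log(canUseIndexPrefix(/Joe/i));       // false: case insensitive
```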
Negation
// Indexes aren't helpful with negations
db.things.ensureIndex({ x: 1 })

// e.g. "not equal" queries
db.things.find({ x: { $ne: 3 } })

// …or "not in" queries
db.things.find({ x: { $nin: [2, 3, 4 ] } })

// …or the $not operator (which takes a regex or an operator expression)
db.people.find({ name: { $not: /^John Doe$/ } })
Choosing the right indexes is one of the most important things you can do as a MongoDB developer, so take the time to get your indexes right!
#MongoDBDays




Thank you
Chad Tindel
Senior Solution Architect, 10gen

Editor's Notes

  • #4: When speaking: What are indexes and why do we need them? The first part of this talk is conceptual; the second part is extremely detailed.
  • #10: Look at 7 documents
  • #11: Queries, inserts and deletes: O(log(n)) time
  • #12: MongoDB's indexes are B-trees. Lookups (queries), inserts and deletes happen in O(log(n)) time.
  • #13: So this is helpful, and can speed up queries by a tremendous amount
  • #14: So it’s imperative we understand them
  • #16: Tell a story about a customer problem caused by a missing index.
  • #18: Repeated calls to ensureIndex only result in one create message going to the server. The index is cached client side for some period of time (varies by driver).
  • #20: Indexes can be costly if you have too many, so...
  • #21: getIndexes returns an index document for each index in the collection. dropIndex requires the spec used to create the index initially. reIndex drops *all* indexes (including the _id index) and rebuilds them.
  • #22: Caveats: Still a resource-intensive operation. The index build is slower. The mongo shell session or app will block while the index builds. Indexes are still built in the foreground on secondaries.
  • #24: unique applies a uniqueness constraint to the indexed values. dropDups will force the server to create a unique index by only keeping the first document found in natural order with a value and dropping all other documents with that value. dropDups will likely result in data loss!
  • #25: MongoDB doesn't enforce a schema: documents are not required to have the same fields. Sparse indexes only contain entries for documents that have the indexed field. Without sparse, documents without field 'a' have a null entry in the index for that field. With sparse, a unique constraint can be applied to a field not shared by all documents. Otherwise multiple 'null' values violate the unique constraint.
  • #26: A '2d' index is a geohash on top of the B-tree. It allows you to search for documents 'near' a latitude/longitude position. Bounds queries are also possible using $within.
  • #27: Index must be on a BSON date field. Documents are removed after expireAfterSeconds seconds. Reaper thread runs every 60 seconds.
  • #28: Indexes are a really powerful feature of MongoDB, however there are some limitations. Understanding these limitations is an important part of using MongoDB correctly. Queries can use only one index, with the exception of $or queries. If an index key exceeds 1 KB, the document is silently left out of the index.
  • #30: Changing slowms also affects which queries are logged to the MongoDB log file.
  • #31: cursor – the type of cursor used; BasicCursor means no index was used. n – the number of documents that match the query. nscannedObjects – the number of documents that had to be scanned. nscanned – the number of items (index entries or documents) examined. millis – how long the query took. The ratio of n to nscanned should be as close to 1 as possible.
  • #32: cursor – the type of cursor used; BasicCursor means no index was used. n – the number of documents that match the query. nscannedObjects – the number of documents that had to be scanned. nscanned – the number of items (index entries or documents) examined. millis – how long the query took. The ratio of n to nscanned should be as close to 1 as possible.
  • #33: Winning plan is reevaluated after 1000 write operations (insert, update, remove, etc.).
  • #34: Tells MongoDB exactly what index to use.
  • #35: MongoDB sorts results based on the field order in the index. For queries that include a sort that uses a compound key index, ensure that all fields before the first sorted field are equality matches.
  • #36: MongoDB sorts results based on the field order in the index. For queries that include a sort that uses a compound key index, ensure that all fields before the first sorted field are equality matches.
  • #39: Tell a story about a customer problem caused by a suboptimal index.
  • #42: Better to use a compound index on the low selectivity field and some other more selective field.