MongoDB's New Aggregation Features
Chris Westin
© Copyright 2010 10gen Inc.
What problem are we solving?
- Map/Reduce can be used for aggregation…
  - Currently being used for totaling, averaging, etc.
- Map/Reduce is a big hammer
  - Simpler tasks should be easier
  - Shouldn't need to write JavaScript
  - Avoid the overhead of the JavaScript engine
- We're seeing requests for help in handling complex documents
  - Select only subdocuments or arrays
How will we solve the problem?
- Our new aggregation framework
  - Declarative framework
    - No JavaScript required
  - Describe a chain of operations to apply
  - Expression evaluation
    - Return computed values
  - Framework: we can add new operations easily
  - C++ implementation
    - Higher performance than JavaScript
Aggregation - Pipelines
- Aggregation requests specify a pipeline
- A pipeline is a series of operations
- Conceptually, the members of a collection are passed through a pipeline to produce a result
  - Similar to a command-line pipe
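The pipe analogy can be sketched in plain Python. This is only a conceptual model, not MongoDB's implementation: `run_pipeline` and the two stage functions are hypothetical names, and each stage is assumed to be a function from a stream of documents to a stream of documents.

```python
from typing import Callable, Dict, Iterable, List

Document = Dict[str, object]
Stage = Callable[[Iterable[Document]], Iterable[Document]]


def run_pipeline(docs: Iterable[Document], stages: List[Stage]) -> List[Document]:
    """Feed the documents through each stage in order, like `a | b | c`."""
    stream: Iterable[Document] = docs
    for stage in stages:
        stream = stage(stream)
    return list(stream)


def only_published(docs):
    # Hypothetical filter stage: keep documents whose "published" flag is true.
    return (d for d in docs if d.get("published"))


def title_only(docs):
    # Hypothetical reshaping stage: keep just the "title" field.
    return ({"title": d["title"]} for d in docs)


articles = [
    {"title": "a", "published": True},
    {"title": "b", "published": False},
    {"title": "c", "published": True},
]
result = run_pipeline(articles, [only_published, title_only])
```

Because every stage consumes and produces the same kind of stream, stages can be reordered or added without changing the runner.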
Pipeline Operations
- $match
  - Uses a query predicate (like .find({…})) as a filter
- $project
  - Uses a sample document to determine the shape of the result (similar to .find()'s optional argument)
  - This can include computed values
- $group
  - Aggregates items into buckets defined by a key
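As an illustration, the three operations can be combined into one pipeline written as plain Python data. The sketch assumes the syntax of the released framework (an ordered list of single-key stage documents), which may differ from the pre-release build shown in the talk; the field names `status`, `author` and `pageViews` are hypothetical.

```python
# A pipeline is an ordered list of stages; each stage is a one-key document.
# Field names ("status", "author", "pageViews") are hypothetical.
pipeline = [
    # $match: a query predicate used as a filter, as in .find({...})
    {"$match": {"status": "published"}},
    # $project: a sample document that fixes the shape of the result
    {"$project": {"author": 1, "pageViews": 1}},
    # $group: bucket documents by a key and aggregate within each bucket
    {"$group": {"_id": "$author", "totalViews": {"$sum": "$pageViews"}}},
]

# The single key of each stage document names the operation it performs.
stage_names = [next(iter(stage)) for stage in pipeline]
```

With a driver such as PyMongo, a list like this would typically be passed to `Collection.aggregate(pipeline)`.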
Computed Expressions
- Available in $project operations
- Prefix expression language
  - Add two fields: $add:["$field1", "$field2"]
  - Provide a value for a missing field: $ifnull:["$field1", "$field2"]
  - Nesting: $add:["$field1", {$ifnull:["$field2", "$field3"]}]
  - Other functions…
  - And we can easily add more as required
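How a prefix expression is evaluated can be sketched with a tiny recursive interpreter. This is a simplified model covering only the two operators named on the slide, not the server's evaluator; `evaluate` is a hypothetical name. The sketch uses the `$ifNull` spelling of released MongoDB versions, whereas the slide writes `$ifnull`.

```python
def evaluate(expr, doc):
    """Evaluate a small subset of prefix expressions against one document."""
    if isinstance(expr, str) and expr.startswith("$"):
        # "$field" reads a field from the document; a missing field gives None.
        return doc.get(expr[1:])
    if isinstance(expr, dict):
        # An operator expression has exactly one key: the operator name.
        (op, args), = expr.items()
        values = [evaluate(arg, doc) for arg in args]
        if op == "$add":
            return sum(values)
        if op == "$ifNull":
            return values[0] if values[0] is not None else values[1]
        raise ValueError(f"unsupported operator: {op}")
    # Anything else is a literal and evaluates to itself.
    return expr


doc = {"field1": 2, "field3": 5}  # "field2" is deliberately missing
nested = {"$add": ["$field1", {"$ifNull": ["$field2", "$field3"]}]}
total = evaluate(nested, doc)
```

Nesting works because arguments are evaluated before the operator is applied, so the inner `$ifNull` is resolved to a value before `$add` sees it.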
Projections
- $project can reshape results
  - $unwind expression doles out array values one at a time
  - Pull fields from nested documents to the top
  - Push fields from the top down into new virtual documents
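The "one at a time" behaviour of $unwind can be modelled in a few lines of Python. This is a conceptual sketch with hypothetical names (`unwind`, `posts`); it assumes that a document whose array is empty or absent produces no output.

```python
def unwind(docs, field):
    """Emit one copy of each document per element of its array `field`."""
    for doc in docs:
        for value in doc.get(field, []):
            # Copy the document, replacing the array with a single element.
            yield {**doc, field: value}


posts = [
    {"title": "a", "tags": ["x", "y"]},
    {"title": "b", "tags": []},
]
unwound = list(unwind(posts, "tags"))
```

Each output document carries exactly one array element, which is what makes a later grouping step on that field possible.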
Grouping
- $group aggregation expressions
  - Total of column values: $sum
  - Average of column values: $avg
  - Collect column values in an array: $push
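What the three grouping expressions compute can be shown with a small Python model. It is a sketch of the semantics only; `group_by` and the sample field names are hypothetical, and the result keys `sum`, `avg` and `push` simply mirror the operator names.

```python
from collections import defaultdict


def group_by(docs, key, value):
    """Bucket documents by `key`, then aggregate `value` within each bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[key]].append(doc[value])
    return {
        k: {
            "sum": sum(v),            # like $sum
            "avg": sum(v) / len(v),   # like $avg
            "push": v,                # like $push: all values, in arrival order
        }
        for k, v in buckets.items()
    }


rows = [
    {"author": "ann", "views": 10},
    {"author": "bob", "views": 4},
    {"author": "ann", "views": 20},
]
grouped = group_by(rows, "author", "views")
```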
Demo
(See script at https://gist.github.com/993733)
Usage Tips
- Use $match in a pipeline as early as possible
  - The query optimizer can then be used to choose an index and avoid scanning the entire collection
Driver Support
- Initial version is a command
  - For any language, build a JSON database object, and execute the command
  - { aggregate : <collection>, pipeline : {…} }
- Beware of command result size limit
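Building the command document is ordinary data construction in any language. The sketch below does it in Python and serializes it to JSON. `build_aggregate_command` and the collection name are hypothetical, and the pipeline is written as a list of stages, which is an assumption based on the released framework rather than on the slide.

```python
import json


def build_aggregate_command(collection, pipeline):
    """Hypothetical helper: wrap a pipeline in the command document.

    The "aggregate" key is inserted first because the first key of a
    command document names the command.
    """
    return {"aggregate": collection, "pipeline": pipeline}


command = build_aggregate_command(
    "articles",  # hypothetical collection name
    [{"$match": {"status": "published"}}],
)
as_json = json.dumps(command)
```

A driver would then send this document through its generic "run command" entry point. Later server versions may require additional options in the command, so the current command reference should be checked before relying on this shape.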
When is this being released?
- In final development now
- Expect to see this in the near future
Sharding support
- Initial release will support sharding
- mongos analyzes the pipeline and forwards operations up to $group to the shards; it then combines the shard servers' results and continues
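The split-and-combine idea can be illustrated with a toy model. It assumes that each shard can pre-aggregate a partial sum per key and that the router only merges those partials. The slide does not spell out this division of work, so treat it as one plausible reading rather than a description of mongos. All names are hypothetical.

```python
from collections import Counter


def shard_partial_sums(docs, key, value):
    """Runs on one shard: total `value` per `key` for that shard's documents."""
    partial = Counter()
    for doc in docs:
        partial[doc[key]] += doc[value]
    return partial


def merge_partials(partials):
    """Runs on the router: add up the per-shard partial totals."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return dict(total)


shard_a = [{"author": "ann", "views": 10}, {"author": "bob", "views": 4}]
shard_b = [{"author": "ann", "views": 20}]
merged = merge_partials(
    shard_partial_sums(shard, "author", "views") for shard in (shard_a, shard_b)
)
```

Sums merge cleanly because addition is associative. An average would need each shard to return both a sum and a count so the router can divide at the end.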
Pipeline Operations – Future Plans
- $sort
  - Sorts the document stream according to a key
- $out
  - Saves the document stream to a collection
  - Similar to M/R $out, but with sharded output
Expressions – Future Plans
- Date field extraction
  - Get year, month, day, hour, etc., from Date
- Date arithmetic
MongoDB Aggregation MongoSF May 2011
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
ย 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
ย 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
ย 
"PHP and MySQL CRUD Operations for Student Management System"
"PHP and MySQL CRUD Operations for Student Management System""PHP and MySQL CRUD Operations for Student Management System"
"PHP and MySQL CRUD Operations for Student Management System"
Jainul Musani
ย 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
ย 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
ย 
Datastucture-Unit 4-Linked List Presentation.pptx
Datastucture-Unit 4-Linked List Presentation.pptxDatastucture-Unit 4-Linked List Presentation.pptx
Datastucture-Unit 4-Linked List Presentation.pptx
kaleeswaric3
ย 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
ย 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
ย 
Learn the Basics of Agile Development: Your Step-by-Step Guide
Learn the Basics of Agile Development: Your Step-by-Step GuideLearn the Basics of Agile Development: Your Step-by-Step Guide
Learn the Basics of Agile Development: Your Step-by-Step Guide
Marcel David
ย 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
ย 
"Client Partnership โ€” the Path to Exponential Growth for Companies Sized 50-5...
"Client Partnership โ€” the Path to Exponential Growth for Companies Sized 50-5..."Client Partnership โ€” the Path to Exponential Growth for Companies Sized 50-5...
"Client Partnership โ€” the Path to Exponential Growth for Companies Sized 50-5...
Fwdays
ย 
AI Changes Everything โ€“ Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything โ€“ Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything โ€“ Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything โ€“ Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
ย 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
ย 
Network Security. Different aspects of Network Security.
Network Security. Different aspects of Network Security.Network Security. Different aspects of Network Security.
Network Security. Different aspects of Network Security.
gregtap1
ย 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
ย 

MongoDB Aggregation MongoSF May 2011

  • 1. MongoDB’s New Aggregation Features, Chris Westin, © Copyright 2010 10gen Inc.
  • 2. What problem are we solving?
      • Map/Reduce can be used for aggregation…
      • Currently being used for totaling, averaging, etc.
      • Map/Reduce is a big hammer
      • Simpler tasks should be easier
      • Shouldn’t need to write JavaScript
      • Avoid the overhead of the JavaScript engine
      • We’re seeing requests for help in handling complex documents
      • Select only subdocuments or arrays
  • 3. How will we solve the problem?
      • Our new aggregation framework
      • Declarative framework
      • No JavaScript required
      • Describe a chain of operations to apply
      • Expression evaluation
      • Return computed values
      • Framework: we can add new operations easily
      • C++ implementation
      • Higher performance than JavaScript
  • 4. Aggregation - Pipelines
      • Aggregation requests specify a pipeline
      • A pipeline is a series of operations
      • Conceptually, the members of a collection are passed through a pipeline to produce a result
      • Similar to a command-line pipe
  • 5. Pipeline Operations
      • $match: Uses a query predicate (like .find({…})) as a filter
      • $project: Uses a sample document to determine the shape of the result (similar to .find()’s optional argument)
      • This can include computed values
      • $group: Aggregates items into buckets defined by a key
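A minimal sketch of how a pipeline chains $match and $project, written in plain JavaScript. The `articles` data and the `runPipeline` helper are hypothetical, and the stage functions are a toy in-memory stand-in (equality-only $match, inclusion-only $project), not the server's C++ implementation.

```javascript
// Hypothetical sample documents standing in for a collection.
const articles = [
  { title: "Pipelines", author: "bob", pageViews: 5, tags: ["fun", "good"] },
  { title: "Grouping", author: "dave", pageViews: 7, tags: ["fun"] },
  { title: "Sharding", author: "bob", pageViews: 3, tags: ["nasty"] },
];

// A pipeline is an ordered list of stage documents.
const pipeline = [
  { $match: { author: "bob" } },            // filter, like .find({author: "bob"})
  { $project: { title: 1, pageViews: 1 } }, // keep only these fields
];

// Toy stage implementations (assumption: equality-only $match, inclusion-only $project).
const stages = {
  $match: (docs, query) =>
    docs.filter((d) => Object.keys(query).every((k) => d[k] === query[k])),
  $project: (docs, shape) =>
    docs.map((d) => {
      const out = {};
      for (const k of Object.keys(shape)) {
        if (shape[k] && k in d) out[k] = d[k];
      }
      return out;
    }),
};

// Pass the documents through each stage in order, like a command-line pipe.
function runPipeline(docs, pipelineStages) {
  return pipelineStages.reduce((current, stage) => {
    const [op] = Object.keys(stage);
    return stages[op](current, stage[op]);
  }, docs);
}

const result = runPipeline(articles, pipeline);
```

Each stage sees only the output of the stage before it, which is why the order of the stages matters.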
  • 6. Computed Expressions
      • Available in $project operations
      • Prefix expression language
      • Add two fields: $add:["$field1", "$field2"]
      • Provide a value for a missing field: $ifnull:["$field1", "$field2"]
      • Nesting: $add:["$field1", $ifnull:["$field2", "$field3"]]
      • Other functions…
      • And we can easily add more as required
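The prefix notation can be illustrated with a toy evaluator. This is a sketch under assumptions, not MongoDB's implementation: the `evaluate` function is hypothetical, it supports only the two operators named on the slide, and it keeps the slide's `$ifnull` spelling (released MongoDB versions spell the operator `$ifNull`).

```javascript
// Toy evaluator for the prefix expression notation.
// Strings starting with "$" are field references; objects with a single
// "$op" key are operations applied to an array of argument expressions.
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)]; // undefined when the field is missing
  }
  if (expr !== null && typeof expr === "object" && !Array.isArray(expr)) {
    const [op] = Object.keys(expr);
    const args = expr[op].map((arg) => evaluate(arg, doc));
    if (op === "$add") return args.reduce((sum, v) => sum + v, 0);
    if (op === "$ifnull") return args[0] == null ? args[1] : args[0];
    throw new Error("unsupported operator: " + op);
  }
  return expr; // literal value
}

// Hypothetical document in which field2 is missing.
const doc = { field1: 10, field3: 5 };

// Nested expression from the slide: add field1 to (field2, or field3 if field2 is missing).
const total = evaluate(
  { $add: ["$field1", { $ifnull: ["$field2", "$field3"] }] },
  doc
);
```

Because operators are just keys in a lookup, adding another function means adding one more branch, which matches the slide's point that new operations are easy to add.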
  • 7. Projections
      • $project can reshape results
      • $unwind expression doles out array values one at a time
      • Pull fields from nested documents to the top
      • Push fields from the top down into new virtual documents
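As a rough illustration of what "doles out array values one at a time" means, the sketch below emits one output document per array element. The `unwind` helper and the `posts` data are hypothetical. The slide describes $unwind as an expression; in released MongoDB versions it is a pipeline stage of its own.

```javascript
// Toy unwind: emit one copy of each document per element of the named array field.
// Assumption: documents without the field (or with an empty array) produce no output.
function unwind(docs, field) {
  return docs.flatMap((d) =>
    (d[field] || []).map((value) => ({ ...d, [field]: value }))
  );
}

// Hypothetical sample documents.
const posts = [
  { title: "Pipelines", tags: ["fun", "good"] },
  { title: "Sharding", tags: ["nasty"] },
];

const unwound = unwind(posts, "tags");
```

After unwinding, each document carries a single tag, so a later $group stage can bucket by tag.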
  • 8. Grouping
      • $group aggregation expressions
      • Total of column values: $sum
      • Average of column values: $avg
      • Collect column values in an array: $push
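The three aggregation expressions can be sketched with an in-memory grouping function. The `group` helper and the `views` data are hypothetical, and the sketch assumes each accumulator reads a single named field; it only illustrates the bucketing idea, not the server's behavior in full.

```javascript
// Toy grouping: bucket documents by keyField, then apply accumulators per bucket.
function group(docs, keyField, spec) {
  const buckets = new Map();
  for (const d of docs) {
    const key = d[keyField];
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(d);
  }
  const accumulators = {
    $sum: (values) => values.reduce((a, b) => a + b, 0),
    $avg: (values) => values.reduce((a, b) => a + b, 0) / values.length,
    $push: (values) => values,
  };
  return [...buckets].map(([key, members]) => {
    const out = { _id: key };
    for (const [name, acc] of Object.entries(spec)) {
      const [op] = Object.keys(acc);
      const field = acc[op].slice(1); // "$pageViews" -> "pageViews"
      out[name] = accumulators[op](members.map((m) => m[field]));
    }
    return out;
  });
}

// Hypothetical sample documents.
const views = [
  { author: "bob", title: "Pipelines", pageViews: 5 },
  { author: "dave", title: "Grouping", pageViews: 7 },
  { author: "bob", title: "Sharding", pageViews: 3 },
];

const byAuthor = group(views, "author", {
  total: { $sum: "$pageViews" },
  average: { $avg: "$pageViews" },
  titles: { $push: "$title" },
});
```

Here each author becomes one bucket, with the page views totaled and averaged and the titles collected into an array.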
  • 9. Demo (See script at https://gist.github.com/993733)
  • 10. Usage Tips
      • Use $match in a pipeline as early as possible
      • The query optimizer can then be used to choose an index and avoid scanning the entire collection
  • 11. Driver Support
      • Initial version is a command
      • For any language, build a JSON database object, and execute the command
      • { aggregate : <collection>, pipeline : {…} }
      • Beware of command result size limit
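A small sketch of building the command document described above. The `buildAggregateCommand` helper and the "articles" collection name are hypothetical; the pipeline is written as an array of stages, which is the form released MongoDB versions accept, whereas the slide elides its contents.

```javascript
// Build the command document; a driver would send this to the server.
function buildAggregateCommand(collection, pipeline) {
  return { aggregate: collection, pipeline: pipeline };
}

const command = buildAggregateCommand("articles", [
  { $match: { author: "bob" } },
  { $group: { _id: "$author", total: { $sum: "$pageViews" } } },
]);

// Show the JSON form of the command document.
console.log(JSON.stringify(command));
```

In the mongo shell, a document like this could be passed to `db.runCommand(command)`. Because the whole result comes back in a single command reply, a narrow $project or an early $match helps keep it under the size limit.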
  • 12. When is this being released?
      • In final development now
      • Expect to see this in the near future
  • 13. Sharding support
      • Initial release will support sharding
      • Mongos analyzes pipeline, and forwards operations up to $group to shards; combines shard server results and continues
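One way to picture the analysis step is to split the pipeline at its first $group stage. This is a sketch under assumptions: the `splitForShards` helper is hypothetical, and it reads "up to $group" as exclusive. The slide does not say whether part of the $group itself also runs on the shards.

```javascript
// Split a pipeline at the first $group stage (assumption: the split is exclusive).
function splitForShards(pipeline) {
  const index = pipeline.findIndex((stage) => "$group" in stage);
  const cut = index === -1 ? pipeline.length : index;
  return {
    shardPart: pipeline.slice(0, cut), // forwarded to every shard
    mergePart: pipeline.slice(cut),    // run on the combined shard results
  };
}

const { shardPart, mergePart } = splitForShards([
  { $match: { author: "bob" } },
  { $project: { author: 1, pageViews: 1 } },
  { $group: { _id: "$author", total: { $sum: "$pageViews" } } },
]);
```

In this reading, filtering and reshaping happen in parallel on the shards, and only the reduced document stream has to be combined before grouping continues.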
  • 14. Pipeline Operations – Future Plans
      • $sort: Sorts the document stream according to a key
      • $out: Saves the document stream to a collection
      • Similar to M/R $out, but with sharded output
  • 15. Expressions – Future Plans
      • Date field extraction
      • Get year, month, day, hour, etc., from Date
      • Date arithmetic