MongoDB Europe 2016
Old Billingsgate, London
15th November
mongodb.com/europe
Register with code JD20 for a 20% discount
Back to Basics 2016: Webinar 5
Introduction to the Aggregation Framework
Joe Drumgoole
Director of Developer Advocacy, EMEA
@jdrumgoole
V1.1
Recap
• Webinar 1 – Introduction to NoSQL
– The different types of NoSQL databases
– MongoDB is a document database
• Webinar 2 – My First Application
– Creating databases and collections
– CRUD, Indexes and Explain
• Webinar 3 – Schema Design
– Dynamic schema
– Embedding approaches
• Webinar 4 – GeoSpatial and Text Indexes
The Aggregation Framework
• An analytics engine for MongoDB
• What is analytics?
• Think of the two types of database: OLTP and OLAP
• OLTP : Online Transaction Processing
– Airline booking
– ATMs
– Taxi booking
• OLAP : Online Analytical Processing
– Which tickets make us the most money?
– When do we need to refill our ATMs?
– How many cabs do we need to service the West End of London?
What Does This Look Like?
[Figure: OLTP and OLAP; image not preserved]
OLAP - There Be (Hadoop) Dragons Here
• OLAP queries are often table scans
• The output of the queries is often stored for future analysis and comparison
• Many customers are looking at Spark and Hadoop for OLAP, but:
– Complexity is astronomical
– They focus on algorithmic analysis of data (you have to write a program)
– They require significant knowledge of parallel algorithms and parallel processing
• The aggregation framework is a kinder, gentler tool
• It may do what you want in less time and with less anguish
The Aggregation Framework – A Processing Pipeline
Match → Project → Group → Sort → Limit
• Think of a Unix pipeline
• The output of one stage is passed to the input of the next stage
• Each stage performs one job
• Stages can be repeated
• The input is a single collection
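To make the Unix pipeline analogy concrete, here is a minimal sketch in plain JavaScript that needs no database. Each stage is modelled as a function from a list of documents to a list of documents. The `runPipeline` helper, the stage functions and the sample documents are all hypothetical illustrations; this is not MongoDB's API or how the server implements the pipeline.

```javascript
// Illustration only: each "stage" takes an array of documents and
// returns an array of documents, like a filter in a Unix pipe.
// runPipeline and the sample data are hypothetical, not MongoDB code.
function runPipeline(docs, stages) {
  // Feed the output of each stage into the next one.
  return stages.reduce((current, stage) => stage(current), docs);
}

const sampleDocs = [
  { Make: "AUDI", TestResult: "P" },
  { Make: "BMW", TestResult: "F" },
  { Make: "AUDI", TestResult: "F" },
  { Make: "SUZUKI", TestResult: "P" },
];

// Three stages, each doing exactly one job.
const matchPasses = (docs) => docs.filter((d) => d.TestResult === "P");
const projectMake = (docs) => docs.map((d) => ({ Make: d.Make }));
const limitOne = (docs) => docs.slice(0, 1);

const pipelineResult = runPipeline(sampleDocs, [matchPasses, projectMake, limitOne]);
console.log(pipelineResult); // [ { Make: 'AUDI' } ]
```

Because every stage has the same shape (documents in, documents out), stages can be reordered or repeated freely, which is the property the bullets above describe.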
Pipeline Operators
• $match: Filter documents
• $project: Reshape documents
• $group: Summarize documents
• $out: Create new collections
• $sort: Order documents
• $limit/$skip: Paginate documents
• $lookup: Join two collections together
• $unwind: Expand an array
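Of these operators, `$unwind` is perhaps the hardest to picture from a one-line description. The following is a rough sketch of its basic effect in plain JavaScript. The `unwind` helper, the `Results` field and the sample documents are made up for illustration, and the real stage has further options (for example, how missing or empty arrays are treated) that this sketch ignores.

```javascript
// Hypothetical helper mimicking the basic effect of
// { "$unwind" : "$<field>" }: one output document per array element.
// Documents whose field is missing or not an array are dropped here,
// which simplifies the real stage's behaviour.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    Array.isArray(doc[field])
      ? doc[field].map((item) => ({ ...doc, [field]: item }))
      : []
  );
}

const vehicles = [
  { VehicleID: 28, Results: ["F", "P"] },
  { VehicleID: 33, Results: ["P"] },
];

// Vehicle 28 yields two documents (one per result), vehicle 33 yields one.
const unwound = unwind(vehicles, "Results");
console.log(unwound);
```

Once the array is expanded this way, each element can be filtered or grouped like any other field.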
The MoT Data Set
Example Document
{ "_id" : ObjectId("5759ee6e8684975e1098af68"),
  "TestID" : 400,
  "VehicleID" : "278",
  "TestDate" : ISODate("2013-04-23T00:00:00Z"),
  "TestClassID" : "4",
  "TestType" : "N",
  "TestResult" : "P",
  "TestMileage" : 99284,
  "Postcode" : "E",
  "Make" : "AUDI",
  "Model" : "A3",
  "Colour" : "BLACK",
  "FuelType" : "P",
  "CylinderCapacity" : 1598,
  "FirstUseDate" : ISODate("2003-11-11T00:00:00Z") }
Let's Use the Shell
MongoDB Enterprise > use vosa
switched to db vosa
MongoDB Enterprise > db.results2013.findOne()
{
"_id" : ObjectId("577294020cb23533dfbaac18"),
"TestID" : 17,
"VehicleID" : 28,
"TestDate" : ISODate("2013-05-02T00:00:00Z"),
"TestClassID" : "2",
"TestType" : "N",
"TestResult" : "P",
"TestMileage" : 46414,
"Postcode" : "BN",
"Make" : "SUZUKI",
"Model" : "UNCLASSIFIED",
"Colour" : "GREEN",
"FuelType" : "P",
"CylinderCapacity" : 398,
"FirstUseDate" : ISODate("1993-08-11T00:00:00Z")
}
$limit
MongoDB Enterprise > db.results2013.aggregate([ { "$limit" :2 } ] )
{
"_id" : ObjectId("577294020cb23533dfbaac18"),
"TestID" : 17,
"VehicleID" : 28,
"TestDate" : ISODate("2013-05-02T00:00:00Z"),
"TestClassID" : "2",
"TestType" : "N",
"TestResult" : "P",
"TestMileage" : 46414,
"Postcode" : "BN",
"Make" : "SUZUKI",
"Model" : "UNCLASSIFIED",
…
Let’s Make a Small Collection
> db.results2013.aggregate( [ { "$limit" : 10000 }, {"$out" : "results10k" } ] )
> db.results10k.count()
10000
> db.results10k.findOne()
{
"_id" : ObjectId("577294020cb23533dfbaac18"),
"TestID" : 17,
"VehicleID" : 28,
"TestDate" : ISODate("2013-05-02T00:00:00Z"),
"TestClassID" : "2",
"TestType" : "N",
"TestResult" : "P",
"TestMileage" : 46414,
"Postcode" : "BN",
"Make" : "SUZUKI",
"Model" : "UNCLASSIFIED",
"Colour" : "GREEN",
"FuelType" : "P",
"CylinderCapacity" : 398,
"FirstUseDate" : ISODate("1993-08-11T00:00:00Z")
}
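The pipeline passed to `aggregate()` is an ordinary array of stage documents, so it can be built and inspected as plain JavaScript before it is sent to the server. The sketch below builds the same two stages and checks that `$out`, if present, is the final stage. The `outIsLast` helper and the variable names are hypothetical and not part of the shell.

```javascript
// The same two stages as in the example above, held in variables.
const limitStage = { "$limit": 10000 };
const outStage = { "$out": "results10k" };
const smallCollectionPipeline = [limitStage, outStage];

// Hypothetical helper: true when no stage other than the last is $out.
function outIsLast(pipeline) {
  return pipeline.every(
    (stage, i) => !("$out" in stage) || i === pipeline.length - 1
  );
}

console.log(outIsLast(smallCollectionPipeline)); // true
console.log(outIsLast([outStage, limitStage])); // false

// In the mongo shell the pipeline would then be run with:
//   db.results2013.aggregate(smallCollectionPipeline)
```

Holding stages in variables like this is the same style the `$project` and `$group` slides use later in the deck.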
$match
…aggregate([ { "$limit" :2000 },
{ "$match" : { "FirstUseDate" : { "$ne" : "NULL" }}} ])
{ "_id" : ObjectId("577294020cb23533dfbaac18"), "TestID" : 17, "VehicleID" : 28,
"TestDate" : ISODate("2013-05-02T00:00:00Z"), "TestClassID" : "2", "TestType" :
"N", "TestResult" : "P", "TestMileage" : 46414, "Postcode" : "BN", "Make" :
"SUZUKI", "Model" : "UNCLASSIFIED", "Colour" : "GREEN", "FuelType" : "P",
"CylinderCapacity" : 398, "FirstUseDate" : ISODate("1993-08-11T00:00:00Z") }
{ "_id" : ObjectId("577294020cb23533dfbaac19"), "TestID" : 22, "VehicleID" : 33,
"TestDate" : ISODate("2013-06-07T00:00:00Z"), "TestClassID" : "2", "TestType" :
"N", "TestResult" : "P", "TestMileage" : 15605, "Postcode" : "PE", "Make" :
"UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Colour" : "BLACK", "FuelType" : "P",
"CylinderCapacity" : 150, "FirstUseDate" : ISODate("1962-01-01T00:00:00Z") }
{ "_id" : ObjectId("577294020cb23533dfbaac1a"), "TestID" : 44, "VehicleID" : 49,
"TestDate" : ISODate("2013-08-09T00:00:00Z"), "TestClassID" : "4", "TestType" :
"N", "TestResult" : "PRS", "TestMileage" : 72694, "Postcode" : "SO", "Make" :
"UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Colour" : "BLACK", "FuelType" : "P",
"CylinderCapacity" : 998, "FirstUseDate" : ISODate("2001-05-16T00:00:00Z") }
...
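Note that this filter compares against the string "NULL", not the JavaScript `null` value; presumably missing dates were loaded into the collection as that literal string. The plain JavaScript sketch below shows the same comparison on a few hypothetical documents (the sample data is invented for illustration). Appending an `$out` stage to such a pipeline is one plausible way to produce the `nonulldates` collection used later in the deck, but the slides do not show that step, so treat it as an assumption.

```javascript
// Hypothetical sample: one record carries the literal string "NULL"
// in place of a date, as the $match filter above implies.
const sample = [
  { VehicleID: 28, FirstUseDate: new Date("1993-08-11T00:00:00Z") },
  { VehicleID: 29, FirstUseDate: "NULL" },
  { VehicleID: 33, FirstUseDate: new Date("1962-01-01T00:00:00Z") },
];

// Plain-JavaScript equivalent of { "FirstUseDate" : { "$ne" : "NULL" } }
const withDates = sample.filter((doc) => doc.FirstUseDate !== "NULL");
console.log(withDates.map((doc) => doc.VehicleID)); // [ 28, 33 ]

// A real null is not equal to the string "NULL", so a document like
// this one would pass the filter rather than be excluded by it.
const nullDoc = { VehicleID: 40, FirstUseDate: null };
console.log(nullDoc.FirstUseDate !== "NULL"); // true
```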
$project (1 of 2)
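// $subtract on two dates returns the difference in milliseconds
ageinmillis = { "$subtract" : [ "$TestDate", "$FirstUseDate" ] }
// 1000*3600*24*365 = milliseconds in a 365-day year (leap days ignored)
ageinyears = { "$divide" : [ ageinmillis, (1000*3600*24*365) ] }
floorage = { "$floor" : ageinyears }
// 1 for a pass ("P"), 0 for any other result
ispass = { "$cond" : [ { "$eq" : [ "$TestResult", "P" ] }, 1, 0 ] }
project = { "$project" : { "Make" : 1,
                           "Model" : 1,
                           "VehicleID" : 1,
                           "TestResult" : 1,
                           "Age" : floorage,
                           "pass" : ispass } }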
$project (2 of 2)
MongoDB Enterprise > db.nonulldates.aggregate( [ project ] )
{ "_id" : ObjectId("577294020cb23533dfbaac18"), "VehicleID" : 28, "TestResult" : "P", "Make" :
"SUZUKI", "Model" : "UNCLASSIFIED", "Age" : 19, "pass" : 1 }
{ "_id" : ObjectId("577294020cb23533dfbaac19"), "VehicleID" : 33, "TestResult" : "P", "Make" :
"UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Age" : 51, "pass" : 1 }
{ "_id" : ObjectId("577294020cb23533dfbaac1a"), "VehicleID" : 49, "TestResult" : "PRS", "Make"
: "UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Age" : 12, "pass" : 0 }
{ "_id" : ObjectId("577294020cb23533dfbaac1b"), "VehicleID" : 54, "TestResult" : "P", "Make" :
"NISSAN", "Model" : "MICRA GX", "Age" : 13, "pass" : 1 }
{ "_id" : ObjectId("577294020cb23533dfbaac1c"), "VehicleID" : 54, "TestResult" : "F", "Make" :
"UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Age" : 13, "pass" : 0 }
{ "_id" : ObjectId("577294020cb23533dfbaac1d"), "VehicleID" : 63, "TestResult" : "P", "Make" :
"UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Age" : 12, "pass" : 1 }
{ "_id" : ObjectId("577294020cb23533dfbaac1e"), "VehicleID" : 63, "TestResult" : "F", "Make" :
"UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Age" : 12, "pass" : 0 }
{ "_id" : ObjectId("577294020cb23533dfbaac1f"), "VehicleID" : 93, "TestResult" : "P", "Make" :
"BMW", "Model" : "318ti SE COMPACT", "Age" : 12, "pass" : 1 }
…
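The Age and pass values in this output can be checked outside the database. Subtracting two dates gives a difference in milliseconds, and the divisor is the number of milliseconds in a 365-day year, so leap days are ignored. The sketch below repeats the arithmetic in plain JavaScript using dates from the sample documents shown earlier; the function and constant names are chosen here for illustration.

```javascript
// Milliseconds in a 365-day year; leap days are ignored, as in the slide.
const MS_PER_YEAR = 1000 * 3600 * 24 * 365;

function ageInYears(testDate, firstUseDate) {
  // Subtracting two Date objects yields milliseconds.
  return Math.floor((testDate - firstUseDate) / MS_PER_YEAR);
}

// 1 for a pass ("P"), 0 for anything else, mirroring the $cond expression.
const isPass = (testResult) => (testResult === "P" ? 1 : 0);

// VehicleID 28: tested 2013-05-02, first used 1993-08-11.
const age = ageInYears(
  new Date("2013-05-02T00:00:00Z"),
  new Date("1993-08-11T00:00:00Z")
);
console.log(age); // 19
console.log(isPass("P"), isPass("PRS")); // 1 0
```

Note that a result of "PRS" counts as 0 here, exactly as in the shell output, because the condition tests for the exact string "P".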
$group
countMakes = { "$group" : { "_id" : "$Make", "total" : { "$sum" : 1 } } }
db.nonulldates.aggregate( [ countMakes ])
{ "_id" : "IVECO", "total" : 1 }
{ "_id" : "ISUZU", "total" : 1 }
{ "_id" : "YAMAHA", "total" : 1 }
{ "_id" : "OLDSMOBILE", "total" : 1 }
{ "_id" : "KAWASAKI", "total" : 1 }
{ "_id" : "MASERATI", "total" : 1 }
{ "_id" : "BENELLI", "total" : 1 }
{ "_id" : "BENTLEY", "total" : 3 }
{ "_id" : "AUDI", "total" : 26 }
{ "_id" : "SMART", "total" : 2 }
{ "_id" : "HARLEY-DAVIDSON", "total" : 1 }
…
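What this stage computes can be sketched in plain JavaScript as a count per distinct value of `Make`. The `groupCount` helper and the sample documents are hypothetical and only illustrate the idea; the server's implementation is different.

```javascript
// Plain-JavaScript analogue of
//   { "$group" : { "_id" : "$Make", "total" : { "$sum" : 1 } } }
// groupCount is a hypothetical helper for illustration only.
function groupCount(docs, field) {
  const totals = new Map();
  for (const doc of docs) {
    // Add 1 to the running total for this document's key.
    totals.set(doc[field], (totals.get(doc[field]) || 0) + 1);
  }
  return [...totals].map(([key, total]) => ({ _id: key, total }));
}

const makes = [
  { Make: "AUDI" },
  { Make: "BENTLEY" },
  { Make: "AUDI" },
  { Make: "SMART" },
];

// Yields AUDI with total 2, BENTLEY with total 1, SMART with total 1.
const counts = groupCount(makes, "Make");
console.log(counts);
```

As the shell output above suggests, `$group` does not return its groups in any particular order; a `$sort` stage on `total` can follow it when a ranking is needed.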
Summary
• A pipeline of operations
• Select, project, group, sort
• $out must appear last in an aggregation pipeline
• There is a range of accumulators (see the $group documentation)
• Very powerful way to reshape and analyse data
• Shard-aware, to gain maximum performance on large clusters
Q&A

Editor's Notes

  • #3: Who I am, how long have I been at MongoDB.