October 12, 2017 | Bespoke | San Francisco
#MDBlocal
ETL for Pros
Getting Data into MongoDB
André Spiegel
Principal Consulting Engineer, MongoDB
@drmirror
Remember this?
Sound familiar?
At some point, most applications need to batch-load large amounts of data
• billions of documents
• huge initial load
• daily updates
Using MongoDB properly means complex documents
{
  "_id" : "admin.mongo_dba",
  "user" : "mongo_dba",
  "db" : "admin",
  "roles" : [
    { "role" : "root", "db" : "admin" },
    { "role" : "restore", "db" : "admin" }
  ]
}
[
  { "$sort" : { "st": 1 } },
  {
    "$group" : { "_id" : "$st",
                 "start" : { "$first" : "$ts" },
                 "end" : { "$last" : "$ts" } }
  }
]
How do I create these documents from relational tables?
How do I do it fast?
I've done this for a few years
I've seen people do it
We all make the same mistakes
Let's understand them and come up with something better
Case Study
ORDERS
ID  FIRST_NAME  LAST_NAME  SHIPPING_ADDRESS
1   James       Bond       Nassau, Bahamas, US
2   Ernst       Blofeldt   Caracas, Venezuela

ITEMS
ID  ORDER_ID  QTY  DESCRIPTION              PRICE
1   1         1    Aston Martin             120,000
2   1         1    Dinner Jacket            4,000
3   1         3    Champagne Veuve-Cliquot  200
4   2         100  Cat Food                 1
5   2         1    Launch Pad               1,000,000

TRACKING
ORDER_ID  TIMESTAMP            STATUS
1         1985-04-30 09:48:00  ORDERED
2         1985-04-23 01:30:22  ORDERED
2         1985-04-25 08:30:00  SHIPPED
2         1985-05-14 21:37:00  DELIVERED
{
  "first_name" : "James",
  "last_name" : "Bond",
  "address" : "Nassau, Bahamas, US",
  "items" : [
    { "qty": 1, "description" : "Aston Martin", "price" : 120000 },
    { "qty": 1, "description" : "Dinner Jacket", "price" : 4000 },
    { "qty": 3, "description" : "Champagne Veuve-Cliquot", "price": 200 }
  ],
  "tracking" : [
    { "timestamp" : "1985-04-30 09:48:00", "status": "ORDERED" }
  ]
}
How do I get from relational to JSON?
ETL Tools: Talend, Pentaho, Informatica, ...
• Gretchen's Question: How do you handle arrays?
WYOC (Write Your Own Code)
• More challenging, but you've got ultimate control
Orders of Magnitude
• Any operation in the CPU is on the order of nanoseconds: 0.000 000 001s
  • typically tens of nanoseconds per high-level operation
• Any roundtrip to the database is on the order of milliseconds: 0.001s
  • typically just under 1 millisecond at the minimum
  • mostly due to network protocol stack latency
  • faster networks don't help
  • in-memory storage does not help
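To make the gap concrete, here is a back-of-the-envelope calculation. The two constants are only the rough magnitudes quoted above (tens of nanoseconds, about one millisecond), not measurements, and the function names are made up for this sketch.

```python
# Rough magnitudes taken from the bullets above; both values are assumptions.
CPU_OP_SECONDS = 10e-9     # tens of nanoseconds per high-level operation
ROUNDTRIP_SECONDS = 1e-3   # about one millisecond per database roundtrip


def cpu_ops_per_roundtrip() -> int:
    """How many in-process operations fit into a single database roundtrip."""
    return round(ROUNDTRIP_SECONDS / CPU_OP_SECONDS)


def roundtrip_minutes(roundtrips: int) -> float:
    """Minutes spent waiting if the roundtrips happen one after another."""
    return roundtrips * ROUNDTRIP_SECONDS / 60


print(cpu_ops_per_roundtrip())                 # 100000
print(round(roundtrip_minutes(1_000_000), 1))  # 16.7
```

Under these assumptions, one roundtrip costs as much as about a hundred thousand in-process operations, so the number of roundtrips per document is what an ETL job should minimize.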
A Gallery of Mistakes
Mistake #1 – Nested queries
for x in SELECT * FROM ORDERS
    doc = { "first_name" : x.first_name,
            "last_name" : x.last_name,
            "address" : x.shipping_address,
            "items" : [], "tracking" : [] }
    for y in SELECT * FROM ITEMS WHERE ORDER_ID = x.id
        doc.items.push (y)
    for z in SELECT * FROM TRACKING WHERE ORDER_ID = x.id
        doc.tracking.push (z)
    mongodb.insert (doc)
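A runnable sketch of the same anti-pattern, for counting the queries it issues. Assumptions: an in-memory SQLite database stands in for the relational source, a plain list stands in for the MongoDB collection, and the helper names make_source and nested_queries are invented for this example.

```python
import sqlite3


def make_source() -> sqlite3.Connection:
    """Build the three sample tables in an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.row_factory = sqlite3.Row  # rows can be indexed by column name
    db.executescript("""
        CREATE TABLE orders (id, first_name, last_name, shipping_address);
        CREATE TABLE items (id, order_id, qty, description, price);
        CREATE TABLE tracking (order_id, timestamp, status);
        INSERT INTO orders VALUES
            (1, 'James', 'Bond', 'Nassau, Bahamas, US'),
            (2, 'Ernst', 'Blofeldt', 'Caracas, Venezuela');
        INSERT INTO items VALUES
            (1, 1, 1, 'Aston Martin', 120000),
            (2, 1, 1, 'Dinner Jacket', 4000),
            (3, 1, 3, 'Champagne Veuve-Cliquot', 200),
            (4, 2, 100, 'Cat Food', 1),
            (5, 2, 1, 'Launch Pad', 1000000);
        INSERT INTO tracking VALUES
            (1, '1985-04-30 09:48:00', 'ORDERED'),
            (2, '1985-04-23 01:30:22', 'ORDERED'),
            (2, '1985-04-25 08:30:00', 'SHIPPED'),
            (2, '1985-05-14 21:37:00', 'DELIVERED');
    """)
    return db


def nested_queries(db):
    """Mistake #1: two extra source queries for every single order."""
    docs = []
    queries = 1  # the outer ORDERS query
    for o in db.execute("SELECT * FROM orders").fetchall():
        doc = {"first_name": o["first_name"], "last_name": o["last_name"],
               "address": o["shipping_address"], "items": [], "tracking": []}
        for i in db.execute(
                "SELECT qty, description, price FROM items WHERE order_id = ?",
                (o["id"],)):
            doc["items"].append(dict(i))
        queries += 1
        for t in db.execute(
                "SELECT timestamp, status FROM tracking WHERE order_id = ?",
                (o["id"],)):
            doc["tracking"].append(dict(t))
        queries += 1
        docs.append(doc)  # stands in for mongodb.insert(doc)
    return docs, queries


docs, queries = nested_queries(make_source())
print(queries)  # 5 source queries for 2 orders
```

With two orders the job issues five source queries; the count grows by two with every additional order.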
Results
Nested Queries: 14.5 min
• 1 million orders
• 10 million line items
• 3 million tracking states
• MySQL (local) to MongoDB (local)
• Python
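Fan-In and Fan-Out
Number of database operations per MongoDB document:
• into the ETL job (reads from the source): 1/n + 2 (one outer query shared by all n orders, plus two nested queries per order)
• out of the ETL job (writes to MongoDB): 1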
Mistake #2 – Build documents in DB
for x in SELECT * FROM ORDERS
    doc = { "_id" : x.id,
            "first_name" : x.first_name,
            "last_name" : x.last_name,
            "address" : x.shipping_address,
            "items" : [], "tracking" : [] }
    mongodb.insert (doc)
for y in SELECT * FROM ITEMS
    mongodb.update ({"_id" : y.order_id},
                    {"$push" : {"items" : y}})
for z in SELECT * FROM TRACKING
    mongodb.update ({"_id" : z.order_id},
                    {"$push" : {"tracking" : z}})
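A small sketch that counts the writes this approach produces. Assumptions: the sample rows are held in plain Python lists, and CountingCollection is a hypothetical in-memory stand-in for a MongoDB collection, not a driver class.

```python
ORDERS = [(1, "James", "Bond", "Nassau, Bahamas, US"),
          (2, "Ernst", "Blofeldt", "Caracas, Venezuela")]
ITEMS = [(1, 1, 1, "Aston Martin", 120000),
         (2, 1, 1, "Dinner Jacket", 4000),
         (3, 1, 3, "Champagne Veuve-Cliquot", 200),
         (4, 2, 100, "Cat Food", 1),
         (5, 2, 1, "Launch Pad", 1000000)]
TRACKING = [(1, "1985-04-30 09:48:00", "ORDERED"),
            (2, "1985-04-23 01:30:22", "ORDERED"),
            (2, "1985-04-25 08:30:00", "SHIPPED"),
            (2, "1985-05-14 21:37:00", "DELIVERED")]


class CountingCollection:
    """Hypothetical stand-in for a MongoDB collection that counts writes."""

    def __init__(self):
        self.docs = {}
        self.writes = 0

    def insert(self, doc):
        self.docs[doc["_id"]] = doc
        self.writes += 1

    def push(self, doc_id, field, value):
        # Plays the role of update({"_id": doc_id}, {"$push": {field: value}}).
        self.docs[doc_id][field].append(value)
        self.writes += 1


def build_in_db(target):
    """Mistake #2: insert skeleton documents, then push every child row."""
    for oid, first, last, address in ORDERS:
        target.insert({"_id": oid, "first_name": first, "last_name": last,
                       "address": address, "items": [], "tracking": []})
    for _item_id, oid, qty, description, price in ITEMS:
        target.push(oid, "items",
                    {"qty": qty, "description": description, "price": price})
    for oid, timestamp, status in TRACKING:
        target.push(oid, "tracking", {"timestamp": timestamp, "status": status})


target = CountingCollection()
build_in_db(target)
print(target.writes)  # 11 writes: 2 inserts + 5 item pushes + 4 tracking pushes
```

Two documents cost eleven writes here, because every child row becomes its own roundtrip to the target.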
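Fan-In and Fan-Out
Number of database operations per MongoDB document:
• into the ETL job (reads from the source): 3/n (three queries shared by all n orders)
• out of the ETL job (writes to MongoDB): 1 + p + q (one insert, plus one update for each of the p items and q tracking states of the order)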
Results
Nested Queries: 14.5 min
Build in DB: 95.9 min
Mistake #3 – Load it all into memory
db_items = SELECT * FROM ITEMS
db_tracking = SELECT * FROM TRACKING
for x in SELECT * FROM ORDERS
    doc = { "first_name" : x.first_name,
            "last_name" : x.last_name,
            "address" : x.shipping_address,
            "items" : [], "tracking" : [] }
    doc.items.pushAll (db_items.getAll(x.id))
    doc.tracking.pushAll (db_tracking.getAll(x.id))
    mongodb.insert (doc)
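A minimal sketch of the lookup-from-memory variant. Assumptions: the sample rows are held in plain Python lists, the function name is invented, and the returned list stands in for the inserts into MongoDB.

```python
from collections import defaultdict

ORDERS = [(1, "James", "Bond", "Nassau, Bahamas, US"),
          (2, "Ernst", "Blofeldt", "Caracas, Venezuela")]
ITEMS = [(1, 1, 1, "Aston Martin", 120000),
         (2, 1, 1, "Dinner Jacket", 4000),
         (3, 1, 3, "Champagne Veuve-Cliquot", 200),
         (4, 2, 100, "Cat Food", 1),
         (5, 2, 1, "Launch Pad", 1000000)]
TRACKING = [(1, "1985-04-30 09:48:00", "ORDERED"),
            (2, "1985-04-23 01:30:22", "ORDERED"),
            (2, "1985-04-25 08:30:00", "SHIPPED"),
            (2, "1985-05-14 21:37:00", "DELIVERED")]


def lookup_from_memory():
    """Mistake #3: index both child tables in memory, then build documents."""
    items_by_order = defaultdict(list)
    for _item_id, oid, qty, description, price in ITEMS:
        items_by_order[oid].append(
            {"qty": qty, "description": description, "price": price})

    tracking_by_order = defaultdict(list)
    for oid, timestamp, status in TRACKING:
        tracking_by_order[oid].append({"timestamp": timestamp, "status": status})

    docs = []
    for oid, first, last, address in ORDERS:
        docs.append({"first_name": first, "last_name": last, "address": address,
                     "items": items_by_order[oid],
                     "tracking": tracking_by_order[oid]})
    return docs  # each element stands in for one mongodb.insert(doc)


docs = lookup_from_memory()
print(len(docs[1]["items"]))  # 2
```

The number of database operations is low, but both child tables have to fit into the memory of the ETL process, which is the problem at billions of rows.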
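Fan-In and Fan-Out
Number of database operations per MongoDB document:
• into the ETL job (reads from the source): 3/n (three queries shared by all n orders)
• out of the ETL job (writes to MongoDB): 1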
Results
Nested Queries: 14.5 min
Build in DB: 95.9 min
Lookup from Memory: 8.5 min
Getting it right: Co-Iteration
• Open ORDERS, ITEMS, and TRACKING at the same time, each in order of order ID
• Read the first ORDERS row and start a document
{
  "first_name" : "James",
  "last_name" : "Bond",
  "address" : "Nassau, Bahamas, US"
}
• Advance through ITEMS, appending each row with ORDER_ID 1 to "items"
• Advance through TRACKING, appending each row with ORDER_ID 1 to "tracking"
{
  "first_name" : "James",
  "last_name" : "Bond",
  "address" : "Nassau, Bahamas, US",
  "items" : [
    { ..., "description" : "Aston Martin", ... },
    { ..., "description" : "Dinner Jacket", ... },
    { ..., "description" : "Champagne...", ... }
  ],
  "tracking" : [
    { ... "1985-04-30 09:48:00", ... "ORDERED" }
  ]
}
• Read the next ORDERS row and continue from the current position in ITEMS and TRACKING
{
  "first_name" : "Ernst",
  "last_name" : "Blofeldt",
  "address" : "Caracas, Venezuela",
  "items" : [
    { ..., "description" : "Cat Food", ... },
    { ..., "description" : "Launch Pad", ... }
  ],
  "tracking" : [
    { ... "1985-04-23 01:30:22", ... "ORDERED" },
    { ... "1985-04-25 08:30:00", ... "SHIPPED" },
    { ... "1985-05-14 21:37:00", ... "DELIVERED" }
  ]
}
Done!
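A minimal sketch of co-iteration, not the code from the talk's repository. Assumptions: an in-memory SQLite database stands in for the relational source, all three queries are sorted by order ID, and the names make_source and co_iterate are invented for this example.

```python
import sqlite3


def make_source() -> sqlite3.Connection:
    """Build the three sample tables in an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.row_factory = sqlite3.Row  # rows can be indexed by column name
    db.executescript("""
        CREATE TABLE orders (id, first_name, last_name, shipping_address);
        CREATE TABLE items (id, order_id, qty, description, price);
        CREATE TABLE tracking (order_id, timestamp, status);
        INSERT INTO orders VALUES
            (1, 'James', 'Bond', 'Nassau, Bahamas, US'),
            (2, 'Ernst', 'Blofeldt', 'Caracas, Venezuela');
        INSERT INTO items VALUES
            (1, 1, 1, 'Aston Martin', 120000),
            (2, 1, 1, 'Dinner Jacket', 4000),
            (3, 1, 3, 'Champagne Veuve-Cliquot', 200),
            (4, 2, 100, 'Cat Food', 1),
            (5, 2, 1, 'Launch Pad', 1000000);
        INSERT INTO tracking VALUES
            (1, '1985-04-30 09:48:00', 'ORDERED'),
            (2, '1985-04-23 01:30:22', 'ORDERED'),
            (2, '1985-04-25 08:30:00', 'SHIPPED'),
            (2, '1985-05-14 21:37:00', 'DELIVERED');
    """)
    return db


def co_iterate(db):
    """Yield one document per order from a single pass over three cursors."""
    orders = db.execute("SELECT * FROM orders ORDER BY id")
    items = db.execute("SELECT * FROM items ORDER BY order_id, id")
    tracking = db.execute("SELECT * FROM tracking ORDER BY order_id, timestamp")
    item, track = items.fetchone(), tracking.fetchone()
    for o in orders:
        doc = {"first_name": o["first_name"], "last_name": o["last_name"],
               "address": o["shipping_address"], "items": [], "tracking": []}
        # Consume child rows up to this order; rows for unknown orders are skipped.
        while item is not None and item["order_id"] <= o["id"]:
            if item["order_id"] == o["id"]:
                doc["items"].append({"qty": item["qty"],
                                     "description": item["description"],
                                     "price": item["price"]})
            item = items.fetchone()
        while track is not None and track["order_id"] <= o["id"]:
            if track["order_id"] == o["id"]:
                doc["tracking"].append({"timestamp": track["timestamp"],
                                        "status": track["status"]})
            track = tracking.fetchone()
        yield doc  # the caller would insert this into MongoDB


for doc in co_iterate(make_source()):
    print(doc["last_name"], len(doc["items"]), len(doc["tracking"]))
```

Only the document currently being built is held in memory, and each table is read exactly once, which is why the sort order on the order ID matters.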
Results
Nested Queries: 14.5 min
Build in DB: 95.9 min
Lookup from Memory: 8.5 min
Co-Iteration: 8.1 min
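Fan-In and Fan-Out
Number of database operations per MongoDB document:
• into the ETL job (reads from the source): 3/n (three queries shared by all n orders)
• out of the ETL job (writes to MongoDB): 1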
Did you just explain to me what a JOIN is?
• Yes. Although not as straightforward as you might think.
• No. Co-Iteration works from multiple data sources.
NAME        ITEM           TRACKING
James Bond  Aston Martin   ORDERED
James Bond  Aston Martin   SHIPPED
James Bond  Dinner Jacket  ORDERED
James Bond  Dinner Jacket  SHIPPED
James Bond  Champagne      ORDERED
James Bond  Champagne      SHIPPED
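One reason a plain JOIN is less straightforward: joining ORDERS to both ITEMS and TRACKING repeats every item for every tracking state of the same order. The sketch below shows the effect on the case-study data, assuming an in-memory SQLite database with only the columns needed for the join.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id, last_name);
    CREATE TABLE items (id, order_id, description);
    CREATE TABLE tracking (order_id, timestamp, status);
    INSERT INTO orders VALUES (1, 'Bond'), (2, 'Blofeldt');
    INSERT INTO items VALUES
        (1, 1, 'Aston Martin'), (2, 1, 'Dinner Jacket'),
        (3, 1, 'Champagne Veuve-Cliquot'),
        (4, 2, 'Cat Food'), (5, 2, 'Launch Pad');
    INSERT INTO tracking VALUES
        (1, '1985-04-30 09:48:00', 'ORDERED'),
        (2, '1985-04-23 01:30:22', 'ORDERED'),
        (2, '1985-04-25 08:30:00', 'SHIPPED'),
        (2, '1985-05-14 21:37:00', 'DELIVERED');
""")

# Each order contributes (number of items) x (number of tracking states) rows.
rows = db.execute("""
    SELECT o.last_name, i.description, t.status
    FROM orders o
    JOIN items i ON i.order_id = o.id
    JOIN tracking t ON t.order_id = o.id
    ORDER BY o.id, i.id, t.timestamp
""").fetchall()

print(len(rows))  # 9 rows for 2 documents: 3 x 1 for Bond, 2 x 3 for Blofeldt
```

The ETL job would have to de-duplicate these rows again to rebuild the two arrays, whereas co-iteration reads each source row once.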
Oh, and one more thing...
Threading and Batching
[Diagram: throughput as a function of batch size and threads]
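Fan-In and Fan-Out
Number of database operations per MongoDB document:
• into the ETL job (reads from the source): 3/n (three queries shared by all n orders)
• out of the ETL job (writes to MongoDB): 1/1000 (one bulk write per batch of 1000 documents)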
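A minimal sketch of the batching side, assuming the documents arrive from a generator. The helper names batches and load are invented; write_batch is any callable that accepts a list of documents. With PyMongo that could be a collection's insert_many method, but here a plain list is used so the example runs without a database. Threading is not shown.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batches(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group a stream of documents into lists of at most `size` elements."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load(docs: Iterable[dict],
         write_batch: Callable[[List[dict]], object],
         size: int = 1000) -> int:
    """Send documents to the target in batches; return the number of writes."""
    writes = 0
    for batch in batches(docs, size):
        write_batch(batch)  # one roundtrip for the whole batch
        writes += 1
    return writes


sink = []  # stands in for the MongoDB collection
writes = load(({"n": i} for i in range(2500)), sink.extend, size=1000)
print(writes, len(sink))  # 3 2500
```

With a batch size of 1000, the 2500 documents need three writes instead of 2500.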
Results
Time (min)            Simple  Batch = 1000
Nested Queries        14.5    9.1
Build in DB           95.9    36.2
Lookup from Memory    8.5     4
Co-Iteration          8.1     3.9
Summary
• Common Mistakes to Watch Out For
  • Nested Queries
  • Building Documents in the Database
  • Loading Everything into Memory
• The Co-Iteration Pattern
  • Open All Tables at Once
  • Perform a Single Pass over Them
  • Build Documents as You Go Along
• Don't Forget Batching and Threading
Thank you.
github.com/drmirror/etlpro

MongoDB
 
MongoDB .local Chicago 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...
MongoDB .local Chicago 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...MongoDB .local Chicago 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...
MongoDB .local Chicago 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...
MongoDB
 
Embedding a language into string interpolator
Embedding a language into string interpolatorEmbedding a language into string interpolator
Embedding a language into string interpolator
Michael Limansky
 
MongoDB Europe 2016 - Enabling the Internet of Things at Proximus - Belgium's...
MongoDB Europe 2016 - Enabling the Internet of Things at Proximus - Belgium's...MongoDB Europe 2016 - Enabling the Internet of Things at Proximus - Belgium's...
MongoDB Europe 2016 - Enabling the Internet of Things at Proximus - Belgium's...
MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB
 

ETL for Pros: Getting Data Into MongoDB

  • 1. October 12, 2017 | Bespoke | San Francisco | #MDBlocal. ETL for Pros: Getting Data into MongoDB
  • 2. André Spiegel, Principal Consulting Engineer, MongoDB (@drmirror)
  • 3. Remember this?
  • 4. Sound familiar? At some point, most applications need to batch-load large amounts of data:
      • billions of documents
      • huge initial load
      • daily updates
  • 5. Sound familiar? Using MongoDB properly means complex documents:
      {
        "_id" : "admin.mongo_dba",
        "user" : "mongo_dba",
        "db" : "admin",
        "roles" : [
          { "role" : "root", "db" : "admin" },
          { "role" : "restore", "db" : "admin" }
        ]
      }
      [
        { "$sort" : { "st" : 1 } },
        { "$group" : { "_id" : "$st",
                       "start" : { "$first" : "$ts" },
                       "end" : { "$last" : "$ts" } } }
      ]
  • 6. Sound familiar? How do I create these documents from relational tables?
  • 7. Sound familiar? How do I do it fast? (Image: Julian Lim)
  • 8. I've done this for a few years. I've seen people do it. We all make the same mistakes. Let's understand them and come up with something better.
  • 9. Case Study
  • 10. ORDERS
      ID | FIRST_NAME | LAST_NAME | SHIPPING_ADDRESS
      1  | James      | Bond      | Nassau, Bahamas, US
      2  | Ernst      | Blofeldt  | Caracas, Venezuela
      ITEMS
      ID | ORDER_ID | QTY | DESCRIPTION             | PRICE
      1  | 1        | 1   | Aston Martin            | 120,000
      2  | 1        | 1   | Dinner Jacket           | 4,000
      3  | 1        | 3   | Champagne Veuve-Cliquot | 200
      4  | 2        | 100 | Cat Food                | 1
      5  | 2        | 1   | Launch Pad              | 1,000,000
      TRACKING
      ORDER_ID | TIMESTAMP           | STATUS
      1        | 1985-04-30 09:48:00 | ORDERED
      2        | 1985-04-23 01:30:22 | ORDERED
      2        | 1985-04-25 08:30:00 | SHIPPED
      2        | 1985-05-14 21:37:00 | DELIVERED
  • 13. {
        "first_name" : "James",
        "last_name" : "Bond",
        "address" : "Nassau, Bahamas, US",
        "items" : [
          { "qty" : 1, "description" : "Aston Martin", "price" : 120000 },
          { "qty" : 1, "description" : "Dinner Jacket", "price" : 4000 },
          { "qty" : 3, "description" : "Champagne Veuve-Cliquot", "price" : 200 }
        ],
        "tracking" : [
          { "timestamp" : "1985-04-30 09:48:00", "status" : "ORDERED" }
        ]
      }
  • 17. How do I get from relational to JSON? ETL Tools: Talend, Pentaho, Informatica, ...
      • Gretchen's Question: How do you handle arrays?
  • 18. How do I get from relational to JSON? WYOC (Write Your Own Code)
      • More challenging, but you've got ultimate control
  • 19. Orders of Magnitude
      • Any operation in the CPU is on the order of nanoseconds: 0.000 000 001 s
          • typically tens of nanoseconds per high-level operation
      • Any roundtrip to the database is on the order of milliseconds: 0.001 s
          • typically just under 1 millisecond at the minimum
          • mostly due to network protocol stack latency
          • faster networks don't help
          • in-memory storage does not help
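As a rough back-of-the-envelope illustration of these orders of magnitude, the sketch below estimates how long a load spends waiting on roundtrips compared with CPU work. The 1 ms roundtrip and 50 ns per operation are assumptions picked from the ranges on the slide; the function name, the three roundtrips per document, and the 1,000 CPU operations per document are hypothetical.

```python
# Back-of-the-envelope estimate; every figure here is an illustrative assumption.
ROUNDTRIP_S = 1e-3   # about 1 ms per database roundtrip
CPU_OP_S = 50e-9     # tens of nanoseconds per high-level CPU operation


def estimate_load_seconds(docs, roundtrips_per_doc, cpu_ops_per_doc=1000):
    """Return (seconds waiting on roundtrips, seconds of CPU work)."""
    network = docs * roundtrips_per_doc * ROUNDTRIP_S
    cpu = docs * cpu_ops_per_doc * CPU_OP_S
    return network, cpu


network, cpu = estimate_load_seconds(docs=1_000_000, roundtrips_per_doc=3)
print(f"roundtrips: {network / 60:.1f} min, CPU: {cpu:.1f} s")
```

Under these assumptions, one million documents at three roundtrips each cost about 50 minutes of waiting, against under a minute of CPU work, which is why the number of roundtrips per document dominates.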
  • 20. A Gallery of Mistakes
  • 22. Mistake #1 – Nested queries
      for x in SELECT * FROM ORDERS
          doc = { "first_name" : x.first_name,
                  "last_name" : x.last_name,
                  "address" : x.address,
                  "items" : [],
                  "tracking" : [] }
          for y in SELECT * FROM ITEMS WHERE ORDER_ID = x.order_id
              doc.items.push (y)
          for z in SELECT * FROM TRACKING WHERE ORDER_ID = x.order_id
              doc.tracking.push (z)
          mongodb.insert (doc)
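The pseudocode for Mistake #1 can be made concrete as a minimal sketch. It assumes an in-memory SQLite database in place of MySQL and a plain Python list in place of the MongoDB collection; the `query` helper and the `queries` counter are hypothetical additions that make the roundtrips to the source visible.

```python
import sqlite3

# In-memory SQLite stands in for the relational source (assumption).
src = sqlite3.connect(":memory:")
src.row_factory = sqlite3.Row
src.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, first_name TEXT,
                         last_name TEXT, shipping_address TEXT);
    CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER,
                        qty INTEGER, description TEXT, price INTEGER);
    CREATE TABLE tracking (order_id INTEGER, timestamp TEXT, status TEXT);
    INSERT INTO orders VALUES (1, 'James', 'Bond', 'Nassau, Bahamas, US'),
                              (2, 'Ernst', 'Blofeldt', 'Caracas, Venezuela');
    INSERT INTO items VALUES (1, 1, 1, 'Aston Martin', 120000),
                             (2, 1, 1, 'Dinner Jacket', 4000),
                             (3, 1, 3, 'Champagne Veuve-Cliquot', 200),
                             (4, 2, 100, 'Cat Food', 1),
                             (5, 2, 1, 'Launch Pad', 1000000);
    INSERT INTO tracking VALUES (1, '1985-04-30 09:48:00', 'ORDERED'),
                                (2, '1985-04-23 01:30:22', 'ORDERED'),
                                (2, '1985-04-25 08:30:00', 'SHIPPED'),
                                (2, '1985-05-14 21:37:00', 'DELIVERED');
""")

queries = 0       # counts roundtrips to the source database
collection = []   # stand-in for the MongoDB collection (assumption)


def query(sql, params=()):
    """Run one query against the source and count it as one roundtrip."""
    global queries
    queries += 1
    return src.execute(sql, params).fetchall()


for order in query("SELECT * FROM orders ORDER BY id"):
    doc = {
        "first_name": order["first_name"],
        "last_name": order["last_name"],
        "address": order["shipping_address"],
        "items": [],
        "tracking": [],
    }
    # Two extra queries for every single order: this is the mistake.
    for item in query("SELECT qty, description, price FROM items "
                      "WHERE order_id = ? ORDER BY id", (order["id"],)):
        doc["items"].append(dict(item))
    for state in query("SELECT timestamp, status FROM tracking "
                       "WHERE order_id = ? ORDER BY timestamp", (order["id"],)):
        doc["tracking"].append(dict(state))
    collection.append(doc)   # stand-in for mongodb.insert (doc)

print(queries, "source queries for", len(collection), "documents")
```

With two orders the loop issues one query for the orders plus two per order, so the query count grows linearly with the number of documents.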
  • 29. Results (time in minutes): Nested Queries 14.5
      • 1 million orders
      • 10 million line items
      • 3 million tracking states
      • MySQL (local) to MongoDB (local)
      • Python
  • 30. Fan-In and Fan-out: number of database operations per MongoDB document in the ETL job
      • fan-in: 1/n + 2
      • fan-out: 1
  • 31. Mistake #2 – Build documents in DB
      for x in SELECT * FROM ORDERS
          doc = { "_id" : x.order_id,
                  "first_name" : x.first_name,
                  "last_name" : x.last_name,
                  "address" : x.address,
                  "items" : [],
                  "tracking" : [] }
          mongodb.insert (doc)
      for y in SELECT * FROM ITEMS
          mongodb.update ({"_id" : y.order_id}, {"$push" : {"items" : y}})
      for z in SELECT * FROM TRACKING
          mongodb.update ({"_id" : z.order_id}, {"$push" : {"tracking" : z}})
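A minimal sketch of Mistake #2, assuming the source rows are already available as Python tuples and using a hypothetical in-memory `CountingCollection` in place of a real MongoDB collection. With PyMongo the two methods would roughly correspond to `insert_one` and `update_one` with a `$push` update; the stand-in only exists to count write operations.

```python
# Source rows as plain tuples, taken from the case-study tables.
orders = [(1, "James", "Bond", "Nassau, Bahamas, US"),
          (2, "Ernst", "Blofeldt", "Caracas, Venezuela")]
items = [  # (order_id, qty, description, price)
    (1, 1, "Aston Martin", 120000), (1, 1, "Dinner Jacket", 4000),
    (1, 3, "Champagne Veuve-Cliquot", 200),
    (2, 100, "Cat Food", 1), (2, 1, "Launch Pad", 1000000)]
tracking = [  # (order_id, timestamp, status)
    (1, "1985-04-30 09:48:00", "ORDERED"),
    (2, "1985-04-23 01:30:22", "ORDERED"),
    (2, "1985-04-25 08:30:00", "SHIPPED"),
    (2, "1985-05-14 21:37:00", "DELIVERED")]


class CountingCollection:
    """Hypothetical in-memory stand-in for a MongoDB collection."""

    def __init__(self):
        self.docs = {}
        self.operations = 0   # every insert or push is one roundtrip

    def insert(self, doc):
        self.operations += 1
        self.docs[doc["_id"]] = doc

    def push(self, _id, field, value):
        # Plays the role of update({"_id": _id}, {"$push": {field: value}}).
        self.operations += 1
        self.docs[_id][field].append(value)


coll = CountingCollection()
for order_id, first_name, last_name, address in orders:
    coll.insert({"_id": order_id, "first_name": first_name,
                 "last_name": last_name, "address": address,
                 "items": [], "tracking": []})
# One write per child row: this is the mistake.
for order_id, qty, description, price in items:
    coll.push(order_id, "items",
              {"qty": qty, "description": description, "price": price})
for order_id, timestamp, status in tracking:
    coll.push(order_id, "tracking", {"timestamp": timestamp, "status": status})

print(coll.operations, "write operations for", len(coll.docs), "documents")
```

The two documents take 2 inserts plus 9 pushes, so the number of writes tracks the number of child rows rather than the number of documents.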
  • 38. Fan-In and Fan-Out: number of database operations per MongoDB document. Reads into the ETL job (fan-in): 3/n. Writes out to MongoDB (fan-out): 1 + p + q.
  • 39. Results (bar chart, time in minutes): Nested Queries 14.5; Build in DB 95.9.
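The 1 + p + q write count can be reproduced on the case-study rows. This is a minimal sketch under stated assumptions: a dict (`collection`) stands in for the MongoDB collection, `insert` and `push` are hypothetical helpers that mimic an insert and a `$push` update, and `_id` is set to the order ID so that the updates have something to match. Here p and q are read as the number of items and tracking rows of one order.

```python
orders = [  # (id, first_name, last_name, shipping_address)
    (1, "James", "Bond", "Nassau, Bahamas, US"),
    (2, "Ernst", "Blofeldt", "Caracas, Venezuela"),
]
items = [  # (id, order_id, qty, description, price)
    (1, 1, 1, "Aston Martin", 120000),
    (2, 1, 1, "Dinner Jacket", 4000),
    (3, 1, 3, "Champagne Veuve-Cliquot", 200),
    (4, 2, 100, "Cat Food", 1),
    (5, 2, 1, "Launch Pad", 1000000),
]
tracking = [  # (order_id, timestamp, status)
    (1, "1985-04-30 09:48:00", "ORDERED"),
    (2, "1985-04-23 01:30:22", "ORDERED"),
    (2, "1985-04-25 08:30:00", "SHIPPED"),
    (2, "1985-05-14 21:37:00", "DELIVERED"),
]

collection = {}  # hypothetical stand-in for a MongoDB collection, keyed by _id
writes = {}      # order id -> write operations spent on that document


def insert(doc):
    """Mimics one insert of a skeleton document."""
    collection[doc["_id"]] = doc
    writes[doc["_id"]] = writes.get(doc["_id"], 0) + 1


def push(order_id, field, value):
    """Mimics update({"_id": order_id}, {"$push": {field: value}})."""
    collection[order_id][field].append(value)
    writes[order_id] += 1


for oid, first, last, address in orders:
    insert({"_id": oid, "first_name": first, "last_name": last,
            "address": address, "items": [], "tracking": []})
for _, oid, qty, description, price in items:
    push(oid, "items", {"qty": qty, "description": description, "price": price})
for oid, timestamp, status in tracking:
    push(oid, "tracking", {"timestamp": timestamp, "status": status})

print(writes)  # {1: 5, 2: 6}
```

Order 1 costs 1 + 3 + 1 writes and order 2 costs 1 + 2 + 3, so the write side grows with every child row even though the read side needs only three queries.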
  • 40. Mistake #3 – Load it all into memory
      db_items = SELECT * FROM ITEMS
      db_tracking = SELECT * FROM TRACKING
      for x in SELECT * FROM ORDERS
          doc = { "first_name" : x.first_name, "last_name" : x.last_name, "address" : x.address, "items" : [], "tracking" : [] }
          doc.items.pushAll (db_items.getAll(x.order_id))
          doc.tracking.pushAll (db_tracking.getAll(x.order_id))
          mongodb.insert (doc)
  • 46. Fan-In and Fan-Out: number of database operations per MongoDB document. Reads into the ETL job (fan-in): 3/n. Writes out to MongoDB (fan-out): 1.
  • 47. Results (bar chart, time in minutes): Nested Queries 14.5; Build in DB 95.9; Lookup from Memory 8.5.
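A rough Python rendering of the lookup-from-memory approach follows. It is a sketch, not the speaker's implementation: the child tables are plain lists here, and the `defaultdict` indexes play the role of `db_items.getAll` and `db_tracking.getAll` from the slide. The point to notice is that both child tables are fully resident before the first document is built.

```python
from collections import defaultdict

orders = [  # (id, first_name, last_name, shipping_address)
    (1, "James", "Bond", "Nassau, Bahamas, US"),
    (2, "Ernst", "Blofeldt", "Caracas, Venezuela"),
]
items = [  # (id, order_id, qty, description, price)
    (1, 1, 1, "Aston Martin", 120000),
    (2, 1, 1, "Dinner Jacket", 4000),
    (3, 1, 3, "Champagne Veuve-Cliquot", 200),
    (4, 2, 100, "Cat Food", 1),
    (5, 2, 1, "Launch Pad", 1000000),
]
tracking = [  # (order_id, timestamp, status)
    (1, "1985-04-30 09:48:00", "ORDERED"),
    (2, "1985-04-23 01:30:22", "ORDERED"),
    (2, "1985-04-25 08:30:00", "SHIPPED"),
    (2, "1985-05-14 21:37:00", "DELIVERED"),
]

# Index both child tables completely in memory, keyed by order id.
items_by_order = defaultdict(list)
for _, oid, qty, description, price in items:
    items_by_order[oid].append(
        {"qty": qty, "description": description, "price": price})

tracking_by_order = defaultdict(list)
for oid, timestamp, status in tracking:
    tracking_by_order[oid].append({"timestamp": timestamp, "status": status})

# One pass over the orders, with dictionary lookups instead of queries.
docs = []
for oid, first, last, address in orders:
    docs.append({
        "first_name": first,
        "last_name": last,
        "address": address,
        "items": items_by_order.get(oid, []),
        "tracking": tracking_by_order.get(oid, []),
    })

resident_rows = sum(len(v) for v in items_by_order.values()) \
    + sum(len(v) for v in tracking_by_order.values())
print("child rows held in memory:", resident_rows)
```

On nine child rows this is harmless; with the 10 million line items and 3 million tracking states of the benchmark, the same indexes would have to fit in RAM.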
  • 48. Getting it right: Co-Iteration
  • 49. Source tables (case study):
      ORDERS
      ID  FIRST_NAME  LAST_NAME  SHIPPING_ADDRESS
      1   James       Bond       Nassau, Bahamas, US
      2   Ernst       Blofeldt   Caracas, Venezuela

      ITEMS
      ID  ORDER_ID  QTY  DESCRIPTION              PRICE
      1   1         1    Aston Martin             120,000
      2   1         1    Dinner Jacket            4,000
      3   1         3    Champagne Veuve-Cliquot  200
      4   2         100  Cat Food                 1
      5   2         1    Launch Pad               1,000,000

      TRACKING
      ORDER_ID  TIMESTAMP            STATUS
      1         1985-04-30 09:48:00  ORDERED
      2         1985-04-23 01:30:22  ORDERED
      2         1985-04-25 08:30:00  SHIPPED
      2         1985-05-14 21:37:00  DELIVERED
  • 50–57. Animation over the same three tables, order 1: the ORDERS row starts the document, the three ITEMS rows with ORDER_ID 1 are appended to "items" one at a time, then the TRACKING row with ORDER_ID 1 is appended to "tracking". Finished document:
      { "first_name" : "James", "last_name" : "Bond", "address" : "Nassau, Bahamas, US", "items" : [ { ..., "description" : "Aston Martin", ... }, { ..., "description" : "Dinner Jacket", ... }, { ..., "description" : "Champagne...", ... } ], "tracking" : [ { ... "1985-04-30 09:48:00", ... "ORDERED" } ] }
  • 58–67. Order 2, built the same way: the ORDERS row starts the document, the two ITEMS rows with ORDER_ID 2 are appended to "items", then the three TRACKING rows with ORDER_ID 2 are appended to "tracking". Finished document:
      { "first_name" : "Ernst", "last_name" : "Blofeldt", "address" : "Caracas, Venezuela", "items" : [ { ..., "description" : "Cat Food", ... }, { ..., "description" : "Launch Pad", ... } ], "tracking" : [ { ... "1985-04-23 01:30:22", ... "ORDERED" }, { ... "1985-04-25 08:30:00", ... "SHIPPED" }, { ... "1985-05-14 21:37:00", ... "DELIVERED" } ] }
  • 68. Done!
  • 69. Results (bar chart, time in minutes): Nested Queries 14.5; Build in DB 95.9; Lookup from Memory 8.5; Co-Iteration 8.1.
  • 70. Fan-In and Fan-Out: number of database operations per MongoDB document. Reads into the ETL job (fan-in): 3/n. Writes out to MongoDB (fan-out): 1.
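One possible way to write co-iteration in Python is sketched below. It is not taken from the speaker's repository: the `Peekable` wrapper and the `take` helper are hypothetical names, SQLite stands in for MySQL, and the finished documents are collected in a list rather than inserted into MongoDB. The essential assumption is that all three cursors are sorted by order ID, so each table is read exactly once.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the relational source.
db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row
db.execute("CREATE TABLE orders (id, first_name, last_name, shipping_address)")
db.execute("CREATE TABLE items (id, order_id, qty, description, price)")
db.execute("CREATE TABLE tracking (order_id, timestamp, status)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, "James", "Bond", "Nassau, Bahamas, US"),
    (2, "Ernst", "Blofeldt", "Caracas, Venezuela"),
])
db.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, "Aston Martin", 120000),
    (2, 1, 1, "Dinner Jacket", 4000),
    (3, 1, 3, "Champagne Veuve-Cliquot", 200),
    (4, 2, 100, "Cat Food", 1),
    (5, 2, 1, "Launch Pad", 1000000),
])
db.executemany("INSERT INTO tracking VALUES (?, ?, ?)", [
    (1, "1985-04-30 09:48:00", "ORDERED"),
    (2, "1985-04-23 01:30:22", "ORDERED"),
    (2, "1985-04-25 08:30:00", "SHIPPED"),
    (2, "1985-05-14 21:37:00", "DELIVERED"),
])


class Peekable:
    """Wraps a cursor so the next row can be inspected without consuming it."""

    def __init__(self, rows):
        self._rows = iter(rows)
        self.head = next(self._rows, None)

    def advance(self):
        self.head = next(self._rows, None)


def take(stream, order_id):
    """Consume and return the rows of `stream` that belong to `order_id`."""
    while stream.head is not None and stream.head["order_id"] < order_id:
        stream.advance()  # skip child rows whose parent order is missing
    rows = []
    while stream.head is not None and stream.head["order_id"] == order_id:
        rows.append(dict(stream.head))
        stream.advance()
    return rows


# Open all tables at once, each sorted by the order id.
orders = db.execute("SELECT * FROM orders ORDER BY id")
items = Peekable(db.execute("SELECT * FROM items ORDER BY order_id, id"))
tracking = Peekable(
    db.execute("SELECT * FROM tracking ORDER BY order_id, timestamp"))

docs = []  # hypothetical stand-in for inserts into MongoDB
for x in orders:  # a single pass; the child cursors only ever move forward
    docs.append({
        "first_name": x["first_name"],
        "last_name": x["last_name"],
        "address": x["shipping_address"],
        "items": take(items, x["id"]),
        "tracking": take(tracking, x["id"]),
    })

for doc in docs:
    print(doc["last_name"], len(doc["items"]), len(doc["tracking"]))
```

Only the rows of the order currently being built are held in memory, which is what separates this from the lookup-from-memory variant while keeping the same three reads in total.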
  • 71. Did you just explain to me what a JOIN is?
      • Yes. Although not as straightforward as you might think.
      • No. Co-Iteration works from multiple data sources.
      NAME        ITEM           TRACKING
      James Bond  Aston Martin   ORDERED
      James Bond  Aston Martin   SHIPPED
      James Bond  Dinner Jacket  ORDERED
      James Bond  Dinner Jacket  SHIPPED
      James Bond  Champagne      ORDERED
      James Bond  Champagne      SHIPPED
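Why a plain JOIN is "not as straightforward" can be seen by running one. The sketch below assumes SQLite and the case-study rows (so the tracking states differ from the table on the slide); it joins both child tables to the orders and counts the result rows.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id, first_name, last_name)")
db.execute("CREATE TABLE items (order_id, description)")
db.execute("CREATE TABLE tracking (order_id, status)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "James", "Bond"),
    (2, "Ernst", "Blofeldt"),
])
db.executemany("INSERT INTO items VALUES (?, ?)", [
    (1, "Aston Martin"), (1, "Dinner Jacket"), (1, "Champagne Veuve-Cliquot"),
    (2, "Cat Food"), (2, "Launch Pad"),
])
db.executemany("INSERT INTO tracking VALUES (?, ?)", [
    (1, "ORDERED"),
    (2, "ORDERED"), (2, "SHIPPED"), (2, "DELIVERED"),
])

# Joining two independent child tables pairs every item of an order
# with every tracking state of that order.
rows = db.execute("""
    SELECT o.first_name || ' ' || o.last_name AS name, i.description, t.status
    FROM orders o
    JOIN items i ON i.order_id = o.id
    JOIN tracking t ON t.order_id = o.id
    ORDER BY o.id
""").fetchall()

for row in rows:
    print(row)
print("join rows:", len(rows))  # 3 * 1 for order 1 plus 2 * 3 for order 2
```

Each order contributes p × q rows instead of p + q, and the ETL job would still have to de-duplicate them before it could build the two arrays.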
  • 72. Oh, and one more thing...
  • 73. Threading and Batching (chart labels: batch size, threads, throughput)
  • 74. Fan-In and Fan-Out: number of database operations per MongoDB document. Reads into the ETL job (fan-in): 3/n. Writes out to MongoDB (fan-out): 1/1000.
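The 1/1000 figure corresponds to writing documents in batches of 1000. The following sketch shows one way to combine batching with a thread pool; `FakeCollection` is a hypothetical stand-in so the example runs without a server, and `batched` is an illustrative helper. With PyMongo, `Collection.insert_many` would take the place of the fake method.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(docs, size):
    """Yield lists of up to `size` documents from any iterable."""
    it = iter(docs)
    while batch := list(islice(it, size)):
        yield batch


class FakeCollection:
    """Hypothetical stand-in for a MongoDB collection (assumption, not PyMongo)."""

    def __init__(self):
        self.docs = []
        self.calls = 0
        self._lock = threading.Lock()

    def insert_many(self, batch):
        # One call models one round trip to the database.
        with self._lock:
            self.calls += 1
            self.docs.extend(batch)


coll = FakeCollection()
docs = ({"_id": i} for i in range(2500))

# Executor.map submits every batch up front; a loader for billions of
# documents would want to bound how many batches are in flight.
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(coll.insert_many, batched(docs, 1000)))

print(coll.calls, "write calls for", len(coll.docs), "documents")
```

The batch size and thread count used here are arbitrary; the chart on the previous slide suggests that both are worth measuring for a given setup.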
  • 75. Results (bar chart, time in minutes; Simple vs. Batch = 1000): Nested Queries 14.5 vs. 9.1; Build in DB 95.9 vs. 36.2; Lookup from Memory 8.5 vs. (value not recoverable from the transcript); Co-Iteration 8.1 vs. 3.9.
  • 76. Summary
      • Common Mistakes to Watch Out For
          • Nested Queries
          • Building Documents in the Database
          • Loading Everything into Memory
      • The Co-Iteration Pattern
          • Open All Tables at Once
          • Perform a Single Pass over Them
          • Build Documents as You Go Along
      • Don't Forget Batching and Threading
  • 77. Thank you. github.com/drmirror/etlpro