A Technical Introduction to WiredTiger
info@wiredtiger.com
Presenter
•  Keith Bostic
•  Co-architect WiredTiger
•  Senior Staff Engineer MongoDB
Ask questions as we go, or
keith.bostic@mongodb.com
This presentation is not…
•  How to write stand-alone WiredTiger apps
– contact info@wiredtiger.com
•  How to configure MongoDB with WiredTiger for your workload
WiredTiger
•  Embedded database engine
– general purpose toolkit
– high performing: scalable throughput with low latency
•  Key-value store (NoSQL)
•  Schema layer
– data typing, indexes
•  Single-node
•  OO APIs
– Python, C, C++, Java
•  Open Source
Deployments
•  Amazon AWS
•  ORC/Tbricks: financial trading solution
And, most important of all:
•  MongoDB: next-generation document store
You may have seen this:
or this…
MongoDB’s Storage Engine API
•  Allows different storage engines to "plug-in"
– different workloads have different performance characteristics
– MMAPv1 is not ideal for all workloads
– more flexibility
•  mix storage engines on same replica set/sharded cluster
•  Opportunity to innovate further
– HDFS, encrypted, other workloads
•  WiredTiger is MongoDB’s general-purpose workhorse
Topics
Ø  WiredTiger Architecture
•  In-memory performance
•  Record-level concurrency
•  Compression
•  Durability and the journal
•  Future features
Motivation for WiredTiger
•  Traditional engines struggle with modern hardware:
– lots of CPU cores
– lots of RAM
•  Avoid thread contention for resources
– lock-free algorithms, for example, hazard pointers
– concurrency control without blocking
•  Hotter cache, more work per I/O
– big blocks
– compact file formats
WiredTiger Architecture
[Diagram: the WiredTiger engine. Components shown: Python API, C API, Java API; schema and cursors; transactions; snapshots; row storage; column storage; cache; page read/write; logging; block management; database files; log files.]
Column-store, LSM
•  Column-store
– implemented inside the B+tree
– 64-bit record number keys
– valued by the key’s position in the tree
– variable-length or fixed-length
•  LSM
– forest of B+trees (row-store or column-store)
– bloom filters (fixed-length column-store)
•  Mix-and-match
– sparse, wide table: column-store primary, LSM indexes
Topics
ü  WiredTiger Architecture
Ø  In-memory performance
•  Record-level concurrency
•  Compression
•  Durability and the journal
•  Future features
Trees in cache
[Diagram: a tree in cache. A root page points to internal pages, which point to leaf pages. Labels: "ordinary pointer" and "non-resident child".]
Hazard Pointers
[Diagram: the same tree of root, internal, and leaf pages, with the labels "ordinary pointer" and "non-resident child" and the annotation "1 memory flush".]
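The idea behind hazard pointers can be sketched in a few lines. This is a toy, single-threaded model and not WiredTiger's code: the names `Page`, `HazardTable`, `acquire`, and `try_evict` are hypothetical, and a Python dict stands in for the per-thread slots and memory barriers a real implementation would use. A reader publishes the page it is about to use and then re-checks it; eviction only proceeds when no published pointer refers to the page.

```python
class Page:
    """A cached page; 'evicted' stands in for the page state a real engine tracks."""
    def __init__(self, name):
        self.name = name
        self.evicted = False


class HazardTable:
    """Toy hazard-pointer table (hypothetical API, for illustration only)."""
    def __init__(self):
        self.slots = {}  # reader id -> page the reader is currently using

    def acquire(self, reader_id, page):
        # Publish the pointer first, then re-check the page: the order matters.
        # In a real engine the publish step is where the memory flush happens.
        self.slots[reader_id] = page
        if page.evicted:
            del self.slots[reader_id]
            return False
        return True

    def release(self, reader_id):
        self.slots.pop(reader_id, None)

    def try_evict(self, page):
        # Block new readers, then look for any published hazard pointer.
        page.evicted = True
        if any(p is page for p in self.slots.values()):
            page.evicted = False  # still in use: back off and retry later
            return False
        return True
```

Readers pay only for publishing a pointer; the cost of scanning the table falls on the evicting thread, which matches the goal of keeping reads cheap.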
Pages in cache
[Diagram: the cache holds page images read from the data files. A clean page is an on-disk page image plus an index; a dirty page is an on-disk page image plus an index and updates.]
Skiplists
•  Updates stored in skiplists
– ordered linked lists with forward “skip” pointers
•  William Pugh, 1989
– simpler, as fast as binary-search, less space
– likely binary-search performance plus cache prefetch
– more space for an existing data set
•  Implementation
– insert without locking
– forward/backward traversal without locking, while inserting
– removal requires locking
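A minimal skiplist, to make the "ordered linked list with forward skip pointers" description concrete. This is a single-threaded sketch under stated assumptions: the class names, the maximum height of 8, and the 1-in-4 promotion probability are illustrative choices, not WiredTiger's. It does not attempt the lock-free insert described above; the bottom-up linking order is only a nod to how such an insert is usually structured.

```python
import random


class Node:
    def __init__(self, key, value, height):
        self.key = key
        self.value = value
        self.next = [None] * height  # next[i] is the forward pointer at level i


class SkipList:
    MAX_HEIGHT = 8  # assumption: small fixed cap for the sketch

    def __init__(self, seed=0):
        self.head = Node(None, None, self.MAX_HEIGHT)
        self.rng = random.Random(seed)

    def _random_height(self):
        height = 1
        while height < self.MAX_HEIGHT and self.rng.random() < 0.25:
            height += 1
        return height

    def _find_predecessors(self, key):
        # Walk from the top level down, recording the last node before 'key'
        # at every level.
        preds = [self.head] * self.MAX_HEIGHT
        node = self.head
        for level in range(self.MAX_HEIGHT - 1, -1, -1):
            while node.next[level] is not None and node.next[level].key < key:
                node = node.next[level]
            preds[level] = node
        return preds

    def insert(self, key, value):
        preds = self._find_predecessors(key)
        found = preds[0].next[0]
        if found is not None and found.key == key:
            found.value = value
            return
        node = Node(key, value, self._random_height())
        # Link level 0 first so the node is reachable in order before the
        # skip pointers are added.
        for level in range(len(node.next)):
            node.next[level] = preds[level].next[level]
            preds[level].next[level] = node

    def search(self, key):
        node = self._find_predecessors(key)[0].next[0]
        if node is not None and node.key == key:
            return node.value
        return None

    def items(self):
        node = self.head.next[0]
        while node is not None:
            yield node.key, node.value
            node = node.next[0]
```

Iterating `items()` walks level 0 only, which is why an in-order scan stays cheap even while higher levels are still being linked.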
In-memory performance
•  Cache trees/pages optimized for in-memory access
•  Follow pointers to traverse a tree
•  No locking to read or write
•  Keep updates separate from initial data
– updates are stored in skiplists
– updates are atomic in almost all cases
•  Do structural changes (eviction, splits) in background threads
Topics
ü  WiredTiger Architecture
ü  In-memory performance
Ø  Record-level concurrency
•  Compression
•  Durability and the journal
•  Future features
Multiversion Concurrency Control (MVCC)
•  Multiple versions of records maintained in cache
•  Readers see most recently committed version
– read-uncommitted or snapshot isolation available
– configurable per-transaction or per-handle
•  Writers can create new versions concurrent with readers
•  Concurrent updates to a single record cause write conflicts
– one of the updates wins
– other generally retries with back-off
•  No locking, no lock manager
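The bullets above can be sketched as a version chain per key plus a snapshot per transaction. This is an illustrative model, not WiredTiger's implementation: `Store`, `Txn`, `Update`, and `WriteConflict` are hypothetical names, only snapshot isolation is modeled, and rollback is left out. A writer that finds a newest version it cannot see raises a conflict, which is one way to get "one of the updates wins".

```python
class WriteConflict(Exception):
    pass


class Update:
    def __init__(self, txn_id, value, older):
        self.txn_id = txn_id
        self.value = value
        self.older = older  # link to the next-older version of the record


class Store:
    def __init__(self):
        self.chains = {}        # key -> newest Update
        self.committed = set()  # ids of committed transactions
        self.next_txn = 1

    def begin(self):
        # The snapshot is the set of transactions committed at begin time.
        txn = Txn(self, self.next_txn, frozenset(self.committed))
        self.next_txn += 1
        return txn


class Txn:
    def __init__(self, store, txn_id, snapshot):
        self.store = store
        self.id = txn_id
        self.snapshot = snapshot

    def _visible(self, update):
        return update.txn_id == self.id or update.txn_id in self.snapshot

    def get(self, key):
        # Walk from newest to oldest until a visible version is found.
        update = self.store.chains.get(key)
        while update is not None and not self._visible(update):
            update = update.older
        return update.value if update is not None else None

    def put(self, key, value):
        newest = self.store.chains.get(key)
        if newest is not None and not self._visible(newest):
            raise WriteConflict(key)  # someone else updated the record first
        self.store.chains[key] = Update(self.id, value, newest)

    def commit(self):
        self.store.committed.add(self.id)
```

A caller would typically catch `WriteConflict`, back off, and retry the whole transaction with a fresh snapshot.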
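Pages in cache
[Diagram: the cache holds page images read from the data files. A clean page is an on-disk page image plus an index; a dirty page is an on-disk page image plus an index and updates, with the updates held in a skiplist.]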
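MVCC In Action
[Diagram: an on-disk page image with its index and attached updates, shown in stages. The first stage has a single update1 (txn, value); the next lists update2 (txn, value) and then update1 (txn, value); the last shows the page image and index with updates attached.]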
Topics
ü  WiredTiger Architecture
ü  In-memory performance
ü  Record-level concurrency
Ø  Compression
•  Durability and the journal
•  Future features
Block manager
•  Block allocation
– fragmentation
– allocation policy
•  Checksums
– block compression is at a higher level
•  Checkpoints
– involved in durability guarantees
•  Opaque address cookie
– stored as internal page key’s “value”
•  Pluggable
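A small sketch of how checksums and an opaque address cookie might fit together. Everything here is an assumption made for illustration: the `BlockManager` class, the append-only allocation policy, the in-memory file, and the cookie layout (offset, size, CRC-32 packed into 16 bytes) are not WiredTiger's actual format. The point is that the layer above stores the cookie without interpreting it, and the block manager verifies the checksum on every read.

```python
import io
import struct
import zlib


class BlockManager:
    """Toy block manager over an in-memory file (hypothetical API)."""
    COOKIE = "<QII"  # offset (8 bytes), size (4 bytes), CRC-32 (4 bytes)

    def __init__(self):
        self.file = io.BytesIO()

    def write(self, payload: bytes) -> bytes:
        # Allocation policy for this sketch: always append at end of file.
        offset = self.file.seek(0, io.SEEK_END)
        self.file.write(payload)
        return struct.pack(self.COOKIE, offset, len(payload), zlib.crc32(payload))

    def read(self, cookie: bytes) -> bytes:
        offset, size, checksum = struct.unpack(self.COOKIE, cookie)
        self.file.seek(offset)
        payload = self.file.read(size)
        if zlib.crc32(payload) != checksum:
            raise IOError("block checksum mismatch")
        return payload
```

An internal page would store the returned cookie as the "value" for a child's key and hand it back unchanged when the child has to be read.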
Write path
[Diagram: the pages-in-cache picture again (data files, page images, a clean page with index, a dirty page with index and updates), with the dirty page marked "reconciled during write".]
In-memory Compression
•  Prefix compression
– index keys usually have a common prefix
– rolling, per-block, requires instantiation for performance
•  Huffman/static encoding
– burns CPU
•  Dictionary lookup
– single value per page
•  Run-length encoding
– column-store values
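Two of these techniques are easy to show in miniature. The functions below are illustrative only (their names and the tuple formats are assumptions, not WiredTiger's on-page encoding): prefix compression stores, for each sorted key, how many leading characters it shares with the previous key, and run-length encoding collapses repeated adjacent values.

```python
import os
from itertools import groupby


def prefix_compress(sorted_keys):
    """Encode each key as (shared prefix length with previous key, suffix)."""
    out, prev = [], ""
    for key in sorted_keys:
        shared = len(os.path.commonprefix([prev, key]))
        out.append((shared, key[shared:]))
        prev = key
    return out


def prefix_decompress(entries):
    """Rebuild full keys; each key depends on the one before it ("rolling")."""
    keys, prev = [], ""
    for shared, suffix in entries:
        prev = prev[:shared] + suffix
        keys.append(prev)
    return keys


def rle_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    return [value for value, count in pairs for _ in range(count)]


keys = ["user:1000", "user:1001", "user:1002", "user:2000"]
print(prefix_compress(keys))
# [(0, 'user:1000'), (8, '1'), (8, '2'), (5, '2000')]
print(rle_encode(["a", "a", "a", "b", "b", "a"]))
# [('a', 3), ('b', 2), ('a', 1)]
```

Because a rolling encoding can only be decoded by walking from an earlier full key, a reader doing random lookups would want some full keys materialized in memory, which is presumably what "requires instantiation for performance" refers to.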
On-disk Compression
•  Compression algorithms:
– snappy [default]: good compression, low overhead
– LZ4: good compression, low overhead, better page layout
– zlib: better compression, high overhead
– pluggable
•  Optional
– compressing filesystem instead
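"Pluggable" and "optional" can be pictured as a table of compress/decompress pairs chosen per file. The sketch below is hypothetical: `COMPRESSORS`, `write_block`, and `read_block` are made-up names, and only zlib is included because it ships with Python's standard library (snappy and LZ4 need third-party packages).

```python
import zlib

# Each entry is a (compress, decompress) pair; "none" leaves blocks untouched,
# for example when a compressing filesystem is used instead.
COMPRESSORS = {
    "none": (lambda data: data, lambda data: data),
    "zlib": (lambda data: zlib.compress(data, 6), zlib.decompress),
}


def write_block(page_image: bytes, compressor: str = "zlib") -> bytes:
    """Compress a page image on its way to disk."""
    compress, _ = COMPRESSORS[compressor]
    return compress(page_image)


def read_block(block: bytes, compressor: str = "zlib") -> bytes:
    """Decompress a block read from disk back into a page image."""
    _, decompress = COMPRESSORS[compressor]
    return decompress(block)
```

Adding another algorithm means adding one entry to the table; the read and write paths stay the same.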
Compression in Action
Flights database
Topics
ü  WiredTiger Architecture
ü  In-memory performance
ü  Record-level concurrency
ü  Compression
Ø  Durability and the journal
•  Future features
Journal and Recovery
•  Write-ahead logging (aka journal) enabled by default
•  Only written at transaction commit
– only write redo records
•  Log records are compressed
•  Group commit for concurrency
•  Automatic log archival / removal
– bounded by checkpoint frequency
•  On startup, find a consistent checkpoint in the metadata
– use the checkpoint to figure out how much to roll forward
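The recovery rule above (start from a checkpoint, then roll the log forward) can be sketched with a redo-only log. This is a simplified model with hypothetical names (`Engine`, `commit`, `take_checkpoint`, `recover`); log compression and group commit are omitted, and Python lists and dicts stand in for files.

```python
class Engine:
    def __init__(self):
        self.data = {}              # in-memory state, lost on crash
        self.log = []               # durable redo records: (lsn, key, value)
        self.checkpoint = ({}, 0)   # durable: (data snapshot, last LSN it covers)
        self.lsn = 0

    def commit(self, writes):
        # Redo records are appended only at commit time.
        for key, value in writes.items():
            self.lsn += 1
            self.log.append((self.lsn, key, value))
        self.data.update(writes)

    def take_checkpoint(self):
        self.checkpoint = (dict(self.data), self.lsn)
        # Records covered by the checkpoint are no longer needed for recovery,
        # so the log's size is bounded by how often checkpoints run.
        self.log = [rec for rec in self.log if rec[0] > self.lsn]

    def recover(self):
        snapshot, checkpoint_lsn = self.checkpoint
        self.data = dict(snapshot)
        # Roll forward: replay every redo record newer than the checkpoint.
        for lsn, key, value in self.log:
            if lsn > checkpoint_lsn:
                self.data[key] = value
```

Since only committed work ever reaches the log, recovery never has to undo anything, which is why redo records alone are enough in this model.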
Durability without Journaling
•  MongoDB’s MMAP storage requires the journal for consistency
– running with “nojournal” is unsafe
•  WiredTiger is a no-overwrite data store
– with “nojournal”, updates since the last checkpoint may be lost
– data will still be consistent
– checkpoints every N seconds by default
•  Replication can guarantee durability
– the network is generally faster than disk I/O
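Why a no-overwrite store stays consistent without a journal can be shown with a toy model. The class and method names here are hypothetical, and a Python list stands in for the data file: a checkpoint writes a new block and only then switches the root, so a crash at any point leaves the previous checkpoint intact.

```python
class NoOverwriteStore:
    def __init__(self):
        self.blocks = []   # append-only "file"; written blocks are never modified
        self.root = None   # index of the block holding the last checkpoint
        self.cache = {}    # in-memory state, lost on crash

    def put(self, key, value):
        self.cache[key] = value

    def checkpoint(self):
        self.blocks.append(dict(self.cache))  # write the new image elsewhere
        self.root = len(self.blocks) - 1      # then switch the root in one step

    def crash_and_restart(self):
        # Without a journal, restart simply reopens the last checkpoint.
        if self.root is None:
            self.cache = {}
        else:
            self.cache = dict(self.blocks[self.root])
```

Updates made after the last checkpoint are gone after a restart, but what remains is a complete, consistent image rather than a half-written page.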
Topics
ü  WiredTiger Architecture
ü  In-memory performance
ü  Record-level concurrency
ü  Compression
ü  Durability and the journal
Ø  Future features
What’s next for WiredTiger?
•  Our Big Year of Tuning
– applications doing “interesting” things
– stalls during checkpoints with 100GB+ caches
– MongoDB capped collections
•  Encryption
•  Advanced transactional semantics
– updates not stable until confirmed by replica majority
WiredTiger LSM support
•  Random insert workloads
•  Data set much larger than cache
•  Query performance less important
•  Background maintenance overhead acceptable
•  Bloom filters
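A compact model of why LSM suits random inserts and what the Bloom filters buy. All names are hypothetical, plain dicts stand in for the B+trees that WiredTiger uses for each chunk, and background merging is left out. Writes go to a small in-memory table that is flushed as an immutable run; a read checks each run's Bloom filter and skips runs that cannot contain the key.

```python
import hashlib


class BloomFilter:
    def __init__(self, bits=1024, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray(bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key):
        for pos in self._positions(key):
            self.array[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.array[pos] for pos in self._positions(key))


class LSMTree:
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # newest first: (bloom filter, sorted immutable run)
        self.memtable_limit = memtable_limit
        self.runs_probed = 0  # how many runs were actually searched

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        bloom = BloomFilter()
        for key in self.memtable:
            bloom.add(key)
        self.runs.insert(0, (bloom, dict(sorted(self.memtable.items()))))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for bloom, run in self.runs:
            if not bloom.might_contain(key):
                continue  # skip this run without touching it
            self.runs_probed += 1
            if key in run:
                return run[key]
        return None
```

A lookup may still have to visit several runs, which is the "query performance less important" trade-off; merging runs in the background is what keeps that number down.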
Benchmarks
Mark Callaghan at Facebook: http://smalldatum.blogspot.com/
Thanks!
Questions?
Keith Bostic
keith.bostic@mongodb.com
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB
 
Ad

Recently uploaded (20)

UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Learn the Basics of Agile Development: Your Step-by-Step Guide
Learn the Basics of Agile Development: Your Step-by-Step GuideLearn the Basics of Agile Development: Your Step-by-Step Guide
Learn the Basics of Agile Development: Your Step-by-Step Guide
Marcel David
 
Asthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdfAsthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdf
VanessaRaudez
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Automation Dreamin': Capture User Feedback From Anywhere
Automation Dreamin': Capture User Feedback From AnywhereAutomation Dreamin': Capture User Feedback From Anywhere
Automation Dreamin': Capture User Feedback From Anywhere
Lynda Kane
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Automation Dreamin' 2022: Sharing Some Gratitude with Your Users
Automation Dreamin' 2022: Sharing Some Gratitude with Your UsersAutomation Dreamin' 2022: Sharing Some Gratitude with Your Users
Automation Dreamin' 2022: Sharing Some Gratitude with Your Users
Lynda Kane
 
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
Lynda Kane
 
Image processinglab image processing image processing
Image processinglab image processing  image processingImage processinglab image processing  image processing
Image processinglab image processing image processing
RaghadHany
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Leading AI Innovation As A Product Manager - Michael Jidael
Leading AI Innovation As A Product Manager - Michael JidaelLeading AI Innovation As A Product Manager - Michael Jidael
Leading AI Innovation As A Product Manager - Michael Jidael
Michael Jidael
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Learn the Basics of Agile Development: Your Step-by-Step Guide
Learn the Basics of Agile Development: Your Step-by-Step GuideLearn the Basics of Agile Development: Your Step-by-Step Guide
Learn the Basics of Agile Development: Your Step-by-Step Guide
Marcel David
 
Asthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdfAsthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdf
VanessaRaudez
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Automation Dreamin': Capture User Feedback From Anywhere
Automation Dreamin': Capture User Feedback From AnywhereAutomation Dreamin': Capture User Feedback From Anywhere
Automation Dreamin': Capture User Feedback From Anywhere
Lynda Kane
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Automation Dreamin' 2022: Sharing Some Gratitude with Your Users
Automation Dreamin' 2022: Sharing Some Gratitude with Your UsersAutomation Dreamin' 2022: Sharing Some Gratitude with Your Users
Automation Dreamin' 2022: Sharing Some Gratitude with Your Users
Lynda Kane
 
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
Lynda Kane
 
Image processinglab image processing image processing
Image processinglab image processing  image processingImage processinglab image processing  image processing
Image processinglab image processing image processing
RaghadHany
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Leading AI Innovation As A Product Manager - Michael Jidael
Leading AI Innovation As A Product Manager - Michael JidaelLeading AI Innovation As A Product Manager - Michael Jidael
Leading AI Innovation As A Product Manager - Michael Jidael
Michael Jidael
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 

A Technical Introduction to WiredTiger

  • 2. Presenter • Keith Bostic • Co-architect WiredTiger • Senior Staff Engineer MongoDB Ask questions as we go, or keith.bostic@mongodb.com
  • 3. This presentation is not… • How to write stand-alone WiredTiger apps – contact info@wiredtiger.com • How to configure MongoDB with WiredTiger for your workload
  • 4. WiredTiger • Embedded database engine – general purpose toolkit – high performing: scalable throughput with low latency • Key-value store (NoSQL) • Schema layer – data typing, indexes • Single-node • OO APIs – Python, C, C++, Java • Open Source
  • 5. Deployments • Amazon AWS • ORC/Tbricks: financial trading solution And, most important of all: • MongoDB: next-generation document store
  • 6. You may have seen this:
  • 8. MongoDB’s Storage Engine API • Allows different storage engines to "plug-in" – different workloads have different performance characteristics – mmapV1 is not ideal for all workloads – more flexibility • mix storage engines on same replica set/sharded cluster • Opportunity to innovate further – HDFS, encrypted, other workloads • WiredTiger is MongoDB’s general-purpose workhorse
  • 9. Topics ➢ WiredTiger Architecture • In-memory performance • Record-level concurrency • Compression • Durability and the journal • Future features
  • 10. Motivation for WiredTiger • Traditional engines struggle with modern hardware: – lots of CPU cores – lots of RAM • Avoid thread contention for resources – lock-free algorithms, for example, hazard pointers – concurrency control without blocking • Hotter cache, more work per I/O – big blocks – compact file formats
  • 11. WiredTiger Architecture (diagram labels): WiredTiger Engine; Schema & Cursors; Python API; C API; Java API; Database Files; Transactions; Page read/write; Logging; Column storage; Block management; Row storage; Snapshots; Log Files; Cache
  • 12. Column-store, LSM • Column-store – implemented inside the B+tree – 64-bit record number keys – valued by the key’s position in the tree – variable-length or fixed-length • LSM – forest of B+trees (row-store or column-store) – bloom filters (fixed-length column-store) • Mix-and-match – sparse, wide table: column-store primary, LSM indexes
  • 13. Topics ✓ WiredTiger Architecture ➢ In-memory performance • Record-level concurrency • Compression • Durability and the journal • Future features
  • 14. Trees in cache (diagram labels): root page; internal page; leaf page; ordinary pointer; non-resident child
  • 15. Hazard Pointers (diagram labels): root page; internal page; leaf page; ordinary pointer; non-resident child; 1 memory flush
  • 16. Pages in cache (diagram labels): cache; data files; page images; clean page: on-disk page image, index; dirty page: on-disk page image, index, updates
  • 17. Skiplists • Updates stored in skiplists – ordered linked lists with forward “skip” pointers • William Pugh, 1989 – simpler, as fast as binary-search, less space – likely binary-search performance plus cache prefetch – more space for an existing data set • Implementation – insert without locking – forward/backward traversal without locking, while inserting – removal requires locking
  • 18. In-memory performance • Cache trees/pages optimized for in-memory access • Follow pointers to traverse a tree • No locking to read or write • Keep updates separate from initial data – updates are stored in skiplists – updates are atomic in almost all cases • Do structural changes (eviction, splits) in background threads
  • 19. Topics ✓ WiredTiger Architecture ✓ In-memory performance ➢ Record-level concurrency • Compression • Durability and the journal • Future features
  • 20. Multiversion Concurrency Control (MVCC) • Multiple versions of records maintained in cache • Readers see most recently committed version – read-uncommitted or snapshot isolation available – configurable per-transaction or per-handle • Writers can create new versions concurrent with readers • Concurrent updates to a single record cause write conflicts – one of the updates wins – other generally retries with back-off • No locking, no lock manager
  • 21. Pages in cache (diagram labels): cache; data files; page images; clean page: on-disk page image, index; dirty page: on-disk page image, index, updates; skiplist
  • 22. MVCC In Action (diagram labels): on-disk page image; index; update1 (txn, value); update2 (txn, value); update
  • 23. Topics ✓ WiredTiger Architecture ✓ In-memory performance ✓ Record-level concurrency ➢ Compression • Durability and the journal • Future features
  • 24. Block manager • Block allocation – fragmentation – allocation policy • Checksums – block compression is at a higher level • Checkpoints – involved in durability guarantees • Opaque address cookie – stored as internal page key’s “value” • Pluggable
  • 25. Write path (diagram labels): cache; data files; page images; clean page: on-disk page image, index; dirty page: on-disk page image, index, updates; reconciled during write
  • 26. In-memory Compression • Prefix compression – index keys usually have a common prefix – rolling, per-block, requires instantiation for performance • Huffman/static encoding – burns CPU • Dictionary lookup – single value per page • Run-length encoding – column-store values
  • 27. On-disk Compression • Compression algorithms: – snappy [default]: good compression, low overhead – LZ4: good compression, low overhead, better page layout – zlib: better compression, high overhead – pluggable • Optional – compressing filesystem instead
  • 29. Topics ✓ WiredTiger Architecture ✓ In-memory performance ✓ Record-level concurrency ✓ Compression ➢ Durability and the journal • Future features
  • 30. Journal and Recovery • Write-ahead logging (aka journal) enabled by default • Only written at transaction commit – only write redo records • Log records are compressed • Group commit for concurrency • Automatic log archival / removal – bounded by checkpoint frequency • On startup, find a consistent checkpoint in the metadata – use the checkpoint to figure out how much to roll forward
  • 31. Durability without Journaling • MongoDB’s MMAP storage requires the journal for consistency – running with “nojournal” is unsafe • WiredTiger is a no-overwrite data store – with “nojournal”, updates since the last checkpoint may be lost – data will still be consistent – checkpoints every N seconds by default • Replication can guarantee durability – the network is generally faster than disk I/O
  • 32. Topics ✓ WiredTiger Architecture ✓ In-memory performance ✓ Record-level concurrency ✓ Compression ✓ Durability and the journal ➢ Future features
  • 33.
  • 34. What’s next for WiredTiger? • Our Big Year of Tuning – applications doing “interesting” things – stalls during checkpoints with 100GB+ caches – MongoDB capped collections • Encryption • Advanced transactional semantics – updates not stable until confirmed by replica majority
  • 35. WiredTiger LSM support • Random insert workloads • Data set much larger than cache • Query performance less important • Background maintenance overhead acceptable • Bloom filters
  • 36.
  • 37. Benchmarks Mark Callaghan at Facebook: http://smalldatum.blogspot.com/
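
Slide 17 describes skiplists as ordered linked lists with forward "skip" pointers. The following is a minimal single-threaded sketch of that data structure in Python, intended only to show how the levels are searched and linked. It does not attempt the lock-free insert the slide mentions, and the class name, `MAX_LEVEL`, and the coin-flip probability are illustrative assumptions rather than WiredTiger's actual choices.

```python
import random


class SkipList:
    """Single-threaded skiplist sketch (not WiredTiger's implementation)."""

    MAX_LEVEL = 8  # assumed cap on tower height, chosen for illustration

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        # Each node is [key, value, forward]; the head is a sentinel whose
        # key is never compared.
        self._head = [None, None, [None] * self.MAX_LEVEL]

    def _random_level(self):
        # Flip a fair coin to decide how many levels the new node joins.
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def _find_predecessors(self, key):
        # Walk from the top level down, recording the last node before
        # `key` on every level.
        preds = [self._head] * self.MAX_LEVEL
        node = self._head
        for lvl in range(self.MAX_LEVEL - 1, -1, -1):
            while node[2][lvl] is not None and node[2][lvl][0] < key:
                node = node[2][lvl]
            preds[lvl] = node
        return preds

    def insert(self, key, value):
        preds = self._find_predecessors(key)
        nxt = preds[0][2][0]
        if nxt is not None and nxt[0] == key:
            nxt[1] = value  # key already present: overwrite in place
            return
        level = self._random_level()
        node = [key, value, [None] * level]
        for lvl in range(level):
            # Point the new node at its successor, then link it in.
            node[2][lvl] = preds[lvl][2][lvl]
            preds[lvl][2][lvl] = node

    def get(self, key, default=None):
        nxt = self._find_predecessors(key)[0][2][0]
        if nxt is not None and nxt[0] == key:
            return nxt[1]
        return default

    def items(self):
        # Level 0 links every node, so it yields keys in sorted order.
        node = self._head[2][0]
        while node is not None:
            yield node[0], node[1]
            node = node[2][0]
```

In `insert`, the new node's forward pointer is set before the predecessor is redirected to it. That ordering is what lock-free variants typically build on, with the final store replaced by an atomic compare-and-swap.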
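
Slide 26 lists rolling prefix compression for keys that share a common prefix. As a rough illustration, the Python sketch below stores each key as a count of bytes shared with the previous key plus the remaining suffix. The function names and the `(shared, suffix)` tuple layout are assumptions made for this example and are not WiredTiger's on-page format.

```python
def prefix_compress(sorted_keys):
    """Encode each key as (bytes shared with previous key, remaining suffix)."""
    entries = []
    prev = b""
    for key in sorted_keys:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        entries.append((shared, key[shared:]))
        prev = key
    return entries


def prefix_decompress(entries):
    """Rebuild full keys by rolling forward from the first entry."""
    keys = []
    prev = b""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


keys = [b"user:1000", b"user:1001", b"user:2000"]
packed = prefix_compress(keys)
```

Because each entry depends on the one before it, recovering a key in the middle of a block means rolling forward from the start. That is one plausible reading of the slide's remark that the scheme "requires instantiation for performance".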

Editor's Notes

  • #3: Feel free to ask questions as we go, hopefully there will be a few minutes for Q&A at the end. Also happy to discuss by email, let me know how we can help.
  • #5: Build a toolkit: one path is to build special-purpose engines to handle specific workloads, another path is to handle complex/changing workloads.
  • #8: This kind of positive feedback isn’t common in engineering groups.
  • #10: The structure for the rest of the talk.
  • #11: Traditional storage engine designs struggle with modern hardware. I/O, relative to memory, is a worse outcome than ever before: trade CPU for I/O wherever possible. Big block I/O: if we have to do I/O, bring in a lot of data.
  • #12: Moderately complex inside. Outside: APIs, handle + method. Top-level schema layer, with every table and its associated indexes. Operations are transactionally protected, implemented in terms of in-memory snapshots. Operations go to a row-store engine (ordered key/value pairs) or a column-store engine (key is a 64-bit record number). Block management is intended to be pluggable itself. On disk: key/value pair files and log files.
  • #13: From now on, I’m going to mostly be talking about “trees” in-memory, without distinguishing what type of tree – here’s the overview, after which it’s just a page in-memory.
  • #14: WiredTiger focuses on in-memory performance: I/O means you’ve lost the war, you’re only trying to retreat gracefully.
  • #15: WiredTiger does have root pages and internal pages, with leaf pages at the bottom: binary search of each page yields the child page for the subsequent search. Importantly, pointers in memory are not disk offsets (in many engines, in-memory objects find each other using disk offsets, so, for example, a transition from an internal page to a leaf page implies giving the cache subsystem a disk offset, and the cache returns the in-memory pointer, possibly after a read). The WiredTiger in-memory tree is exactly that: an in-memory optimized tree. This is good (fast in-memory operations), and this is bad (we have a translation step after reading, and before writing, disk blocks). We knew we wanted to modify our in-memory representation without changing our on-disk format (avoid upgrades!), and we knew a lot of our compression algorithms would require translation before writing anyway (for example, our on-disk pages have no indexing information, which saves about 15-20% of the disk footprint in some workloads).
  • #16: To make this efficient, pointers need to be protected: once data is larger than cache, there needs to be a check to ensure a pointer is valid. There’s a background thread doing eviction of pages. Hazard pointers can be thought of as micro-logging. A reader or writer publishes the memory address of a page it wants on a non-shared cache line; after that publish, if the pointer is still valid, it can proceed. Eviction threads must check those locations to ensure a page is not currently in use: eviction bears the burden, readers/writers go fast. Design principle: application threads never wait; shift work from application threads to system-internal threads.
  • #17: Different in-memory representation vs on-disk – opportunities to optimize plus some interesting features. Read on-disk page into cache, generally a read-only image. Add indexing information (binary search) on top of that image. Updates are layered on top of that image, including new key/value pairs inserted between existing keys. If the page grows too large, background threads will deal with it. Lots of magic in traversal: threads must go back-and-forth between the original image and the updated image. Writing a page: combine the previous page image with in-memory changes; if the result is too big, split; allocate new space in the file; always write multiples of the allocation size.
  • #19: To summarize: Note we haven’t talked about locking at all: application threads can retrieve and update data without ever acquiring a lock. Justin Levandoski’s BW-Tree work: they’ve avoided taking pages out of circulation during splits, interesting.
  • #21: Readers don’t block writers, writers don’t block readers, writers don’t block writers, again, no locks have been acquired.
  • #22: If there have been no updates, then it’s easy, the on-page item is the correct item for any query. If there are updates, each update has associated with it a transaction ID, and that transaction ID combined with the transaction’s snapshot, determines the correct value.
  • #23: Index references the original page image; once updates are installed, readers/writers have to check for updates. The first update in the list is generally the one we want; if it’s not yet visible, other updates are checked. If no update is correct, the value on the original page must be the one we want. All updates are done by atomic updates, swapping a new pointer into place, with readers concurrent to updates. Writing the page to disk requires processing all of this information, and determining the values to write.
  • #26: Page-Write: transform the in-memory version: selecting values to write, page-splitting, all sorts of compression, checksums. Checkpoints are simply another “snapshot reader”, so they run concurrently with other readers and writers. Writing a page: combine the previous page image with in-memory changes; allocate new space in the file; if the result is too big, split; always write multiples of the allocation size. Can configure direct I/O to keep reads and writes out of the filesystem cache.
  • #27: All of these apply in-memory and on disk, so we save both on disk and in the cache.
  • #28: In the same code paths, we compress the data.
  • #29: In WiredTiger, compact file sizes, and that certain types of compression cannot be turned off, WiredTiger without “compression”, is 50%.
  • #31: Storage engines are all about not losing your stuff. Pretty standard WAL implementation: before a commit is visible, a log record with all of the changes in the transaction has been flushed to stable storage. Group commit: concurrent log writes are done with a single storage flush. Started with “Scalability of write-ahead logging on multicore and multisocket hardware” by Ryan Johnson, Ippokratis Pandis, Radu Stoica, Manos Athanassoulis and Anastasia Ailamaki, and there’s lots more engineering there.
  • #32: Checkpoints move from one stable point to another one. Our goal was to build a single-node engine, and for that reason, we had to run without a log without losing durability.
  • #35: Lookaside tables.
  • #37: A very large data-set over time: blue: a btree tails off over time as the probability of a page being found in cache decreases (that’s why the random nature of the insert matters). red/green flatter: only maintain the recent updates in cache, and merge the updates in the background. LSM is write-optimized, though: the reason the btree is primary is that read-mostly workloads generally behave better in a btree than in LSM. What we want to do eventually is enable the conversion of an LSM tree into a btree (if you think of a forest of btrees collapsing into a single btree, that matches nicely with the typical workload of inserting a lot of data and then processing that data). Ideally, we’d also be able to reverse that process on demand.
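
Note #16 describes the hazard pointer protocol: a reader publishes the page it wants, re-checks that the page is still valid, and only then proceeds, while eviction scans the published slots before discarding a page. The Python sketch below models only that ordering of steps. It is a conceptual, single-process illustration: the `Cache` and `Page` classes, the per-thread slot list, and the dictionary standing in for tree pointers are all assumptions, and real implementations depend on memory barriers that Python does not expose.

```python
class Page:
    def __init__(self, name):
        self.name = name


class Cache:
    """Conceptual model of hazard-pointer protected page access."""

    def __init__(self, n_threads):
        self.resident = {}                # page id -> Page (stands in for tree pointers)
        self.hazard = [None] * n_threads  # one published slot per thread

    def acquire(self, thread_id, page_id):
        page = self.resident.get(page_id)
        if page is None:
            return None                   # not in cache: caller would read it in
        self.hazard[thread_id] = page     # step 1: publish the pointer
        if self.resident.get(page_id) is not page:
            # step 2: re-check after publishing; eviction won the race
            self.hazard[thread_id] = None
            return None
        return page                       # step 3: safe to use until release()

    def release(self, thread_id):
        self.hazard[thread_id] = None

    def try_evict(self, page_id):
        # Make the page unreachable first, then look for published users.
        page = self.resident.pop(page_id, None)
        if page is None:
            return False
        if any(slot is page for slot in self.hazard):
            self.resident[page_id] = page  # still in use: put it back, retry later
            return False
        return True                        # no hazard pointer found: page can go


cache = Cache(n_threads=2)
cache.resident["leaf-1"] = Page("leaf-1")
held = cache.acquire(0, "leaf-1")
evicted_while_held = cache.try_evict("leaf-1")
cache.release(0)
evicted_after_release = cache.try_evict("leaf-1")
```

The reader's path is two stores and two lookups with no waiting, while the scan over every slot falls on the evicting thread, which matches the note's point that eviction bears the burden.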
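
Notes #22 and #23 explain how a reader chooses a value: walk the update list from newest to oldest, take the first update visible to the reader's snapshot, and fall back to the on-page value if none is visible. The Python sketch below illustrates that rule. The `Snapshot` model (an upper bound on transaction IDs plus a set of transactions still running when the snapshot was taken) is a simplifying assumption for illustration and is not taken from WiredTiger's source.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Update:
    txn_id: int
    value: str


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical model: transactions with id >= snap_max began after the
    # snapshot; ids in `concurrent` were still uncommitted when it was taken.
    snap_max: int
    concurrent: FrozenSet[int] = frozenset()

    def can_see(self, txn_id: int) -> bool:
        return txn_id < self.snap_max and txn_id not in self.concurrent


def visible_value(updates: List[Update], on_page_value: str, snapshot: Snapshot) -> str:
    """Return the value this snapshot should read for one record."""
    for update in updates:  # list is ordered newest first
        if snapshot.can_see(update.txn_id):
            return update.value
    return on_page_value    # no visible update: the original page image wins


# Newest update first, as in the update list described in note #23.
chain = [Update(txn_id=12, value="c"), Update(txn_id=10, value="b")]
```

With this chain and an on-page value of `"a"`, a snapshot taken before transaction 10 started reads `"a"`, one taken while transaction 12 is still running reads `"b"`, and one taken after both have committed reads `"c"`.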