MongoDB Evenings Boston - An Update on MongoDB's WiredTiger Storage Engine

{MongoDB User Group}
Boston
WiredTiger Introduction and Update

Presenter
•  Keith Bostic
•  Senior Staff Engineer MongoDB
Please feel free:
keith.bostic@mongodb.com

3
WiredTiger
•  Embedded database engine
– general purpose toolkit
– high performing: scalable throughput with low latency
•  Standalone API
– key-value store (NoSQL)
– schema layer
– data typing, indexes

4
Deployments
•  Amazon AWS
•  ORC/Tbricks: financial trading solution
And, of course the most important of all:
•  MongoDB: next-generation document store

7
MongoDB’s Storage Engine API
•  Allows different storage engines to "plug-in"
– different workloads have different performance characteristics
– mmapV1 is not ideal for all workloads
– more flexibility
•  mix storage engines on same replica set/sharded cluster
•  Opportunity to innovate further
– HDFS, encrypted, other workloads
•  WiredTiger is MongoDB’s general-purpose workhorse

!  WiredTiger introduction
•  Encryption at rest
•  Pluggable filesystems
•  Queryable restores
•  In-memory storage engine
•  Column-store and LSM

9
Why another engine?
•  Traditional engines struggle with modern hardware:
– lots of CPU cores and lots of RAM, relatively slow I/O
1.  Avoid thread contention for resources
– lock-free algorithms: skiplists, hazard pointers, ticket locks
– concurrency control without blocking
2.  Hotter cache, more work per I/O
– big blocks
– compact file formats, compression

10
In-memory performance
•  Cache trees/pages optimized for in-memory access
•  Follow pointers to traverse a tree
•  No locking to read or write
•  Keep updates separate from initial data
– updates are stored in skiplists
– updates are atomic in almost all cases
•  Do structural changes (eviction, splits) in background threads

11
Multiversion Concurrency Control (MVCC)
•  Multiple versions of records maintained in cache
•  Readers see most recently committed version
– read-uncommitted or snapshot isolation available
– configurable per-transaction or per-cursor
•  Writers can create new versions concurrent with readers
•  Concurrent updates to a single record cause write conflicts
– one of the updates wins
– other generally retries with back-off
•  No locking, no lock manager

12
In-memory Compression
•  Prefix compression
– index keys usually have a common prefix
– rolling, per-block, requires instantiation for performance
•  Huffman/static encoding
– burns CPU
•  Dictionary lookup
– single value per page
•  Run-length encoding
– column-store values

13
On-disk Compression
•  Compression algorithms:
– snappy [default]: good compression, low overhead
– LZ4: good compression, low overhead, better page layout
– zlib: better compression, high overhead
– pluggable
•  Optional
– compressing filesystem instead

14
Compression in Action
Flights database

15
WiredTiger’s big year of tuning
•  Applications doing “interesting” things
•  Stalls during checkpoints with 100GB+ caches
•  Out-of-cache workloads
•  MongoDB capped collections
•  Performance, stability, memory consumption

16
Tickets, lots and lots of JIRA tickets

•  WiredTiger Introduction
!  Encryption at rest

18
Why add encryption to MongoDB?
•  Stop the bad people from reading your stuff!
•  Standards compliance
– FIPS 140-2
– HIPAA/HITECH, FERPA, PCI, SOX, GLBA, ISO 27001, PII

19
Encryption “at rest”
•  Protects data stored on stable storage
– defends against forgetting your laptop on the train
– does not protect data stored in-memory
•  Only one part of a secure solution
– unprotected access to in-memory data
– software bugs remain dangerous
•  Use TLS to encrypt over-the-wire information

20
Encryption implementation
•  Shared secrets maintained by a MongoDB key manager
– KMIP (Key Management Interoperability Protocol)
– key stored in protected file
•  Single master key for each MongoDB database
– master key manipulated in memory
– master key written to swap space

21
Encryption implementation
•  Implemented below the pluggable storage layer
– compatible with compression
– each storage engine has to add support
•  Currently AES-256
– WiredTiger can support multiple encryption algorithms

!  Pluggable filesystems

23
Why pluggable filesystems?
– read-only access to a backup without rewriting MongoDB
•  In-memory filesystems
– avoid a rewrite of the upper-levels of WiredTiger
•  Special-purpose storage devices
– high-density, PCI-attached flash appliance
– non-standard I/O APIs
– non-standard I/O buffers

24
Pluggable filesystem implementation
•  WT_FILE_SYSTEM
– directory contents, file existence test
– file remove, rename, return-size
– file open
•  WT_FILE_HANDLE
– close, read, write, return-size
– map, unmap
– extend, truncate
– flush

!  Queryable restores
•  In-memory deployments

26
Why queryable restores?
•  7TB inactive data sets exist
– where read-only is sufficient
•  Instead of downloading the dataset:
– query for a single document
– Mongodump a single collection
– run a new aggregation on historical data
•  You can run with a real mongod
– existing drivers or connect with a shell

27
Queryable restores
•  Supported for both WiredTiger and mmapV1
•  “QueryableBackupMode”
– read-only mode, disallowing server writes
– unexpectedly useful for accessing damaged databases

!  In-memory storage engine

29
WiredTiger default cache
•  WiredTiger defaults to LRU-style cache eviction
– supports bigger-than-memory workloads
•  Application threads may unexpectedly do I/O
– reads to acquire data not currently in cache
– writes to evict pages when eviction threads can’t keep up
– incompatible with strict latency requirements

30
In-memory storage engine
•  Built on top of WiredTiger
•  Data populated on startup, no subsequent reads or writes
•  Durability provided by another node in the replica set

!  Column-store and LSM

32
Column-store
•  Row-store
–  key and some number of columns
•  Column-store
–  key + column[2], column[3]; key + column[1], column[4-N]
–  cache is hotter, retrieval faster
–  row-retrieval is slower
•  Column-store
–  64-bit record number keys
–  variable-length or fixed-length records
–  run-length encoding for better compression

33
LSM
•  B+tree
– when small, random inserts are fast
– when large, random inserts are slow
•  LSM
– forest of B+trees
– bloom filters
•  Mix-and-match
– sparse, wide table: column-store primary, LSM indexes

Thanks!
Questions?
keith.bostic@mongodb.com

MongoDB Evenings Boston - An Update on MongoDB's WiredTiger Storage Engine

Recommended

More Related Content

What's hot (20)

Viewers also liked (16)

Similar to MongoDB Evenings Boston - An Update on MongoDB's WiredTiger Storage Engine (20)

More from MongoDB (20)

MongoDB Evenings Boston - An Update on MongoDB's WiredTiger Storage Engine