Project Voldemort
    Jay Kreps




         19/11/09   1
The Plan

   1. Motivation
   2. Core Concepts
   3. Implementation
   4. In Practice
   5. Results
Motivation
The Team

   •  LinkedIn’s Search, Network, and
      Analytics Team
      •  Project Voldemort
      •  Search Infrastructure: Zoie, Bobo, etc
      •  LinkedIn’s Hadoop system
      •  Recommendation Engine
      •  Data intensive features
         •  People you may know
         •  Who’s viewed my profile
         •  User history service
The Idea of the Relational Database
The Reality of a Modern Web Site
Why did this happen?

•  The internet centralizes computation
•  Specialized systems are efficient (10-100x)
    •  Search: Inverted index
    •  Offline: Hadoop, Teradata, Oracle DWH
    •  Memcached
    •  In memory systems (social graph)
•  Specialized systems are scalable
•  New data and problems
    •  Graphs, sequences, and text
Services and Scale Break Relational DBs


•  No joins
•  Lots of denormalization
•  ORM is less helpful
•  No constraints, triggers, etc
•  Caching => key/value model
•  Latency is key
Two Cheers For Relational Databases

•  The relational model is a triumph of computer
   science:
    •  General
    •  Concise
    •  Well understood
•  But then again:
    •  SQL is a pain
    •  Hard to build re-usable data structures
    •  Don’t hide the memory hierarchy!
       Good: Filesystem API
       Bad: SQL, some RPCs
Other Considerations

•  Who is responsible for performance (engineers?
DBA? site operations?)
•  Can you do capacity planning?
•  Can you simulate the problem early in the design
phase?
•  How do you do upgrades?
•  Can you mock your database?
Some motivating factors

•  This is a latency-oriented system
•  Data set is large and persistent
     •  Cannot be all in memory
•  Performance considerations
     •  Partition data
     •  Delay writes
     •  Eliminate network hops
•  80% of caching tiers are fixing problems that shouldn’t
exist
•  Need control over system availability and data durability
     •  Must replicate data on multiple machines
•  Cost of scalability can’t be too high
Inspired By Amazon Dynamo & Memcached

•  Amazon’s Dynamo storage system
    •  Works across data centers
    •  Eventual consistency
    •  Commodity hardware
    •  Not too hard to build
•  Memcached
    •  Actually works
    •  Really fast
    •  Really simple
•  Decisions:
    •  Multiple reads/writes
    •  Consistent hashing for data distribution
    •  Key-Value model
    •  Data versioning
Priorities

1.  Performance and scalability
2.  Actually works
3.  Community
4.  Data consistency
5.  Flexible & Extensible
6.  Everything else
Why Is This Hard?

•  Failures in a distributed system are much more
   complicated
   •  A can talk to B does not imply B can talk to A
   •  A can talk to B does not imply C can talk to B
•  Getting a consistent view of the cluster is as hard as
   getting a consistent view of the data
•  Nodes will fail and come back to life with stale data
•  I/O has high request latency variance
•  I/O on commodity disks is even worse
•  Intermittent failures are common
•  User must be isolated from these problems
•  There are fundamental trade-offs between availability and
   consistency
Core Concepts
Core Concepts - I


     ACID
       –  Great for single centralized server.
     CAP Theorem
       –     Consistency (strict), Availability, Partition Tolerance
       –     Impossible to achieve all three at the same time in a distributed platform
       –     Can choose 2 out of 3
       –     Dynamo chooses High Availability and Partition Tolerance,
              trading Strict Consistency for Eventual Consistency

     Consistency Models
       –  Strict consistency
              2-phase commit
              Paxos: distributed consensus algorithm that relies on a quorum for consistency
       –  Eventual consistency
              Different nodes can have different views of a value
              In a steady state, the system will return the last written value
              But it can provide much stronger guarantees


Proprietary & Confidential                              19/11/09                    16
Core Concept - II


     Consistent Hashing
     Key space is Partitioned
       –  Many small partitions
     Partitions never change
       –  Partitions ownership can change
     Replication
       –  Each partition is stored by ‘N’ nodes
     Node Failures
       –  Transient (short term)
       –  Long term
              Needs faster bootstrapping
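
The partitioning idea on this slide can be sketched in a few lines. This is a minimal illustration and not Voldemort's actual routing code: the partition count, the node names, the MD5-based hash, and the helper names `partition_for` and `preference_list` are all assumptions made for the example.

```python
import hashlib

NUM_PARTITIONS = 16   # fixed for the life of the cluster (assumed value)
REPLICATION_N = 2     # 'N' from the slide (assumed value)

# Hypothetical ownership table: partition id -> owning node.
# Ownership can be reassigned; the partitions themselves never change.
partition_owner = {p: f"node{p % 4}" for p in range(NUM_PARTITIONS)}


def partition_for(key: bytes) -> int:
    """Hash the key onto one of the fixed partitions."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def preference_list(key: bytes, n: int = REPLICATION_N) -> list[str]:
    """Walk the ring from the key's partition, collecting n distinct nodes."""
    start = partition_for(key)
    nodes: list[str] = []
    for step in range(NUM_PARTITIONS):
        owner = partition_owner[(start + step) % NUM_PARTITIONS]
        if owner not in nodes:
            nodes.append(owner)
        if len(nodes) == n:
            break
    return nodes
```

Because a key always hashes to the same partition, moving a partition to another node only changes an entry in `partition_owner`; no key has to be rehashed.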




Core Concept - III


   •  N - The replication factor
   •  R - The number of blocking reads
   •  W - The number of blocking writes
   •  If             R+W > N
        •     then we have a quorum-like algorithm
        •     Guarantees that we will read latest writes OR fail
   •  R, W, N can be tuned for different use cases
        •     W = 1, Highly available writes
        •     R = 1, Read intensive workloads
        •     Knobs to tune performance, durability and availability
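
The R + W > N rule can be checked by brute force for small clusters. The sketch below is only an illustration of the pigeonhole argument; the function names `is_quorum` and `always_overlaps` are made up for this example.

```python
from itertools import combinations


def is_quorum(n: int, r: int, w: int) -> bool:
    """True when the slide's condition R + W > N holds."""
    return r + w > n


def always_overlaps(n: int, r: int, w: int) -> bool:
    """Brute force: does every R-subset share a replica with every W-subset?"""
    replicas = range(n)
    return all(
        set(readers) & set(writers)
        for readers in combinations(replicas, r)
        for writers in combinations(replicas, w)
    )
```

When the condition holds, any set of R replicas that answers a read must include at least one replica that acknowledged the latest successful write, which is why a read either sees that write or fails.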




Core Concepts - IV


   •  A vector clock [Lamport] provides a way to order events in a
      distributed system.
   •  A vector clock is a tuple {t1 , t2 , ..., tn } of counters.
   •  Each value update has a master node
       •  When data is written with master node i, it increments ti.
       •  All the replicas will receive the same version
       •  Helps resolve inconsistencies between writes on multiple replicas
   •  If you get network partitions
       •  You can have a case where two vector clocks are not comparable.
       •  In this case Voldemort returns both values to clients for conflict resolution
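
A minimal sketch of the comparison described above, assuming a clock is stored as a dictionary from node name to counter. The `Order` enum and the `increment` and `compare` helpers are hypothetical names for this example, not Voldemort's API.

```python
from enum import Enum


class Order(Enum):
    BEFORE = "before"
    AFTER = "after"
    EQUAL = "equal"
    CONCURRENT = "concurrent"


def increment(clock: dict[str, int], node: str) -> dict[str, int]:
    """Return a copy of the clock with the master node's counter bumped."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a: dict[str, int], b: dict[str, int]) -> Order:
    """Compare two clocks counter by counter; missing counters count as 0."""
    nodes = set(a) | set(b)
    a_bigger = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_bigger = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_bigger and b_bigger:
        return Order.CONCURRENT  # not comparable: hand both values to the client
    if a_bigger:
        return Order.AFTER
    if b_bigger:
        return Order.BEFORE
    return Order.EQUAL
```

Two writes that start from the same version but go through different master nodes during a partition produce clocks that compare as `CONCURRENT`, which is the case where both values are returned for conflict resolution.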




Implementation
Voldemort Design
Client API

•  Data is organized into “stores”, i.e. tables
•  Key-value only
    •  But values can be arbitrarily rich or complex
        •  Maps, lists, nested combinations …
•  Four operations
    •  PUT (K, V)
    •  GET (K)
    •  MULTI-GET (Keys)
    •  DELETE (K, Version)
    •  No Range Scans
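
The shape of the four operations can be shown with a toy single-process store. This is a sketch only: the `ToyStore` class is hypothetical, and it uses a plain integer version per key as a stand-in for the vector clocks the real system attaches to values.

```python
class ToyStore:
    """In-memory stand-in for one store; integer versions are an assumption."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def put(self, key, value):
        """Write a value and return its new version."""
        version = self._data.get(key, (0, None))[0] + 1
        self._data[key] = (version, value)
        return version

    def get(self, key):
        """Return (version, value), or None if the key is absent."""
        return self._data.get(key)

    def multi_get(self, keys):
        """Return a dict of the keys that exist; missing keys are skipped."""
        return {k: self._data[k] for k in keys if k in self._data}

    def delete(self, key, version):
        """Delete only if the stored version is not newer than the caller's."""
        current = self._data.get(key)
        if current is not None and current[0] <= version:
            del self._data[key]
            return True
        return False
```

Passing a version to `delete` means a client cannot remove a value it has not seen: a delete based on a stale read is rejected.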
Versioning & Conflict Resolution


•  Eventual Consistency allows multiple versions of value
    •  Need a way to understand which value is latest
    •  Need a way to say values are not comparable
•  Solutions
    •  Timestamp
    •  Vector clocks
      •  Provide a partial ordering; concurrent versions are detected as not comparable
      •  No locking or blocking necessary
Serialization

•  Really important
   •  Few Considerations
      •  Schema free?
      •  Backward/Forward compatible
      •  Real life data structures
      •  Bytes <=> objects <=> strings?
      •  Size (No XML)
•  Many ways to do it -- we allow anything
   •  Compressed JSON, Protocol Buffers,
      Thrift, Voldemort custom serialization
Routing


•  Routing layer hides a lot of complexity
    •  Hashing scheme
    •  Replication (N, R , W)
    •  Failures
    •  Read-Repair (online repair mechanism)
    •  Hinted Handoff (Long term recovery mechanism)
•  Easy to add domain specific strategies
    •  E.g. only do synchronous operations on nodes in
       the local data center
•  Client Side / Server Side / Hybrid
Voldemort Physical Deployment
Routing With Failures

•  Failure Detection
    • Requirements
         •  Needs to be very, very fast
         •  View of server state may be inconsistent
                  •  A can talk to B but C cannot
                  •  A can talk to C , B can talk to A but not to C
    •  Currently done by routing layer (request timeouts)
         •  Periodically retries failed nodes.
         •  All requests must have hard SLAs
    • Other possible solutions
         •  Central server
         •  Gossip protocol
         •  Need to look more into this.
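
Timeout-based failure detection with periodic retries might look roughly like the sketch below. The class name, the retry interval, and the injectable clock are assumptions for illustration; the slides do not describe the real implementation at this level of detail.

```python
import time


class TimeoutFailureDetector:
    """Marks a node down on a request timeout and retries it after an interval."""

    def __init__(self, retry_interval_s: float = 10.0, clock=time.monotonic):
        self._retry_interval_s = retry_interval_s
        self._clock = clock  # injectable so the behaviour can be tested
        self._down_since: dict[str, float] = {}

    def record_timeout(self, node: str) -> None:
        """Called by the routing layer when a request to the node times out."""
        self._down_since[node] = self._clock()

    def record_success(self, node: str) -> None:
        """Called when a request succeeds; the node is considered healthy again."""
        self._down_since.pop(node, None)

    def is_available(self, node: str) -> bool:
        marked = self._down_since.get(node)
        if marked is None:
            return True
        # Once the retry interval has passed, the node is worth trying again.
        return self._clock() - marked >= self._retry_interval_s
```

Each routing client keeps its own view here, so two clients can disagree about a node's state, which matches the inconsistent-view requirement listed above.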
Repair Mechanism


     Read Repair
       –  Online repair mechanism
              Routing client receives values from multiple nodes
              Notify a node if you see an old value
              Only works for keys which are read after failures

     Hinted Handoff
       –  If a write fails, write it to any random node
       –  Just mark the write as a special write
       –  Each node periodically tries to get rid of all special entries
     Bootstrapping mechanism (We don’t have it yet)
       –  If a node was down for a long time
              Hinted handoff can generate a ton of traffic
              Need a better way to bootstrap and clear hinted handoff tables
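
Read repair can be sketched as follows. Two simplifications are assumed: versions are plain integers rather than vector clocks, and the read contacts every replica instead of only R of them. The function name `read_with_repair` is hypothetical.

```python
def read_with_repair(replicas: dict[str, dict], key: str):
    """Read a key from all replicas and push the newest value to stale ones.

    `replicas` maps node name -> {key: (version, value)}; versions start at 1.
    """
    responses = {
        node: store[key] for node, store in replicas.items() if key in store
    }
    if not responses:
        return None
    latest = max(responses.values(), key=lambda vv: vv[0])
    for store in replicas.values():
        # A replica that lacks the key is treated as holding version 0.
        if store.get(key, (0, None))[0] < latest[0]:
            store[key] = latest  # notify the stale node of the newer value
    return latest[1]
```

As the slide notes, this only heals keys that are actually read after a failure; keys that are never read stay stale until another mechanism repairs them.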




Network Layer


•  Network is the major bottleneck in many uses
•  Client performance turns out to be harder than server
(client must wait!)
     •  Lots of issues with socket buffer size/socket pool
•  Server is also a Client
•  Two implementations
     •  HTTP + servlet container
     •  Simple socket protocol + custom server
•  HTTP server is great, but HTTP client is 5-10X slower
•  Socket protocol is what we use in production
•  Recently added a non-blocking version of the server
Persistence


•  Single machine key-value storage is a commodity
•  Plugins are better than tying yourself to a single strategy
     •  Different use cases
          •  optimize reads
          •  optimize writes
          •  large vs small values
     •  SSDs may completely change this layer
     •  Better filesystems may completely change this layer
•  Couple of different options
     •  BDB, MySQL and mmap’d file implementations
     •  Berkeley DB is the most popular
     •  In memory plugin for testing
•  B-trees are still the best all-purpose structure
•  No flush on write is a huge, huge win
In Practice
LinkedIn problems we wanted to solve

•    Application Examples
      •  People You May Know
      •  Item-Item Recommendations
      •  Member and Company Derived Data
      •  User’s network statistics
      •  Who Viewed My Profile?
      •  Abuse detection
      •  User’s History Service
      •  Relevance data
      •  Crawler detection
      •  Many others have come up since
•    Some data is batch computed and served as read only
•    Some data is very high write load
•    Latency is key
Key-Value Design Example


     How to build a fast, scalable comment system?
     One approach
       –  (post_id, page) => [comment_id_1, comment_id_2, …]
       –  comment_id => comment_body
     GET comment_ids by post and page
     MULTIGET comment bodies
     Threaded, paginated comments left as an exercise 
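
The two-store layout above can be tried out with plain dictionaries standing in for the stores. The page size and the helper names `add_comment` and `get_page` are assumptions for this sketch; a real client would issue one GET followed by one MULTIGET.

```python
PAGE_SIZE = 2  # comments per page (assumed value, kept small for the example)

# Store 1: (post_id, page) => [comment_id, ...]
comment_ids_by_page: dict[tuple[str, int], list[str]] = {}
# Store 2: comment_id => comment_body
comment_bodies: dict[str, str] = {}


def add_comment(post_id: str, comment_id: str, body: str) -> None:
    """Store the body, then append the id to the first page with room."""
    comment_bodies[comment_id] = body
    page = 0
    while len(comment_ids_by_page.get((post_id, page), [])) >= PAGE_SIZE:
        page += 1
    comment_ids_by_page.setdefault((post_id, page), []).append(comment_id)


def get_page(post_id: str, page: int) -> list[str]:
    """GET the ids for the page, then MULTIGET the bodies."""
    ids = comment_ids_by_page.get((post_id, page), [])
    return [comment_bodies[i] for i in ids if i in comment_bodies]
```

Rendering a page costs two lookups regardless of how many comments the post has, which is the point of denormalizing the ids per page.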




Hadoop and Voldemort sitting in a tree…

•  Hadoop can generate a lot of data
•  Bottleneck 1: Getting the data out of Hadoop
•  Bottleneck 2: Transfer to DB
•  Bottleneck 3: Index building
•  We had a critical process where this took a DBA
   a week to run!
•  Index building is a batch operation




Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Read-only storage engine

•  Throughput vs. Latency
•  Index building done in Hadoop
•  Fully parallel transfer
•  Very efficient on-disk structure
•  Heavy reliance on OS pagecache
•  Rollback!
Voldemort At LinkedIn


•  4 Clusters, 4 teams
     •  Wide variety of data sizes, clients, needs
•  My team:
     •  12 machines
     •  Nice servers
     •  500M operations/day
     •  ~4 billion events in 10 stores (one per event type)
     •  Peak load > 10k operations / second
•  Other teams: news article data, email related data, UI
settings
Results
Some performance numbers

•  Production stats
     •  Median: 0.1 ms
     •  99.9 percentile GET: 3 ms
•  Single node max throughput (1 client node, 1 server
node):
     •  19,384 reads/sec
     •  16,559 writes/sec
•  These numbers are for mostly in-memory problems
Glaring Weaknesses

•  Not nearly enough documentation
•  No online cluster expansion (without reduced
guarantees)
•  Need more clients in other languages (Java,
Python, Ruby, and C++ currently)
•  Better tools for cluster-wide control and
monitoring
State of the Project

•  Active mailing list
•  4-5 regular committers outside LinkedIn
•  Lots of contributors
•  Equal contribution from in and out of LinkedIn
•  Project basics
      •  IRC
      •  Some documentation
      •  Lots more to do
•  > 300 unit tests that run on every checkin (and pass)
•  Pretty clean code
•  Moved to GitHub (by popular demand)
•  Production usage at a half dozen companies
•  Not just a LinkedIn project anymore
•  But LinkedIn is really committed to it (and we are hiring to work on it)
Some new & upcoming things


 •  New
     •  Python, Ruby clients
     •  Non-blocking socket server
     •  Alpha round on online cluster expansion
     •  Read-only store and Hadoop integration
     •  Improved monitoring stats
     •  Distributed testing infrastructure
     •  Compression
 •  Future
     •  Publish/Subscribe model to track changes
     •  Improved failure detection
Socket Server Scalability




Testing and releases


     Testing “in the cloud”
              Distributed systems have complex failure scenarios
              A storage system, above all, must be stable
              Automated testing allows rapid iteration while maintaining confidence in
               systems’ correctness and stability

     EC2-based testing framework
         Tests are invoked programmatically
         Contributed by Kirk True
         Adaptable to other cloud hosting providers
     Regular releases for new features and bugs
     Trunk stays stable




Shameless promotion

•  Check it out: project-voldemort.com
•  We love getting patches.
•  We kind of love getting bug reports.
•  LinkedIn is hiring, so you can work on this full time.
     •  Email me if interested
     •  jkreps@linkedin.com
The End
Ad

More Related Content

What's hot (20)

How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
HostedbyConfluent
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
confluent
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
confluent
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
Discover Pinterest
 
Optimizing Servers for High-Throughput and Low-Latency at Dropbox
Optimizing Servers for High-Throughput and Low-Latency at DropboxOptimizing Servers for High-Throughput and Low-Latency at Dropbox
Optimizing Servers for High-Throughput and Low-Latency at Dropbox
ScyllaDB
 
Restoring Restoration's Reputation in Kafka Streams with Bruno Cadonna & Luca...
Restoring Restoration's Reputation in Kafka Streams with Bruno Cadonna & Luca...Restoring Restoration's Reputation in Kafka Streams with Bruno Cadonna & Luca...
Restoring Restoration's Reputation in Kafka Streams with Bruno Cadonna & Luca...
HostedbyConfluent
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward
 
Node Labels in YARN
Node Labels in YARNNode Labels in YARN
Node Labels in YARN
DataWorks Summit
 
Sami honkonen scheduling work in kanban
Sami honkonen   scheduling work in kanbanSami honkonen   scheduling work in kanban
Sami honkonen scheduling work in kanban
AGILEMinds
 
Serverless integration with Knative and Apache Camel on Kubernetes
Serverless integration with Knative and Apache Camel on KubernetesServerless integration with Knative and Apache Camel on Kubernetes
Serverless integration with Knative and Apache Camel on Kubernetes
Claus Ibsen
 
Kafka Overview
Kafka OverviewKafka Overview
Kafka Overview
iamtodor
 
Room 3 - 1 - Nguyễn Xuân Trường Lâm - Zero touch on-premise storage infrastru...
Room 3 - 1 - Nguyễn Xuân Trường Lâm - Zero touch on-premise storage infrastru...Room 3 - 1 - Nguyễn Xuân Trường Lâm - Zero touch on-premise storage infrastru...
Room 3 - 1 - Nguyễn Xuân Trường Lâm - Zero touch on-premise storage infrastru...
Vietnam Open Infrastructure User Group
 
Room 2 - 6 - Đinh TuẼn Phong - Migrate opensource database to Kubernetes easi...
Room 2 - 6 - Đinh TuẼn Phong - Migrate opensource database to Kubernetes easi...Room 2 - 6 - Đinh TuẼn Phong - Migrate opensource database to Kubernetes easi...
Room 2 - 6 - Đinh TuẼn Phong - Migrate opensource database to Kubernetes easi...
Vietnam Open Infrastructure User Group
 
Data Loss and Duplication in Kafka
Data Loss and Duplication in KafkaData Loss and Duplication in Kafka
Data Loss and Duplication in Kafka
Jayesh Thakrar
 
Running Kafka as a Native Binary Using GraalVM with Ozan GĂźnalp
Running Kafka as a Native Binary Using GraalVM with Ozan GĂźnalpRunning Kafka as a Native Binary Using GraalVM with Ozan GĂźnalp
Running Kafka as a Native Binary Using GraalVM with Ozan GĂźnalp
HostedbyConfluent
 
The Nextcloud Roadmap for Secure Team Collaboration
The Nextcloud Roadmap for Secure Team CollaborationThe Nextcloud Roadmap for Secure Team Collaboration
The Nextcloud Roadmap for Secure Team Collaboration
Univention GmbH
 
Introduction to JIRA & Agile Project Management
Introduction to JIRA & Agile Project ManagementIntroduction to JIRA & Agile Project Management
Introduction to JIRA & Agile Project Management
Dan Chuparkoff
 
Introduction to GitHub Actions
Introduction to GitHub ActionsIntroduction to GitHub Actions
Introduction to GitHub Actions
Knoldus Inc.
 
Scrum In Ten Slides
Scrum In Ten SlidesScrum In Ten Slides
Scrum In Ten Slides
pmengal
 
Protocol Buffers and Hadoop at Twitter
Protocol Buffers and Hadoop at TwitterProtocol Buffers and Hadoop at Twitter
Protocol Buffers and Hadoop at Twitter
Kevin Weil
 
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
HostedbyConfluent
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
confluent
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
confluent
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
Discover Pinterest
 
Optimizing Servers for High-Throughput and Low-Latency at Dropbox
Optimizing Servers for High-Throughput and Low-Latency at DropboxOptimizing Servers for High-Throughput and Low-Latency at Dropbox
Optimizing Servers for High-Throughput and Low-Latency at Dropbox
ScyllaDB
 
Restoring Restoration's Reputation in Kafka Streams with Bruno Cadonna & Luca...
Restoring Restoration's Reputation in Kafka Streams with Bruno Cadonna & Luca...Restoring Restoration's Reputation in Kafka Streams with Bruno Cadonna & Luca...
Restoring Restoration's Reputation in Kafka Streams with Bruno Cadonna & Luca...
HostedbyConfluent
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward
 
Sami honkonen scheduling work in kanban
Sami honkonen   scheduling work in kanbanSami honkonen   scheduling work in kanban
Sami honkonen scheduling work in kanban
AGILEMinds
 
Serverless integration with Knative and Apache Camel on Kubernetes
Serverless integration with Knative and Apache Camel on KubernetesServerless integration with Knative and Apache Camel on Kubernetes
Serverless integration with Knative and Apache Camel on Kubernetes
Claus Ibsen
 
Kafka Overview
Kafka OverviewKafka Overview
Kafka Overview
iamtodor
 
Room 3 - 1 - Nguyễn Xuân Trường Lâm - Zero touch on-premise storage infrastru...
Room 3 - 1 - Nguyễn Xuân Trường Lâm - Zero touch on-premise storage infrastru...Room 3 - 1 - Nguyễn Xuân Trường Lâm - Zero touch on-premise storage infrastru...
Room 3 - 1 - Nguyễn Xuân Trường Lâm - Zero touch on-premise storage infrastru...
Vietnam Open Infrastructure User Group
 
Room 2 - 6 - Đinh TuẼn Phong - Migrate opensource database to Kubernetes easi...
Room 2 - 6 - Đinh TuẼn Phong - Migrate opensource database to Kubernetes easi...Room 2 - 6 - Đinh TuẼn Phong - Migrate opensource database to Kubernetes easi...
Room 2 - 6 - Đinh TuẼn Phong - Migrate opensource database to Kubernetes easi...
Vietnam Open Infrastructure User Group
 
Data Loss and Duplication in Kafka
Data Loss and Duplication in KafkaData Loss and Duplication in Kafka
Data Loss and Duplication in Kafka
Jayesh Thakrar
 
Running Kafka as a Native Binary Using GraalVM with Ozan GĂźnalp
Running Kafka as a Native Binary Using GraalVM with Ozan GĂźnalpRunning Kafka as a Native Binary Using GraalVM with Ozan GĂźnalp
Running Kafka as a Native Binary Using GraalVM with Ozan GĂźnalp
HostedbyConfluent
 
The Nextcloud Roadmap for Secure Team Collaboration
The Nextcloud Roadmap for Secure Team CollaborationThe Nextcloud Roadmap for Secure Team Collaboration
The Nextcloud Roadmap for Secure Team Collaboration
Univention GmbH
 
Introduction to JIRA & Agile Project Management
Introduction to JIRA & Agile Project ManagementIntroduction to JIRA & Agile Project Management
Introduction to JIRA & Agile Project Management
Dan Chuparkoff
 
Introduction to GitHub Actions
Introduction to GitHub ActionsIntroduction to GitHub Actions
Introduction to GitHub Actions
Knoldus Inc.
 
Scrum In Ten Slides
Scrum In Ten SlidesScrum In Ten Slides
Scrum In Ten Slides
pmengal
 
Protocol Buffers and Hadoop at Twitter
Protocol Buffers and Hadoop at TwitterProtocol Buffers and Hadoop at Twitter
Protocol Buffers and Hadoop at Twitter
Kevin Weil
 

Viewers also liked (9)

Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
Hadoop User Group
 
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
Data Con LA
 
Spark Meetup at Uber
Spark Meetup at UberSpark Meetup at Uber
Spark Meetup at Uber
Databricks
 
Bases de Datos No Relacionales (NoSQL): Cassandra, CouchDB, MongoDB y Neo4j
Bases de Datos No Relacionales (NoSQL): Cassandra, CouchDB, MongoDB y Neo4jBases de Datos No Relacionales (NoSQL): Cassandra, CouchDB, MongoDB y Neo4j
Bases de Datos No Relacionales (NoSQL): Cassandra, CouchDB, MongoDB y Neo4j
Diego LĂłpez-de-IpiĂąa GonzĂĄlez-de-Artaza
 
Enterprise Architectures with Ruby (and Rails)
Enterprise Architectures with Ruby (and Rails)Enterprise Architectures with Ruby (and Rails)
Enterprise Architectures with Ruby (and Rails)
Konstantin Gredeskoul
 
LinkedIn's Q3 Earnings Call
LinkedIn's Q3 Earnings CallLinkedIn's Q3 Earnings Call
LinkedIn's Q3 Earnings Call
LinkedIn
 
LinkedIn’s First Earnings Announcement Deck, Q2 2011
LinkedIn’s First Earnings Announcement Deck, Q2 2011LinkedIn’s First Earnings Announcement Deck, Q2 2011
LinkedIn’s First Earnings Announcement Deck, Q2 2011
LinkedIn
 
Volunteer marketing strategist posting example
Volunteer marketing strategist posting exampleVolunteer marketing strategist posting example
Volunteer marketing strategist posting example
LinkedIn for Good
 
The Book That Changed Me Infographic
The Book That Changed Me InfographicThe Book That Changed Me Infographic
The Book That Changed Me Infographic
LinkedIn
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
Hadoop User Group
 
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
Data Con LA
 
Spark Meetup at Uber
Spark Meetup at UberSpark Meetup at Uber
Spark Meetup at Uber
Databricks
 
Enterprise Architectures with Ruby (and Rails)
Enterprise Architectures with Ruby (and Rails)Enterprise Architectures with Ruby (and Rails)
Enterprise Architectures with Ruby (and Rails)
Konstantin Gredeskoul
 
LinkedIn's Q3 Earnings Call
LinkedIn's Q3 Earnings CallLinkedIn's Q3 Earnings Call
LinkedIn's Q3 Earnings Call
LinkedIn
 
LinkedIn’s First Earnings Announcement Deck, Q2 2011
LinkedIn’s First Earnings Announcement Deck, Q2 2011LinkedIn’s First Earnings Announcement Deck, Q2 2011
LinkedIn’s First Earnings Announcement Deck, Q2 2011
LinkedIn
 
Volunteer marketing strategist posting example
Volunteer marketing strategist posting exampleVolunteer marketing strategist posting example
Volunteer marketing strategist posting example
LinkedIn for Good
 
The Book That Changed Me Infographic
The Book That Changed Me InfographicThe Book That Changed Me Infographic
The Book That Changed Me Infographic
LinkedIn
 
Ad

Similar to Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn (20)

Voldemort Nosql
Voldemort NosqlVoldemort Nosql
Voldemort Nosql
elliando dias
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in Java
Ruben BadarĂł
 
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Bob Pusateri
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
Jonas BonĂŠr
 
Large-scale projects development (scaling LAMP)
Large-scale projects development (scaling LAMP)Large-scale projects development (scaling LAMP)
Large-scale projects development (scaling LAMP)
Alexey Rybak
 
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Bob Pusateri
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
xlight
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
liujianrong
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
Roger Xia
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
Don Demcsak
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
Don Demcsak
 
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...
Bob Pusateri
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
David MartĂ­nez Rego
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
John Adams
 
NoSql
NoSqlNoSql
NoSql
Girish Khanzode
 
PayPal Big Data and MySQL Cluster
PayPal Big Data and MySQL ClusterPayPal Big Data and MySQL Cluster
PayPal Big Data and MySQL Cluster
Mat Keep
 
Big iron 2 (published)
Big iron 2 (published)Big iron 2 (published)
Big iron 2 (published)
Ben Stopford
 
High Scalability Toronto: Meetup #2
High Scalability Toronto: Meetup #2High Scalability Toronto: Meetup #2
High Scalability Toronto: Meetup #2
ScribbleLive
 
SpringPeople - Introduction to Cloud Computing
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn

  • 1. Project Voldemort Jay Kreps 19/11/09 1
  • 2. The Plan 1. Motivation 2. Core Concepts 3. Implementation 4. In Practice 5. Results
  • 4. The Team •  LinkedIn’s Search, Network, and Analytics Team •  Project Voldemort •  Search Infrastructure: Zoie, Bobo, etc •  LinkedIn’s Hadoop system •  Recommendation Engine •  Data intensive features •  People you may know •  Who’s viewed my profile •  User history service
  • 5. The Idea of the Relational Database
  • 6. The Reality of a Modern Web Site
  • 7. Why did this happen? •  The internet centralizes computation •  Specialized systems are efficient (10-100x) •  Search: Inverted index •  Offline: Hadoop, Teradata, Oracle DWH •  Memcached •  In memory systems (social graph) •  Specialized systems are scalable •  New data and problems •  Graphs, sequences, and text
  • 8. Services and Scale Break Relational DBs •  No joins •  Lots of denormalization •  ORM is less helpful •  No constraints, triggers, etc •  Caching => key/value model •  Latency is key
  • 9. Two Cheers For Relational Databases •  The relational model is a triumph of computer science: •  General •  Concise •  Well understood •  But then again: •  SQL is a pain •  Hard to build re-usable data structures •  Don’t hide the memory hierarchy! Good: Filesystem API Bad: SQL, some RPCs
  • 10. Other Considerations •  Who is responsible for performance (engineers? DBA? site operations?) •  Can you do capacity planning? •  Can you simulate the problem early in the design phase? •  How do you do upgrades? •  Can you mock your database?
  • 11. Some motivating factors •  This is a latency-oriented system •  Data set is large and persistent •  Cannot be all in memory •  Performance considerations •  Partition data •  Delay writes •  Eliminate network hops •  80% of caching tiers are fixing problems that shouldn’t exist •  Need control over system availability and data durability •  Must replicate data on multiple machines •  Cost of scalability can’t be too high
  • 12. Inspired By Amazon Dynamo & Memcached •  Amazon’s Dynamo storage system •  Works across data centers •  Eventual consistency •  Commodity hardware •  Not too hard to build •  Memcached •  Actually works •  Really fast •  Really simple •  Decisions: •  Multiple reads/writes •  Consistent hashing for data distribution •  Key-Value model •  Data versioning
  • 13. Priorities 1.  Performance and scalability 2.  Actually works 3.  Community 4.  Data consistency 5.  Flexible & Extensible 6.  Everything else
  • 14. Why Is This Hard? •  Failures in a distributed system are much more complicated •  A can talk to B does not imply B can talk to A •  A can talk to B does not imply C can talk to B •  Getting a consistent view of the cluster is as hard as getting a consistent view of the data •  Nodes will fail and come back to life with stale data •  I/O has high request latency variance •  I/O on commodity disks is even worse •  Intermittent failures are common •  User must be isolated from these problems •  There are fundamental trade-offs between availability and consistency
  • 16. Core Concepts - I •  ACID •  Great for a single centralized server •  CAP Theorem •  Consistency (Strict), Availability, Partition Tolerance •  Impossible to achieve all three at the same time in a distributed platform •  Can choose 2 out of 3 •  Dynamo chooses High Availability and Partition Tolerance by relaxing Strict Consistency to Eventual Consistency •  Consistency Models •  Strict consistency •  2 Phase Commits •  PAXOS: distributed algorithm to ensure quorum for consistency •  Eventual consistency •  Different nodes can have different views of a value •  In a steady state the system will return the last written value •  BUT can have much stronger guarantees Proprietary & Confidential 19/11/09 16
  • 17. Core Concepts - II •  Consistent Hashing •  Key space is partitioned •  Many small partitions •  Partitions never change •  Partition ownership can change •  Replication •  Each partition is stored by ‘N’ nodes •  Node Failures •  Transient (short term) •  Long term •  Needs faster bootstrapping
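The partitioning scheme on this slide can be sketched as follows. This is a minimal illustration, not Voldemort's actual routing code: the MD5-based hash, the function names `partition_for` and `preference_list`, and the 8-partition, 3-node layout in `OWNERS` are all assumptions made for the example. It shows the two ideas from the slide: the key-to-partition mapping is fixed, while the partition-to-node ownership table is the part that can change.

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition. This mapping never changes (hypothetical hash choice)."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

def preference_list(partition: int, owners: list, n: int) -> list:
    """Walk the ring clockwise from `partition` and collect N distinct owning nodes."""
    nodes = []
    for step in range(len(owners)):
        node = owners[(partition + step) % len(owners)]
        if node not in nodes:
            nodes.append(node)
        if len(nodes) == n:
            break
    return nodes

# Hypothetical ownership table: index = partition id, value = node id.
# Rebalancing would edit this table; keys keep their partition ids.
OWNERS = [0, 1, 2, 0, 1, 2, 0, 1]

p = partition_for(b"member:42", len(OWNERS))
replicas = preference_list(p, OWNERS, n=2)
```

Because there are many more partitions than nodes, moving ownership of a few partitions shifts only a small slice of the data.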
  • 18. Core Concepts - III •  N - The replication factor •  R - The number of blocking reads •  W - The number of blocking writes •  If R+W > N •  then we have a quorum-like algorithm •  Guarantees that we will read latest writes OR fail •  R, W, N can be tuned for different use cases •  W = 1, Highly available writes •  R = 1, Read intensive workloads •  Knobs to tune performance, durability and availability
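The R+W > N condition can be checked by brute force. The sketch below is an illustration only, with hypothetical helper names (`is_quorum`, `always_overlap`): it enumerates every possible set of R nodes answering a read and every set of W nodes acknowledging a write, and confirms that the two sets always share at least one node exactly when R+W > N.

```python
from itertools import combinations

def is_quorum(n: int, r: int, w: int) -> bool:
    """The condition from the slide: read and write sets are forced to overlap."""
    return r + w > n

def always_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every R-node read set intersect every W-node write set?"""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )

# N=3, R=2, W=2: any read sees at least one replica holding the latest write.
strict = is_quorum(3, 2, 2)
# N=3, R=1, W=1: fast, but a read may hit a replica the write never reached.
relaxed = is_quorum(3, 1, 1)
```

The overlapping node is what lets a reader see the latest version, provided versions can be compared.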
  • 19. Core Concepts - IV •  Vector Clock [Lamport] provides a way to order events in a distributed system. •  A vector clock is a tuple {t1 , t2 , ..., tn } of counters. •  Each value update has a master node •  When data is written with master node i, it increments ti. •  All the replicas will receive the same version •  Helps resolve inconsistencies between writes on multiple replicas •  If you get network partitions •  You can have a case where two vector clocks are not comparable. •  In this case Voldemort returns both values to clients for conflict resolution
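A vector clock comparison along the lines described above might look like this. It is a sketch under assumptions: clocks are plain dicts from node id to counter, and the names `increment` and `compare` are made up for the example rather than taken from Voldemort's code.

```python
def increment(clock: dict, node: int) -> dict:
    """Return a copy of `clock` with the master node's counter bumped by one."""
    new_clock = dict(clock)
    new_clock[node] = new_clock.get(node, 0) + 1
    return new_clock

def compare(a: dict, b: dict) -> str:
    """Return 'before', 'after', 'equal', or 'concurrent' for clock a relative to b."""
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"  # not comparable: both values go back to the client
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"

v1 = increment({}, 0)   # first write, mastered by node 0
v2 = increment(v1, 0)   # later write through node 0
v3 = increment(v1, 1)   # write through node 1 during a partition
```

Here `v2` and `v3` both descend from `v1` but neither descends from the other, which is the not-comparable case from the slide.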
  • 22. Client API •  Data is organized into “stores”, i.e. tables •  Key-value only •  But values can be arbitrarily rich or complex •  Maps, lists, nested combinations … •  Four operations •  PUT (K, V) •  GET (K) •  MULTI-GET (Keys) •  DELETE (K, Version) •  No Range Scans
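To make the four operations concrete, here is a toy single-process store with the same shape. It is not the Voldemort client: the class `InMemoryStore`, the exception `ObsoleteVersionError`, and the use of a plain integer as the version (instead of a vector clock) are simplifying assumptions for illustration.

```python
class ObsoleteVersionError(Exception):
    """Raised when a delete names a version older than the stored one."""

class InMemoryStore:
    """Hypothetical stand-in for one store; values carry an integer version."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def put(self, key, value):
        version = self._data.get(key, (0, None))[0] + 1
        self._data[key] = (version, value)
        return version

    def get(self, key):
        return self._data.get(key)  # (version, value), or None if absent

    def multi_get(self, keys):
        return {k: self._data[k] for k in keys if k in self._data}

    def delete(self, key, version):
        current = self._data.get(key)
        if current is None:
            return False
        if current[0] > version:
            raise ObsoleteVersionError(key)
        del self._data[key]
        return True

store = InMemoryStore()
store.put("member:42", {"name": "Ada", "skills": ["java", "hadoop"]})
```

Passing a version to DELETE means a client cannot silently remove a value newer than the one it last saw.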
  • 23. Versioning & Conflict Resolution •  Eventual Consistency allows multiple versions of value •  Need a way to understand which value is latest •  Need a way to say values are not comparable •  Solutions •  Timestamp •  Vector clocks •  Provides global ordering. •  No locking or blocking necessary
  • 24. Serialization •  Really important •  Few Considerations •  Schema free? •  Backward/Forward compatible •  Real life data structures •  Bytes <=> objects <=> strings? •  Size (No XML) •  Many ways to do it -- we allow anything •  Compressed JSON, Protocol Buffers, Thrift, Voldemort custom serialization
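As one example of the "bytes <=> objects" step, compressed JSON can be sketched with the Python standard library. The helper names `to_bytes` and `from_bytes` are hypothetical, and zlib is an assumed choice of compression; the slide does not say which codec is used.

```python
import json
import zlib

def to_bytes(value) -> bytes:
    """Serialize a JSON-compatible object to compact JSON, then compress it."""
    text = json.dumps(value, separators=(",", ":"))
    return zlib.compress(text.encode("utf-8"))

def from_bytes(blob: bytes):
    """Reverse of to_bytes: decompress, then parse the JSON text."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

record = {"name": "Ada", "tags": ["search", "hadoop"], "stats": {"views": 3}}
blob = to_bytes(record)
```

JSON is schema free, so backward and forward compatibility has to be handled by the reading code, for example by tolerating missing or unknown fields.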
  • 25. Routing •  Routing layer hides a lot of complexity •  Hashing scheme •  Replication (N, R , W) •  Failures •  Read-Repair (online repair mechanism) •  Hinted Handoff (Long term recovery mechanism) •  Easy to add domain specific strategies •  E.g. only do synchronous operations on nodes in the local data center •  Client Side / Server Side / Hybrid
  • 27. Routing With Failures •  Failure Detection • Requirements • Need to be very very fast •  View of server state may be inconsistent •  A can talk to B but C cannot •  A can talk to C , B can talk to A but not to C •  Currently done by routing layer (request timeouts) •  Periodically retries failed nodes. •  All requests must have hard SLAs • Other possible solutions •  Central server •  Gossip protocol •  Need to look more into this.
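The timeout-plus-retry approach described on this slide can be sketched as a small bookkeeping class. This is an assumption-laden illustration: the class name `FailureDetector`, its method names, and the fixed retry interval are invented here, and time is passed in explicitly so the logic is easy to follow.

```python
class FailureDetector:
    """Hypothetical routing-layer detector: skip a failed node, retry it later."""

    def __init__(self, retry_interval: float):
        self.retry_interval = retry_interval
        self._failed_at = {}  # node -> time of the most recent failure

    def record_failure(self, node, now: float) -> None:
        """Called when a request to `node` times out or errors."""
        self._failed_at[node] = now

    def record_success(self, node) -> None:
        """Called when a request to `node` succeeds; clears the failed mark."""
        self._failed_at.pop(node, None)

    def is_available(self, node, now: float) -> bool:
        """A failed node becomes eligible again once the retry interval has passed."""
        failed = self._failed_at.get(node)
        return failed is None or now - failed >= self.retry_interval

detector = FailureDetector(retry_interval=5.0)
detector.record_failure("node-b", now=10.0)
skip_now = not detector.is_available("node-b", now=12.0)
```

Each client keeps its own view, which matches the slide's point that different clients can disagree about which servers are up.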
  • 28. Repair Mechanism •  Read Repair •  Online repair mechanism •  Routing client receives values from multiple nodes •  Notify a node if you see an old value •  Only works for keys which are read after failures •  Hinted Handoff •  If a write fails, write it to any random node •  Just mark the write as a special write •  Each node periodically tries to get rid of all special entries •  Bootstrapping mechanism (We don’t have it yet) •  If a node was down for a long time •  Hinted handoff can generate a ton of traffic •  Need a better way to bootstrap and clear hinted handoff tables
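Read repair, as described above, reduces to comparing the versions that came back and noting which nodes are behind. The sketch below assumes integer versions for brevity (the slides describe vector clocks) and a hypothetical function name `read_repair`.

```python
def read_repair(responses: dict):
    """Given node -> (version, value), return the newest pair and the stale nodes."""
    latest = max(responses.values(), key=lambda pair: pair[0])
    stale = sorted(node for node, pair in responses.items() if pair[0] < latest[0])
    return latest, stale

# Node "b" missed the second write while it was down.
responses = {"a": (2, "new"), "b": (1, "old"), "c": (2, "new")}
latest, stale_nodes = read_repair(responses)
# The routing client would now send `latest` to every node in stale_nodes.
```

As the slide notes, this only fixes keys that actually get read after the failure, which is why hinted handoff is needed as well.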
  • 29. Network Layer •  Network is the major bottleneck in many uses •  Client performance turns out to be harder than server (client must wait!) •  Lots of issues with socket buffer size/socket pool •  Server is also a Client •  Two implementations •  HTTP + servlet container •  Simple socket protocol + custom server •  HTTP server is great, but HTTP client is 5-10X slower •  Socket protocol is what we use in production •  Recently added a non-blocking version of the server
  • 30. Persistence •  Single machine key-value storage is a commodity •  Plugins are better than tying yourself to a single strategy •  Different use cases •  optimize reads •  optimize writes •  large vs small values •  SSDs may completely change this layer •  Better filesystems may completely change this layer •  Couple of different options •  BDB, MySQL and mmap’d file implementations •  Berkeley DB is the most popular •  In memory plugin for testing •  B-trees are still the best all-purpose structure •  No flush on write is a huge, huge win
  • 32. LinkedIn problems we wanted to solve •  Application Examples •  People You May Know •  Item-Item Recommendations •  Member and Company Derived Data •  User’s network statistics •  Who Viewed My Profile? •  Abuse detection •  User’s History Service •  Relevance data •  Crawler detection •  Many others have come up since •  Some data is batch computed and served as read only •  Some data is very high write load •  Latency is key
  • 33. Key-Value Design Example •  How to build a fast, scalable comment system? •  One approach •  (post_id, page) => [comment_id_1, comment_id_2, …] •  comment_id => comment_body •  GET comment_ids by post and page •  MULTIGET comment bodies •  Threaded, paginated comments left as an exercise
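The two-store comment design can be walked through with plain dicts standing in for the stores. The store names, sample ids, and the function `get_comments` are hypothetical; the point is only the access pattern of one GET followed by one MULTIGET.

```python
# Store 1: (post_id, page) => list of comment ids (sample data, assumed).
comment_ids_by_page = {
    ("post-1", 0): ["c1", "c2"],
    ("post-1", 1): ["c3"],
}
# Store 2: comment_id => comment body.
comment_bodies = {"c1": "First!", "c2": "Nice talk.", "c3": "Slides?"}

def get_comments(post_id: str, page: int) -> list:
    """Fetch one page of comments using a GET and then a MULTIGET."""
    ids = comment_ids_by_page.get((post_id, page), [])                           # GET
    bodies = {cid: comment_bodies[cid] for cid in ids if cid in comment_bodies}  # MULTIGET
    return [bodies[cid] for cid in ids if cid in bodies]

page_zero = get_comments("post-1", 0)
```

Each page view costs two round trips no matter how many comments a post has, and neither lookup needs a join or a range scan.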
  • 34. Hadoop and Voldemort sitting in a tree… •  Hadoop can generate a lot of data •  Bottleneck 1: Getting the data out of Hadoop •  Bottleneck 2: Transfer to DB •  Bottleneck 3: Index building •  We had a critical process that took a DBA a week to run! •  Index building is a batch operation
  • 36. Read-only storage engine •  Throughput vs. Latency •  Index building done in Hadoop •  Fully parallel transfer •  Very efficient on-disk structure •  Heavy reliance on OS pagecache •  Rollback!
  • 37. Voldemort At LinkedIn •  4 Clusters, 4 teams •  Wide variety of data sizes, clients, needs •  My team: •  12 machines •  Nice servers •  500M operations/day •  ~4 billion events in 10 stores (one per event type) •  Peak load > 10k operations / second •  Other teams: news article data, email related data, UI settings
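The daily and peak figures on this slide can be related with a little arithmetic. The sketch below only restates the slide's numbers; the even spread across machines is an assumption made for the per-machine estimate, not something the slide claims.

```python
OPS_PER_DAY = 500_000_000      # from the slide
PEAK_OPS_PER_SEC = 10_000      # the slide says "> 10k", so this is a lower bound
MACHINES = 12                  # from the slide
SECONDS_PER_DAY = 24 * 60 * 60

avg_ops_per_sec = OPS_PER_DAY / SECONDS_PER_DAY       # about 5,787 per second
peak_to_avg = PEAK_OPS_PER_SEC / avg_ops_per_sec      # about 1.73x
avg_per_machine = avg_ops_per_sec / MACHINES          # about 482, if spread evenly
```

So the stated peak is at least about 1.7 times the daily average, which is the margin capacity planning has to cover.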
  • 39. Some performance numbers •  Production stats •  Median: 0.1 ms •  99.9 percentile GET: 3 ms •  Single node max throughput (1 client node, 1 server node): •  19,384 reads/sec •  16,559 writes/sec •  These numbers are for mostly in-memory problems
  • 40. Glaring Weaknesses •  Not nearly enough documentation •  No online cluster expansion (without reduced guarantees) •  Need more clients in other languages (Java, Python, Ruby, and C++ currently) •  Better tools for cluster-wide control and monitoring
  • 41. State of the Project •  Active mailing list •  4-5 regular committers outside LinkedIn •  Lots of contributors •  Equal contribution from in and out of LinkedIn •  Project basics •  IRC •  Some documentation •  Lots more to do •  > 300 unit tests that run on every checkin (and pass) •  Pretty clean code •  Moved to GitHub (by popular demand) •  Production usage at a half dozen companies •  Not just a LinkedIn project anymore •  But LinkedIn is really committed to it (and we are hiring to work on it)
  • 42. Some new & upcoming things •  New •  Python, Ruby clients •  Non-blocking socket server •  Alpha round on online cluster expansion •  Read-only store and Hadoop integration •  Improved monitoring stats •  Distributed testing infrastructure •  Compression •  Future •  Publish/Subscribe model to track changes •  Improved failure detection
  • 43. Socket Server Scalability
  • 44. Testing and releases •  Testing “in the cloud” •  Distributed systems have complex failure scenarios •  A storage system, above all, must be stable •  Automated testing allows rapid iteration while maintaining confidence in systems’ correctness and stability •  EC2-based testing framework •  Tests are invoked programmatically •  Contributed by Kirk True •  Adaptable to other cloud hosting providers •  Regular releases for new features and bugs •  Trunk stays stable
  • 45. Shameless promotion •  Check it out: project-voldemort.com •  We love getting patches. •  We kind of love getting bug reports. •  LinkedIn is hiring, so you can work on this full time. •  Email me if interested •  [email protected]