Fahd Siddiqui
Bazaarvoice
Serving a billion shoppers for Black Friday on a distributed data store using
Cassandra
Fahd Siddiqui
Senior Staff Software Engineer, Data
Infrastructure
Bazaarvoice
linkedin.com/in/fahdsiddiqui
twitter.com/fahdsiddiqui
fahd.siddiqui@bazaarvoice.com
$ whoami
SaaS serving software that collects
and displays user generated content,
crunches analytics, and extracts
insights.
Thousands of clients
Hundreds of millions of pieces of content
Hundreds of millions of unique visitors per month
Tens of billions of pageviews per month
Austin-based company founded in 2005
Engineering offices: Austin, New York
[Figure: Global Monthly Unique Visitors, with data labels 1B, 1B, 500M, 1B, 400M, 200M, 250M, 450M, 1B, 600M]
1 Problem/Motivation: Data sharing across services
2 EmoDB – Distributed Data Store (System of Record)
3 EmoDB – Bulk use case
4 EmoDB – Databus
5 Summary
Problem/Motivation
Data Sharing Across Services
Microservices are great!
- Deployment expediency
- Developer velocity
- Decentralized governance, data
management
- Scale individual services as needed
- Fault isolation
[Diagram: Svc A, Svc B, Svc C, Svc D]
Thorny Issue of Data Sharing
- Service end points for data access
- Providing data access end points can hardly cover
canned queries, let alone adhoc ones
- Dependency on many services for data joins can
lead to unpredictable latencies
- Service to Service communication does not scale
- Total connections / SLA contracts = n(n-1)/2 =>
O(n^2)
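The quadratic growth above is easy to check with a few numbers. This is a minimal sketch; the function name and the sample service counts are illustrative assumptions, not figures from the talk.

```python
def pairwise_links(n: int) -> int:
    """Number of distinct service pairs among n services: n(n-1)/2."""
    return n * (n - 1) // 2


# Hypothetical fleet sizes, chosen only to show how fast the count grows.
link_counts = {n: pairwise_links(n) for n in (4, 10, 50, 100)}

for n, links in link_counts.items():
    print(f"{n:>3} services -> {links:>4} pairwise connections / SLA contracts")
```

Going from 10 to 100 services multiplies the number of pairwise contracts by more than a hundred, which is the scaling problem the slide points at.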
[Diagram: Svc A, Svc B, Svc C, Svc D]
Data Source Sharing?
- Sharing internal data sources with other
services is a bad idea
- Fractured data model
- Conflicting optimization
- Complicated ETLs
- System of Record? Ownership of data?
[Diagram: Svc A, Svc B, Svc C, Svc D, ETL]
SOA and bulk use case
- Heavy analytical jobs that need to be done
every day on *all* data
[Diagram: Svc A, Svc B, HDFS/S3]
Recap: Each service needs to:
- Get data from N other services as needed.
- Share its data with real time streaming
consumers
- Take care of the bulk use case: export all data
periodically for MapR jobs
- Come up with a process that establishes some
semblance of a System of Record, so future
development becomes easier.
Streams to the rescue – a good solution
- A common theme, popularized by LinkedIn, is to use streams that flow through your organization
(Das et al., “All aboard the Databus!,” SoCC '12)
- Publish data to a stream. Listen on a stream.
- Several distributed queueing systems exist, including Kafka, which presents a compelling
solution
[Diagram: Message Bus, ElasticSearch, Mongo, Real Time Svc, Service, Service]
Implementation details of message bus(es)
- Message bus for multiple consumers and producers presents some challenges
- Producers would still have to publish to certain pre-determined topics (or queues).
- Leads to tight coupling.
- Increased burden on producers to push to several topics as needed by various consumers.
- Services may have to listen to several topics.
- Authorization and access handling of events on the message bus.
Goal
Build out a distributed data store with a built-in streaming data
platform
EmoDB – System of Record relationship with
Databus
- System of Record (SoR) and databus are intricately linked.
- Updates to SoR automatically feed into the databus
- Producers only care about updating the SoR, and are completely unaware of the underlying messaging
system
- But first … let’s talk about SoR
EmoDB – System of Record
Emo’s Key/Key/Value store
EmoDB – System of Record (SoR)
- A simple Key/Value store that holds JSON documents.
- Globally distributed (Multi-data-center)
- Multi-master writes and reads
- Fault tolerant
- Horizontal Read/Write Scale
- Massive global writes (non-blocking, conflict-free)
- Flexible JSON content
- Incremental updates without reads (including nested JSON
attributes)
- Granular Optimistic Concurrency Control (OCC)
EmoDB – Based on Cassandra
Relies on Cassandra for
persistence and cross data center
replication
Flexible JSON Content in Cassandra
- Should be able to make an incremental change without reading the document
- Should be able to add/delete/modify any nested attribute to a JSON document
- Should be able to converge to a consistent state (eventual consistency)
- Example use-cases:
- Given {"x": 1, "y": {"a": 2}}, can we add a new attribute "b" in y such that y["b"] = 3?
- Can concurrent writers write to the same JSON document from different data centers?
Delta - A Conflict-free Replicated Data Type
- Emo Delta is a JSON spec that can represent an incremental update to any JSON document
- Emo Deltas are a crucial building block for the entire infrastructure, not only for the SoR but also for
the Databus.
- An Emo Row is a sequence of such immutable deltas. Writers simply append a delta (even for
deletes). Readers resolve them into a single JSON document upon reads.
Delta - A Conflict-free Replicated Data Type
- Delta Example:
Deltas (MIME Type: application/x.json-delta):
Δ1 { "rating": 4, "text": "I like it." }
Δ2 { .., "status": "APPROVED" }
Δ3 { .., "status": if "PENDING" then "REJECTED" end }
Resolved document (MIME Type: application/json):
{ "rating": 4, "text": "I like it.", "status": "APPROVED" }
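The resolution of the three example deltas can be sketched in a few lines of Python. This is a simplified model, not EmoDB's implementation: the classes `MapDelta` and `Conditional` and the functions `apply_delta` and `resolve` are hypothetical names, and the sketch assumes the deltas are already sorted by their time UUID.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

UNDEFINED = object()  # stands for "this attribute does not exist yet"


@dataclass
class MapDelta:
    """Models `{.., "key": <value-or-delta>}`: touch only the listed keys."""
    updates: Dict[str, Any]


@dataclass
class Conditional:
    """Models `if <expected> then <new> end`: apply only on a match."""
    expected: Any
    then: Any


def apply_delta(current: Any, delta: Any) -> Any:
    if isinstance(delta, MapDelta):
        base = dict(current) if isinstance(current, dict) else {}
        for key, sub in delta.updates.items():
            value = apply_delta(base.get(key, UNDEFINED), sub)
            if value is UNDEFINED:
                base.pop(key, None)  # a failed condition on a missing key adds nothing
            else:
                base[key] = value
        return base
    if isinstance(delta, Conditional):
        if current == delta.expected:
            return apply_delta(current, delta.then)
        return current  # condition not met: the delta is a no-op
    return delta  # a literal value replaces whatever was there


def resolve(deltas: List[Any]) -> Any:
    """Fold an ordered sequence of deltas into one document."""
    doc: Any = UNDEFINED
    for delta in deltas:
        doc = apply_delta(doc, delta)
    return doc


row = [
    {"rating": 4, "text": "I like it."},                         # Δ1, a literal
    MapDelta({"status": "APPROVED"}),                            # Δ2
    MapDelta({"status": Conditional("PENDING", "REJECTED")}),    # Δ3
]
resolved = resolve(row)
print(resolved)
```

Δ3 leaves the document untouched because the status is already "APPROVED" when it is evaluated, which matches the resolved document shown on the slide.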
Delta - A Conflict-free Replicated Data Type
- Cassandra table structure:
CREATE TABLE sor_deltas (
    rowkey blob,
    changeid timeuuid,
    delta text,
    PRIMARY KEY (rowkey, changeid)
) WITH COMPACT STORAGE
  AND gc_grace_seconds = 172800 ...
Note: changeid is a Time UUID, and is a clustering key
A Conflict-free Replicated Data Type (Delta)
Sample Emo Row in Cassandra
[Figure: CQL rows of deltas for a sample Emo Row]
EmoDB resolves the above CQL rows of deltas
into a JSON document:
[Figure: resolved JSON document]
Deltas and Conflict Resolution
[Diagram: US and EU data centers, each with an application (App_US, App_EU) and a Databus, sharing Cassandra; ∆2 and ∆3 are written concurrently in different data centers]
∆1 = { "rating": 5, "text": "love it" }
∆2 = { .., "status": "rejected" }
∆3 = { .., "status": if "PENDING" then "APPROVED" end }
T1: US has ∆1, EU has ∆1. Both resolve to {"rating": 5, "text": "love it"}
T2: US has ∆1 ∆3 and resolves to {"rating": 5, "text": "love it"}; EU has ∆1 ∆2 and resolves to {"rating": 5, "text": "love it", "status": "rejected"}
T3: US and EU both have ∆1 ∆2 ∆3 and resolve to {"rating": 5, "text": "love it", "status": "rejected"}
Deltas and Granular OCC
- Optimistic Concurrency Control (OCC) is usually done by asserting that the document
version/signature hasn’t changed since our last read
- Example:
- {"x": 0, "version": 1}
- {"x": 0, "y": 1, "version": 2}
- {"x": 1, "y": 1, "version": 2} (assertion: version:1) → This fails because the version has changed
- Emo does not have row-level locking, and only provides OCC
Deltas and Granular OCC
- “Conditional” Delta type allows writers to append a delta that is based on any arbitrary existing
attribute of the document for OCC.
- Example:
- P1 reads: {"x": 0, "version": 1}
- P2 writes: {"x": 0, "y": 1, "version": 2}
- P1 writes: {"x": {..,if 0 then 1}} → This delta passes even when another irrelevant attribute is updated
- Hence, “granular” OCC
- The document may have changed for other attributes, but as long as the attribute we care about remains the
same, the write goes through
- By the way, EmoDB also provides “~version” and “~signature” hash too if that’s your jam.
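The difference between version-based OCC and granular OCC can be illustrated with the P1/P2 example above. This is a sketch under stated assumptions: both helper functions are hypothetical, documents are plain dicts, and the version bump on every successful write is an assumption made for the illustration.

```python
from typing import Any, Dict, Tuple

Doc = Dict[str, Any]


def write_with_version_check(doc: Doc, expected_version: int, changes: Doc) -> Tuple[Doc, bool]:
    """Classic OCC: reject the write if the document version moved since the read."""
    if doc["version"] != expected_version:
        return doc, False
    return {**doc, **changes, "version": doc["version"] + 1}, True


def write_conditional(doc: Doc, key: str, expected: Any, new_value: Any) -> Tuple[Doc, bool]:
    """Granular OCC: only the attribute being changed has to be unchanged."""
    if doc.get(key) != expected:
        return doc, False
    return {**doc, key: new_value, "version": doc["version"] + 1}, True


# P1 reads the document, then P2 adds an unrelated attribute "y".
p1_view = {"x": 0, "version": 1}
current = {"x": 0, "y": 1, "version": 2}

# P1 now tries to set x to 1 using each strategy.
_, ok_classic = write_with_version_check(current, p1_view["version"], {"x": 1})
doc_granular, ok_granular = write_conditional(current, "x", 0, 1)

print("version check accepted:", ok_classic)
print("conditional write accepted:", ok_granular)
```

The version check rejects P1's write because of P2's unrelated change, while the conditional write succeeds because "x" still holds the value P1 read.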
Reads - Distributed Compaction of Deltas
- Over time, deltas accrue and reads take
longer to resolve
- EmoDB needs to compact these deltas by resolving and
replacing them with a single resolved delta
- In a distributed environment, this can lead to data loss if
an out-of-order delta is in flight.
[Diagram: DC-1 with an in-flight delta]
Distributed Compaction Solution
- If only we could tell exactly which Cassandra columns in a row (deltas) are fully consistent on all
nodes
- Let’s call this Full Consistency Timestamp (FCT), such that any column with time UUID before
FCT is fully consistent on all nodes.
- More formally,
- given C, the set of consistent events, and FCT, the full consistency timestamp,
- for all events e with timestamp t_e, if t_e < FCT, then e ∈ C
- Aka: ∀ e: t_e < FCT ⇒ e ∈ C
Distributed Compaction Solution
© DataStax, All Rights Reserved.
Distributed Compaction Solution
- Exploit Cassandra’s Hinted Handoff feature
- Let’s take a look at the Hints table (C* 2.2.4)
cqlsh:system> desc table hints;
CREATE TABLE system.hints (
    target_id uuid,
    hint_id timeuuid,
    message_version int,
    mutation blob,
    PRIMARY KEY (target_id, hint_id, message_version)
) WITH COMPACT STORAGE ..
Note: hint_id is a timeuuid, and also a
clustering key
Distributed Compaction Solution - FCT
- Determining the oldest hint on a node is trivial.
cqlsh:system> SELECT min(hint_id) as oldest_hint from hints;
- Oldest hint timestamp tells us that any update before that is fully consistent (edge cases do exist,
but resolvable)
- So, knowing the oldest hints on all nodes will give us our FCT. If no hints exist, take current time
as oldest hint time. Finally:
FCT = min(oldest_hint_1, oldest_hint_2, …, oldest_hint_n) - (2 * rpc_timeout)
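The FCT formula can be written out as a small function. This is a sketch, assuming each node reports its oldest hint as a Unix timestamp in seconds, or None when it holds no hints; the function and parameter names are hypothetical, and how the per-node values are collected is outside its scope.

```python
import time
from typing import Iterable, Optional


def full_consistency_timestamp(
    oldest_hints: Iterable[Optional[float]],
    rpc_timeout_s: float,
    now: Optional[float] = None,
) -> float:
    """FCT = min(oldest hint per node) - 2 * rpc_timeout.

    A node without hints contributes the current time, as described above.
    """
    now = time.time() if now is None else now
    effective = [now if hint is None else hint for hint in oldest_hints]
    return min(effective, default=now) - 2 * rpc_timeout_s


# Three nodes: only the second one still holds a hint, written at t=990.
fct = full_consistency_timestamp([None, 990.0, None], rpc_timeout_s=10.0, now=1000.0)
print(fct)
```

One node with an old pending hint pulls the FCT back for the whole cluster, which is why the slide also lists FCT as a cluster-health signal.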
Distributed Compaction Solution
- Equipped with FCT, we know exactly which deltas any node can compact independently and
concurrently without fearing data loss
- Basically, any delta with a time UUID less than FCT can be compacted away
- Other applications of FCT include monitoring cluster health, global consistency for databus, and
synchronizing events across data centers
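FCT-bounded compaction can be sketched as follows. The model is deliberately simplified and is not EmoDB's code: a delta is a (timestamp, update) pair, an update is a shallow dict merge, and the names `compact` and `read` are hypothetical.

```python
from typing import Dict, List, Tuple

Delta = Tuple[float, Dict[str, object]]  # (timestamp, shallow update)


def read(row: List[Delta]) -> Dict[str, object]:
    """Resolve a row by applying its deltas in timestamp order."""
    doc: Dict[str, object] = {}
    for _, update in sorted(row, key=lambda d: d[0]):
        doc.update(update)
    return doc


def compact(row: List[Delta], fct: float) -> List[Delta]:
    """Replace all deltas older than FCT with one resolved delta.

    Deltas at or after FCT are left alone, because an older write
    could still be in flight behind them.
    """
    ordered = sorted(row, key=lambda d: d[0])
    settled = [d for d in ordered if d[0] < fct]
    pending = [d for d in ordered if d[0] >= fct]
    if len(settled) < 2:
        return ordered  # nothing to gain from compacting
    merged = read(settled)
    return [(settled[-1][0], merged)] + pending


row = [
    (1.0, {"rating": 5}),
    (2.0, {"text": "love it"}),
    (5.0, {"status": "rejected"}),
]
compacted = compact(row, fct=3.0)

# A late delta stamped 4.0 arrives after compaction and still resolves correctly.
late = (4.0, {"status": "PENDING"})
after_late_arrival = read(compacted + [late])
print(compacted)
print(after_late_arrival)
```

The late delta is safe because its timestamp is not below the FCT, so the compactor never folded anything newer than it into the resolved delta.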
EmoDB is:
• Globally distributed JSON key/value store
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui, Bazaarvoice) | Cassandra Summit 2016
EmoDB – Bulk use case
- Export all data daily to HDFS/S3 for bulk
jobs
- No freezing of writes allowed,
while still providing efficient reads
EmoStash – Global Snapshots
• Another application of Emo Deltas: global snapshots without freezing writes!
• The snapshot process simply ignores any deltas later than the “snapshot” time to capture and
upload the exact view of the globally distributed data store at that moment.
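A snapshot that ignores deltas after a cutoff might look like the sketch below. It assumes the same simplified model as a plain dict merge per delta, with numeric timestamps standing in for time UUIDs; the `snapshot` function and the sample keys are hypothetical.

```python
from typing import Dict, List, Tuple

Delta = Tuple[float, Dict[str, object]]  # (timestamp, shallow update)


def snapshot(rows: Dict[str, List[Delta]], cutoff: float) -> Dict[str, Dict[str, object]]:
    """Resolve every row using only deltas written at or before the cutoff."""
    view: Dict[str, Dict[str, object]] = {}
    for key, deltas in rows.items():
        visible = [d for d in sorted(deltas, key=lambda d: d[0]) if d[0] <= cutoff]
        if not visible:
            continue  # the row did not exist yet at snapshot time
        doc: Dict[str, object] = {}
        for _, update in visible:
            doc.update(update)
        view[key] = doc
    return view


rows = {
    "review-1": [(1.0, {"rating": 5}), (7.0, {"status": "APPROVED"})],
    "review-2": [(8.0, {"rating": 3})],
}
stash = snapshot(rows, cutoff=5.0)
print(stash)
```

Writers keep appending deltas while the snapshot runs; anything stamped after the cutoff is simply not visible to it.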
[Diagram: snapshot cutoff]
EmoDB is:
• Globally distributed JSON key/value store
• which offers globally consistent snapshots
EmoDB – Real time streaming use case
EmoDB Databus
- Producers should only update System of Record, not to individual queues or topics
- Infinite playback to bootstrap new data sources
- Subscribers should only get events they care about
- Complete flexibility in creating subscriptions
- Multi-data-center guarantees
- Global databus
EmoDB Databus
- Databus is a special kind of queue. No one explicitly puts events on it.
- Databus is tied to System of Record and *all* updates automatically generate an event
- one “system” queue which fans out to the subscriptions
- Fanout process also “authorizes” events based on subscription owners
- Subscribers can simply create a “subscription” with a filter on what tables and/or tags they want
to follow.
- Emo then starts “DVRing” updates for that subscription
- Subscription + Scan = Bootstrap a new datastore
- Producers are completely decoupled from subscribers.
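The fanout from the single system queue to filtered subscriptions can be modelled in a few lines. This is an in-memory sketch with hypothetical names (`Event`, `Databus`, `on_sor_update`, `poll`); it leaves out authorization, persistence, and replication, which the real Databus handles.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Event:
    table: str
    key: str
    change_id: str
    tags: FrozenSet[str] = frozenset()


class Databus:
    def __init__(self) -> None:
        self._filters: Dict[str, Callable[[Event], bool]] = {}
        self._queues: Dict[str, List[Event]] = defaultdict(list)

    def subscribe(self, name: str, matches: Callable[[Event], bool]) -> None:
        """Register a subscription; from now on matching updates are recorded for it."""
        self._filters[name] = matches

    def on_sor_update(self, event: Event) -> None:
        """Called for every SoR update; producers never address a subscription."""
        for name, matches in self._filters.items():
            if matches(event):
                self._queues[name].append(event)

    def poll(self, name: str) -> List[Event]:
        events, self._queues[name] = self._queues[name], []
        return events


bus = Databus()
bus.subscribe("everything", lambda e: True)
bus.subscribe("reviews", lambda e: e.table == "review:client")
bus.subscribe("moderation", lambda e: "moderate" in e.tags)

bus.on_sor_update(Event("review:client", "r1", "c1"))
bus.on_sor_update(Event("catalog:client", "p1", "c2", frozenset({"moderate"})))

everything = bus.poll("everything")
reviews = bus.poll("reviews")
moderation = bus.poll("moderation")
print(len(everything), len(reviews), len(moderation))
```

The producer side only calls `on_sor_update`, so adding or removing a subscription never requires a change in any producer.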
Databus - Subscriptions
// To create a subscription to *all* tables
$ curl -s -XPUT -H "Content-Type: application/x.json-condition" \
    http://localhost:8080/bus/1/demo-app --data-binary 'alwaysTrue()'
// To create a subscription to only review and catalog tables
--data-binary 'intrinsic("~table":"review:client","catalog:client")'
// To create a subscription to all tables of a "review" type
--data-binary '{..,"type":"review"}'
// To create a subscription to follow a "moderate" tag regardless of the table
--data-binary '{.., "~tags":contains("moderate")}'
EmoDB Databus
- Producers should only update System of Record, not to individual queues or topics ✓
- Infinite playback to bootstrap new data sources ✓
- Subscribers should only get events they care about ✓
- Complete flexibility in creating subscriptions
- Multi-data-center guarantees
- Global databus
Databus – Multi-Data Center Guarantees
[Diagram: US and EU data centers, each with a Databus, on top of Emo (Cassandra)]
- Consumer (US): subscribed to tag “HANDLE_THIS”
- Producer (EU): updates SoR with tag “HANDLE_THIS”
- Databus only notifies the US consumer when the update is replicated to US
Databus – Multi-Data Center Guarantees
• Emo does not put the entire document on the databus. Only the key and “changeId” of the
update delta go on the bus.
• EmoDB checks the following before handing out the event:
• Either: the document as it appears in the local DC contains the changeId of the update
• Or: the changeId is older than FCT (Full Consistency Timestamp)
• Applications can achieve global consistency without requiring a GLOBAL level write!
• EmoDB Databus replicates globally
• And notifies you only when the changes are available with “local_quorum” consistency
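The delivery check described above reduces to a two-part predicate. This is a minimal sketch: `ChangeId` is a hypothetical stand-in for a time UUID (a timestamp plus a writer tag), and `ready_to_deliver` is an illustrative name, not an EmoDB API.

```python
from typing import NamedTuple, Set


class ChangeId(NamedTuple):
    timestamp: float  # the time component a time UUID would carry
    writer: str       # disambiguates writes made at the same instant


def ready_to_deliver(change_id: ChangeId, local_change_ids: Set[ChangeId], fct: float) -> bool:
    """Hand out the event only if the local DC can already serve the update.

    Either the locally resolved document contains the change, or the change
    is older than the Full Consistency Timestamp and is therefore on all nodes.
    """
    return change_id in local_change_ids or change_id.timestamp < fct


eu_write = ChangeId(105.0, "eu-producer")
fct = 100.0

before_replication = ready_to_deliver(eu_write, local_change_ids=set(), fct=fct)
after_replication = ready_to_deliver(eu_write, local_change_ids={eu_write}, fct=fct)
old_change = ready_to_deliver(ChangeId(90.0, "eu-producer"), local_change_ids=set(), fct=fct)
print(before_replication, after_replication, old_change)
```

An event that fails the check is not lost; in this model it would simply be retried until the local read contains the change or the FCT moves past it.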
EmoDB is:
• Globally distributed JSON key/value store
• which offers globally consistent snapshots
• with a built-in streaming data platform.
But wait … there’s more!
• Queue Service
• Features “Dedup” queue service
• Blob Store – To store big files (photos, pdf, html files, etc.)
• API Key Management system – out of the box
• All based on Cassandra – Use your existing operational Cassandra skills and tools to maintain
an Emo cluster
EmoDB is open source!
• https://bazaarvoice.github.io/emodb
• https://github.com/bazaarvoice/emodb
• EmoDB provides “out of the box” data infrastructure for a globally distributed service oriented
architecture
• It abstracts out many implementation details, and makes it easy for services to share data
seamlessly, all the while making sure a consistent and comprehensive System of Record is
available.
Thank You!
@Bazaarvoice
@BazaarvoiceDev
http://www.bazaarvoice.com/
http://blog.developer.bazaarvoice.com/
Learn
more
Ad

More Related Content

What's hot (20)

Cassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentialsCassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentials
Julien Anguenot
 
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
DataStax
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
Joe Stein
 
Cassandra background-and-architecture
Cassandra background-and-architectureCassandra background-and-architecture
Cassandra background-and-architecture
Markus Klems
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
Gokhan Atil
 
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
DataStax
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
DataStax
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
Christian Johannsen
 
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
DataStax
 
Cassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + DynamoCassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + Dynamo
jbellis
 
Apache Cassandra at Macys
Apache Cassandra at MacysApache Cassandra at Macys
Apache Cassandra at Macys
DataStax Academy
 
Load testing Cassandra applications
Load testing Cassandra applicationsLoad testing Cassandra applications
Load testing Cassandra applications
Ben Slater
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
DataStax
 
Voldemort on Solid State Drives
Voldemort on Solid State DrivesVoldemort on Solid State Drives
Voldemort on Solid State Drives
Vinoth Chandar
 
Develop Scalable Applications with DataStax Drivers (Alex Popescu, Bulat Shak...
Develop Scalable Applications with DataStax Drivers (Alex Popescu, Bulat Shak...Develop Scalable Applications with DataStax Drivers (Alex Popescu, Bulat Shak...
Develop Scalable Applications with DataStax Drivers (Alex Popescu, Bulat Shak...
DataStax
 
Cassandra for Sysadmins
Cassandra for SysadminsCassandra for Sysadmins
Cassandra for Sysadmins
Nathan Milford
 
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
DataStax
 
Introduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and ConsistencyIntroduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and Consistency
Benjamin Black
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
DataStax Academy
 
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
ScyllaDB
 
Cassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentialsCassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentials
Julien Anguenot
 
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
DataStax
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
Joe Stein
 
Cassandra background-and-architecture
Cassandra background-and-architectureCassandra background-and-architecture
Cassandra background-and-architecture
Markus Klems
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
Gokhan Atil
 
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
DataStax
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
DataStax
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
Christian Johannsen
 
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
DataStax
 
Cassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + DynamoCassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + Dynamo
jbellis
 
Load testing Cassandra applications
Load testing Cassandra applicationsLoad testing Cassandra applications
Load testing Cassandra applications
Ben Slater
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
DataStax
 
Voldemort on Solid State Drives
Voldemort on Solid State DrivesVoldemort on Solid State Drives
Voldemort on Solid State Drives
Vinoth Chandar
 
Develop Scalable Applications with DataStax Drivers (Alex Popescu, Bulat Shak...
Develop Scalable Applications with DataStax Drivers (Alex Popescu, Bulat Shak...Develop Scalable Applications with DataStax Drivers (Alex Popescu, Bulat Shak...
Develop Scalable Applications with DataStax Drivers (Alex Popescu, Bulat Shak...
DataStax
 
Cassandra for Sysadmins
Cassandra for SysadminsCassandra for Sysadmins
Cassandra for Sysadmins
Nathan Milford
 
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
DataStax
 
Introduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and ConsistencyIntroduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and Consistency
Benjamin Black
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
DataStax Academy
 
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
ScyllaDB
 

Similar to One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui, Bazaarvoice) | Cassandra Summit 2016 (20)

Real time data-pipeline from inception to production
Real time data-pipeline from inception to productionReal time data-pipeline from inception to production
Real time data-pipeline from inception to production
Shreya Mukhopadhyay
 
About "Apache Cassandra"
About "Apache Cassandra"About "Apache Cassandra"
About "Apache Cassandra"
Jihyun Ahn
 
Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applications
Ding Li
 
quickguide-einnovator-5-springxd
quickguide-einnovator-5-springxdquickguide-einnovator-5-springxd
quickguide-einnovator-5-springxd
jorgesimao71
 
Distributed Systems: How to connect your real-time applications
Distributed Systems: How to connect your real-time applicationsDistributed Systems: How to connect your real-time applications
Distributed Systems: How to connect your real-time applications
Jaime Martin Losa
 
DDS Advanced Tutorial - OMG June 2013 Berlin Meeting
DDS Advanced Tutorial - OMG June 2013 Berlin MeetingDDS Advanced Tutorial - OMG June 2013 Berlin Meeting
DDS Advanced Tutorial - OMG June 2013 Berlin Meeting
Jaime Martin Losa
 
Azure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep DiveAzure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep Dive
Andre Essing
 
Splunk Conf 2014 - Getting the message
Splunk Conf 2014 - Getting the messageSplunk Conf 2014 - Getting the message
Splunk Conf 2014 - Getting the message
Damien Dallimore
 
NoSql Database
NoSql DatabaseNoSql Database
NoSql Database
Suresh Parmar
 
Confluent & MongoDB APAC Lunch & Learn
Confluent & MongoDB APAC Lunch & LearnConfluent & MongoDB APAC Lunch & Learn
Confluent & MongoDB APAC Lunch & Learn
confluent
 
Cassandra Consistency: Tradeoffs and Limitations
Cassandra Consistency: Tradeoffs and LimitationsCassandra Consistency: Tradeoffs and Limitations
Cassandra Consistency: Tradeoffs and Limitations
Panagiotis Papadopoulos
 
Escalando Aplicaciones Web
Escalando Aplicaciones WebEscalando Aplicaciones Web
Escalando Aplicaciones Web
Santiago Coffey
 
Atom The Redis Streams-Powered Microservices SDK: Dan Pipemazo
Atom The Redis Streams-Powered Microservices SDK: Dan PipemazoAtom The Redis Streams-Powered Microservices SDK: Dan Pipemazo
Atom The Redis Streams-Powered Microservices SDK: Dan Pipemazo
Redis Labs
 
Distribute key value_store
Distribute key value_storeDistribute key value_store
Distribute key value_store
drewz lin
 
Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf
 
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­ticaA noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
Data Con LA
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
Alex Ivy
 
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
DataStax Academy
 
Deploy data analysis pipeline with mesos and docker
Deploy data analysis pipeline with mesos and dockerDeploy data analysis pipeline with mesos and docker
Deploy data analysis pipeline with mesos and docker
Vu Nguyen Duy
 
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Andrey Vykhodtsev
 
Real time data-pipeline from inception to production
Real time data-pipeline from inception to productionReal time data-pipeline from inception to production
Real time data-pipeline from inception to production
Shreya Mukhopadhyay
 
About "Apache Cassandra"
About "Apache Cassandra"About "Apache Cassandra"
About "Apache Cassandra"
Jihyun Ahn
 
Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applications
Ding Li
 
quickguide-einnovator-5-springxd
quickguide-einnovator-5-springxdquickguide-einnovator-5-springxd
quickguide-einnovator-5-springxd
jorgesimao71
 
Distributed Systems: How to connect your real-time applications
Distributed Systems: How to connect your real-time applicationsDistributed Systems: How to connect your real-time applications
Distributed Systems: How to connect your real-time applications
Jaime Martin Losa
 
DDS Advanced Tutorial - OMG June 2013 Berlin Meeting
DDS Advanced Tutorial - OMG June 2013 Berlin MeetingDDS Advanced Tutorial - OMG June 2013 Berlin Meeting
DDS Advanced Tutorial - OMG June 2013 Berlin Meeting
Jaime Martin Losa
 
Azure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep DiveAzure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep Dive
Andre Essing
 
Splunk Conf 2014 - Getting the message
Splunk Conf 2014 - Getting the messageSplunk Conf 2014 - Getting the message
Splunk Conf 2014 - Getting the message
Damien Dallimore
 
Confluent & MongoDB APAC Lunch & Learn
Confluent & MongoDB APAC Lunch & LearnConfluent & MongoDB APAC Lunch & Learn
Confluent & MongoDB APAC Lunch & Learn
confluent
 
Cassandra Consistency: Tradeoffs and Limitations



One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui, Bazaarvoice) | Cassandra Summit 2016

  • 1. Fahd Siddiqui Bazaarvoice Serving a billion shoppers for Black Friday on a distributed data store using Cassandra
  • 2. $ whoami Fahd Siddiqui, Senior Staff Software Engineer, Data Infrastructure, Bazaarvoice. linkedin.com/in/fahdsiddiqui twitter.com/fahdsiddiqui fahd.siddiqui@bazaarvoice.com
  • 3. SaaS serving software that collects and displays user generated content, crunches analytics, and extracts insights. Thousands of clients. Hundreds of millions of pieces of content. Hundreds of millions of unique visitors per month. Tens of billions of pageviews per month. Austin-based company founded in 2005. Engineering offices: Austin, New York.
  • 4. Global Monthly Unique Visitors (map figure; data labels as extracted: 1B, 1B, 500M, 1B, 400M, 200M, 250M, 450M, 1B, 600M)
  • 5. 1. Problem/Motivation: Data sharing across services; 2. EmoDB – Distributed Data Store (System of Record); 3. EmoDB – Bulk use case; 4. EmoDB – Databus; 5. Summary
  • 7. Microservices are great! - Deployment expediency - Developer velocity - Decentralized governance, data management - Scale individual services as needed - Fault isolation (Diagram: Svc A, Svc B, Svc C, Svc D)
  • 8. Thorny Issue of Data Sharing - Service endpoints for data access - Data access endpoints can hardly cover canned queries, let alone ad hoc ones - Depending on many services for data joins can lead to unpredictable latencies - Service-to-service communication does not scale - Total connections / SLA contracts = n(n-1)/2 => O(n^2) (Diagram: Svc A, Svc B, Svc C, Svc D)
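The n(n-1)/2 figure on this slide can be checked with a few lines of Python. This is only an illustrative sketch; the function name `pairwise_contracts` is made up for the example.

```python
def pairwise_contracts(n: int) -> int:
    """Number of distinct service-to-service links among n services."""
    return n * (n - 1) // 2


# Growth is quadratic: 12x more services means roughly 200x more contracts.
for n in (4, 10, 50):
    print(n, pairwise_contracts(n))
```

With the four services in the diagram that is already 6 links; at 50 services it is 1225.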
  • 9. Data Source Sharing? - Sharing internal data sources with other services is a bad idea - Fractured data model - Conflicting optimization - Complicated ETLs - System of Record? Ownership of data? (Diagram: Svc A, Svc B, Svc C, Svc D, ETL)
  • 10. SOA and bulk use case - Heavy analytical jobs that need to be done every day on *all* data (Diagram: Svc A, Svc B, HDFS/S3)
  • 11. Recap: Each service needs to: - Get data from N other services as needed. - Share its data with real-time streaming consumers. - Take care of the bulk use case: export all data periodically for MapR jobs. - Come up with a process that establishes some semblance of a System of Record, so future development becomes easier.
  • 12. Streams to the rescue – a good solution - A common theme, popularized by LinkedIn, is to use streams that flow through your organization (Das et al., “All aboard the Databus!,” SoCC '12) - Publish data to a stream. Listen on a stream. - Several distributed queueing systems are out there, including Kafka, which presents a compelling solution (Diagram: Message Bus, ElasticSearch, Mongo, Real Time Svc, Service, Service)
  • 13. Implementation details of message bus(es) - A message bus for multiple consumers and producers presents some challenges - Producers would still have to publish to certain pre-determined topics (or queues). - This leads to tight coupling. - Increased burden on producers to push to several topics as needed by various consumers. - Services may have to listen to several topics. - Authorization and access handling of events on the message bus.
  • 14. Goal: Build out a distributed data store with a built-in streaming data platform
  • 15. EmoDB – System of Record relationship with Databus - System of Record (SoR) and Databus are intricately linked. - Updates to SoR automatically feed into the Databus - Producers only care about updating SoR, and are completely unaware of the underlying messaging system - But first … let’s talk about SoR
  • 16. EmoDB – System of Record Emo’s Key/Key/Value store
  • 17. EmoDB – System of Record (SoR) - A simple Key/Value store that holds JSON documents. - Globally distributed (Multi-data-center) - Multi-master writes and reads - Fault tolerant - Horizontal Read/Write Scale - Massive global writes (non-blocking, conflict-free) - Flexible JSON content - Incremental updates without reads (including nested JSON attributes) - Granular Optimistic Concurrency Control (OCC)
  • 18. EmoDB – Based on Cassandra - Relies on Cassandra for persistence and cross data center replication
  • 19. Flexible JSON Content in Cassandra - Should be able to make an incremental change without reading the document - Should be able to add/delete/modify any nested attribute to a JSON document - Should be able to converge to a consistent state (eventual consistency) - Example use-cases: - Given {"x": 1, "y": {"a": 2}}, can we add a new attribute "b" in y such that y["b"] = 3? - Can concurrent writers write to the same JSON document from different data centers?
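The first use case on this slide can be sketched as a recursive merge. This is a minimal illustration, not EmoDB's implementation: the function name `apply_map_delta` and the plain-dict encoding of a delta are assumptions made for the example. The point is that the writer only has to produce the small delta; merging happens later, when the document is resolved.

```python
import copy


def apply_map_delta(doc, delta):
    """Return a new document with `delta` merged in.

    Nested dicts are merged key by key; any other value overwrites.
    (Hypothetical encoding: a delta is just a partial JSON object.)
    """
    result = copy.deepcopy(doc)
    for key, value in delta.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_map_delta(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


doc = {"x": 1, "y": {"a": 2}}
delta = {"y": {"b": 3}}  # written without first reading `doc`
print(apply_map_delta(doc, delta))  # {'x': 1, 'y': {'a': 2, 'b': 3}}
```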
  • 20. Delta - A Conflict-free Replicated Data Type - An Emo Delta is a JSON spec that can represent an incremental update to any JSON document - Emo Deltas are a crucial building block for the entire infrastructure, not only for the SoR but also for the Databus. - An Emo Row is a sequence of such immutable deltas. Writers simply append a delta (even for deletes). Readers resolve them into a single JSON document upon reads.
  • 21. Delta - A Conflict-free Replicated Data Type - Delta Example: Δ1 { "rating": 4, "text": "I like it." } Δ2 { .., "status": "APPROVED" } Δ3 { .., "status": if "PENDING" then "REJECTED" end } (MIME Type: application/x.json-delta) resolve to { "rating": 4, "text": "I like it.", "status": "APPROVED" } (MIME Type: application/json)
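The example on this slide can be reproduced with a small resolver. This is a sketch under stated assumptions: the `IfEquals` class and the dict encoding are hypothetical stand-ins for the delta syntax shown on the slide, and only flat (non-nested) attributes are handled.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class IfEquals:
    """Hypothetical conditional: set the attribute to `then`
    only if its current value equals `expected`."""
    expected: Any
    then: Any


def resolve(deltas):
    """Fold an ordered sequence of deltas into one JSON-like document."""
    doc = {}
    for delta in deltas:
        for key, value in delta.items():
            if isinstance(value, IfEquals):
                if doc.get(key) == value.expected:
                    doc[key] = value.then
            else:
                doc[key] = value
    return doc


deltas = [
    {"rating": 4, "text": "I like it."},          # Δ1
    {"status": "APPROVED"},                       # Δ2
    {"status": IfEquals("PENDING", "REJECTED")},  # Δ3: no effect, status is not PENDING
]
print(resolve(deltas))
# {'rating': 4, 'text': 'I like it.', 'status': 'APPROVED'}
```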
  • 22. Delta - A Conflict-free Replicated Data Type - Cassandra table structure: CREATE TABLE sor_deltas ( rowkey blob, changeid timeuuid, delta text, PRIMARY KEY (rowkey, changeid) ) WITH COMPACT STORAGE AND gc_grace_seconds = 172800 ... Note: changeid is a Time UUID, and is a clustering key
  • 23. A Conflict-free Replicated Data Type (Delta) - Sample Emo Row in Cassandra (figure: CQL rows of deltas) - EmoDB resolves the above CQL rows of deltas into a JSON document (figure: resolved JSON document)
  • 24. EmoDB – System of Record (SoR) - A simple Key/Value store that holds JSON documents. - Globally distributed (Multi-data-center) - Multi-master writes and reads - Fault tolerant - Horizontal Read/Write Scale - Massive global writes (non-blocking, conflict-free) - Flexible JSON content - Incremental updates without reads (including nested JSON attributes) - Granular Optimistic Concurrency Control (OCC) (Slide label: Emo Delta)
  • 25. Deltas and Conflict Resolution (Diagram: Cassandra replicating between the US and EU data centers, each side with a Databus and an application, App_US and App_EU) ∆1 = { "rating": 5, "text": "love it" } ∆2 = { .., "status": "rejected" } ∆3 = { .., "status": if "PENDING" then "APPROVED" end } Deltas held by the two data centers over time: T1: ∆1 and ∆1; T2: ∆1 ∆3 and ∆1 ∆2; T3: ∆1 ∆2 ∆3 and ∆1 ∆2 ∆3. Resolved documents shown: {"rating": 5, "text": "love it"} where ∆2 has not yet arrived, and {"rating": 5, "text": "love it", "status": "rejected"} where it has.
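The convergence shown on this slide can be sketched as follows. Assumptions, labeled loudly: integer change ids stand in for time UUIDs, the tuple `("if", expected, new)` is a hypothetical encoding of the conditional delta, and `dc_a` / `dc_b` are generic names because the transcript does not make clear which data center saw which delta first.

```python
def resolve(row):
    """Fold a row's deltas in change-id order into one document."""
    doc = {}
    for change_id in sorted(row):
        for key, value in row[change_id].items():
            if isinstance(value, tuple):
                # Hypothetical conditional encoding: ("if", expected, new)
                _, expected, new = value
                if doc.get(key) == expected:
                    doc[key] = new
            else:
                doc[key] = value
    return doc


d1 = {"rating": 5, "text": "love it"}
d2 = {"status": "rejected"}
d3 = {"status": ("if", "PENDING", "APPROVED")}

# At T2 each data center has seen a different subset of the deltas.
dc_a = {1: d1, 3: d3}
dc_b = {1: d1, 2: d2}
print(resolve(dc_a))  # status absent: the condition in d3 does not hold
print(resolve(dc_b))  # status is "rejected"

# At T3 replication has delivered the missing delta to each side.
dc_a[2] = d2
dc_b[3] = d3
print(resolve(dc_a) == resolve(dc_b))  # True: both resolve in change-id order
```

Because readers always fold deltas in change-id order, arrival order does not matter once both sides hold the same set of deltas.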
  • 26. EmoDB – System of Record (SoR) - A simple Key/Value store that holds JSON documents. - Globally distributed (Multi-data-center) - Multi-master writes and reads - Fault tolerant - Horizontal Read/Write Scale - Massive global writes (non-blocking, conflict-free) - Flexible JSON content - Incremental updates without reads (including nested JSON attributes) - Granular Optimistic Concurrency Control (OCC) (Slide label: Emo Delta)
  • 27. Deltas and Granular OCC - Optimistic Concurrency Control (OCC) is usually done by asserting the document version/signature hasn’t changed since our last read - Example: - {"x": 0, "version": 1} - {"x": 0, "y": 1, "version": 2} - {"x": 1, "y": 1, "version": 2} (assertion: version:1) => This fails as the version has changed - Emo does not have row-level locking, and only provides OCC
  • 28. Deltas and Granular OCC - The “Conditional” Delta type allows writers to append a delta that is conditioned on any arbitrary existing attribute of the document for OCC. - Example: - P1 reads: {"x": 0, "version": 1} - P2 writes: {"x": 0, "y": 1, "version": 2} - P1 writes: {"x": {..,if 0 then 1}} => This delta passes even when another, irrelevant attribute is updated - Hence, “granular” OCC - The document may have changed in other attributes; the write succeeds as long as the attribute we care about remains the same - By the way, EmoDB also provides “~version” and a “~signature” hash, if that’s your jam.
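The difference between document-level and granular OCC in the P1/P2 example can be sketched with a toy in-memory document. Both helper functions are hypothetical and only mimic the two kinds of check; they are not EmoDB's API.

```python
def write_if_version(doc, expected_version, changes):
    """Document-level OCC: reject if anything changed since the read."""
    if doc["version"] != expected_version:
        return False
    doc.update(changes)
    doc["version"] += 1
    return True


def write_if_attribute(doc, key, expected, new):
    """Granular OCC: only the attribute being updated must be unchanged."""
    if doc.get(key) != expected:
        return False
    doc[key] = new
    doc["version"] += 1
    return True


doc = {"x": 0, "version": 1}
seen_version = doc["version"]  # P1 reads

doc["y"] = 1                   # P2 writes an unrelated attribute
doc["version"] += 1

print(write_if_version(doc, seen_version, {"x": 1}))  # False
print(write_if_attribute(doc, "x", 0, 1))             # True
print(doc)  # {'x': 1, 'version': 3, 'y': 1}
```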
  • 29. EmoDB – System of Record (SoR) - A simple Key/Value store that holds JSON documents. - Globally distributed (Multi-data-center) - Multi-master writes and reads - Fault tolerant - Horizontal Read/Write Scale - Massive global writes (non-blocking, conflict-free) - Flexible JSON content - Incremental updates without reads (including nested JSON attributes) - Granular Optimistic Concurrency Control (OCC) (Slide label: Emo Delta)
  • 30. Reads - Distributed Compaction of Deltas - Over time, deltas will accrue and reads will take longer to resolve - EmoDB needs to compact these deltas by resolving and replacing them with a single resolved delta - In a distributed environment, this can lead to data loss if an out-of-order delta is in flight. (Diagram: DC-1, in-flight delta)
  • 31. Distributed Compaction Solution - If only we could tell exactly which Cassandra columns in a row (deltas) are fully consistent on all nodes - Let’s call this the Full Consistency Timestamp (FCT), such that any column with a time UUID before FCT is fully consistent on all nodes. - More formally: given C, the set of consistent events, and FCT, the full consistency timestamp, for all events e with timestamp t_e, if t_e < FCT, then e ∈ C - Aka: ∀ e, t_e | t_e < FCT => e ∈ C
  • 32. Distributed Compaction Solution
  • 33. Distributed Compaction Solution - Exploit Cassandra’s Hinted Handoff feature - Let’s take a look at the Hints table (C* 2.2.4) cqlsh:system> desc table hints; CREATE TABLE system.hints ( target_id uuid, hint_id timeuuid, message_version int, mutation blob, PRIMARY KEY (target_id, hint_id, message_version) ) WITH COMPACT STORAGE .. - Note: hint_id is a timeuuid, and also a clustering key
  • 34. Distributed Compaction Solution - FCT - Determining the oldest hint on a node is trivial. cqlsh:system> SELECT min(hint_id) as oldest_hint from hints; - The oldest hint timestamp tells us that any update before it is fully consistent (edge cases do exist, but they are resolvable) - So, knowing the oldest hint on every node gives us our FCT. If a node has no hints, take the current time as its oldest hint time. Finally: FCT = min(old_hint_1, old_hint_2, …, old_hint_n) - (2 * rpc_timeout)
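The FCT formula above translates directly into code. The sketch below assumes the oldest hint time of each node has already been collected (how that is gathered is not shown); `full_consistency_timestamp` is a hypothetical helper name and the sample values are made up.

```python
from datetime import datetime, timedelta, timezone


def full_consistency_timestamp(oldest_hints, now, rpc_timeout):
    """FCT = min(oldest hint per node) - 2 * rpc_timeout.

    oldest_hints holds one entry per node: the time of its oldest hint,
    or None when the node has no hints (the current time is used instead).
    """
    if not oldest_hints:
        raise ValueError("need one entry per node")
    effective = [now if hint is None else hint for hint in oldest_hints]
    return min(effective) - 2 * rpc_timeout


now = datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
# Three nodes: oldest hints are 90 s old, absent, and 30 s old.
oldest_hints = [now - timedelta(seconds=90), None, now - timedelta(seconds=30)]
fct = full_consistency_timestamp(
    oldest_hints, now, rpc_timeout=timedelta(seconds=10)
)
```

With these sample values the FCT lands 110 seconds before `now`: 90 seconds for the oldest hint plus twice the 10-second timeout.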
  • 35. Distributed Compaction Solution - Equipped with FCT, we know exactly which deltas any node can compact independently and concurrently without fearing data loss - Basically, any delta with a time UUID less than FCT can be compacted away - Other applications of FCT include monitoring cluster health, global consistency for the databus, and synchronizing events across data centers
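As a simplified sketch of the compaction rule, assume each delta is a `(timestamp, fields)` pair with integer timestamps standing in for time UUIDs, and that resolving deltas means merging their fields in order. The `compact` function is illustrative only and is not EmoDB's implementation.

```python
def compact(deltas, fct):
    """Collapse every delta older than fct into one resolved delta.

    deltas is a list of (timestamp, fields) pairs sorted by timestamp.
    Deltas at or after fct are left untouched, since an out-of-order
    write could still arrive among them.
    """
    old = [d for d in deltas if d[0] < fct]
    recent = [d for d in deltas if d[0] >= fct]
    if len(old) < 2:
        return list(deltas)  # nothing to collapse
    resolved = {}
    for _, fields in old:
        resolved.update(fields)
    # The resolved delta keeps the timestamp of the newest delta it replaces.
    return [(old[-1][0], resolved)] + recent


deltas = [(1, {"x": 0}), (2, {"y": 1}), (3, {"x": 1}), (7, {"z": 5})]
compacted = compact(deltas, fct=5)
```

Only the three deltas below the FCT are merged; the delta at timestamp 7 stays separate until the FCT advances past it.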
  • 36. EmoDB is: • Globally distributed JSON key/value store
  • 38. EmoDB – Bulk use case - Export all data daily to HDFS/S3 for bulk jobs - No freezing of writes allowed, while still providing efficient reads
  • 39. EmoStash – Global Snapshots • Another application of Emo Deltas: global snapshots without freezing writes! • The snapshot process simply ignores any deltas later than the “snapshot” time, so it captures and uploads the exact view of the globally distributed data store at that moment.
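The cutoff idea can be sketched in a few lines, again assuming deltas are `(timestamp, fields)` pairs with integer timestamps in place of time UUIDs. `snapshot_view` is a hypothetical name and the sample data is invented for illustration.

```python
def snapshot_view(deltas, cutoff):
    """Resolve only the deltas written at or before cutoff."""
    doc = {}
    for timestamp, fields in sorted(deltas, key=lambda d: d[0]):
        if timestamp > cutoff:
            break  # later writes are ignored, not blocked
        doc.update(fields)
    return doc


deltas = [
    (1, {"rating": 4}),
    (2, {"status": "APPROVED"}),
    (9, {"rating": 5}),  # written after the snapshot time
]
snapshot = snapshot_view(deltas, cutoff=5)
live = snapshot_view(deltas, cutoff=float("inf"))
```

Writers keep appending deltas while the snapshot runs; the snapshot reader just stops at the cutoff.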
  • 40. EmoDB is: • Globally distributed JSON key/value store • which offers globally consistent snapshots
  • 41. EmoDB – Real-time streaming use case
  • 42. EmoDB Databus - Producers should only update the System of Record, not individual queues or topics - Infinite playback to bootstrap new data sources - Subscribers should only get events they care about - Complete flexibility in creating subscriptions - Multi-data-center guarantees - Global databus
  • 43. EmoDB Databus - Databus is a special kind of queue. No one explicitly puts events on it. - Databus is tied to the System of Record, and *all* updates automatically generate an event - One “system” queue fans out to the subscriptions - The fanout process also “authorizes” events based on subscription owners - Subscribers simply create a “subscription” with a filter on the tables and/or tags they want to follow. - Emo then starts “DVRing” updates for that subscription - Subscription + Scan = Bootstrap a new datastore - Producers are completely decoupled from subscribers.
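A toy model of the fanout step, assuming each subscription is a named predicate over events and each event carries a table name and tags. The event shape, subscription names, and `fan_out` are assumptions for illustration; authorization and persistence are omitted.

```python
from collections import defaultdict


def fan_out(system_queue, subscriptions):
    """Copy each event to every subscription whose filter accepts it."""
    queues = defaultdict(list)
    for event in system_queue:
        for name, accepts in subscriptions.items():
            if accepts(event):
                queues[name].append(event)
    return dict(queues)


# Hypothetical subscriptions: all tables, one table, and one tag.
subscriptions = {
    "everything": lambda e: True,
    "reviews": lambda e: e["table"] == "review:client",
    "moderation": lambda e: "moderate" in e["tags"],
}

# Events generated automatically by updates to the System of Record.
system_queue = [
    {"table": "review:client", "key": "r1", "tags": []},
    {"table": "catalog:client", "key": "c1", "tags": ["moderate"]},
]
queues = fan_out(system_queue, subscriptions)
```

Producers never name a subscription here; they only write updates, and the filters decide where each event goes.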
  • 44. Databus - Subscriptions // To create a subscription to *all* tables $ curl -s -XPUT -H "Content-Type: application/x.json-condition" http://localhost:8080/bus/1/demo-app --data-binary 'alwaysTrue()' // To create a subscription to only review and catalog tables --data-binary 'intrinsic("~table":"review:client", "catalog:client")' // To create a subscription to all tables of a “review” type --data-binary '{..,"type":"review"}' // To create a subscription to follow a “moderate” tag regardless of the table --data-binary '{.., "~tags":contains("moderate")}'
  • 45. EmoDB Databus - Producers should only update the System of Record, not individual queues or topics ✓ - Infinite playback to bootstrap new data sources ✓ - Subscribers should only get events they care about ✓ - Complete flexibility in creating subscriptions - Multi-data-center guarantees - Global databus
  • 46. Databus – Multi-Data Center Guarantees - [Diagram: US and EU data centers, each with Emo (Cassandra) and a Databus] - Producer updates the SoR with tag “HANDLE_THIS” - Consumer is subscribed to tag “HANDLE_THIS” - Databus only notifies the US consumer when the update is replicated to US
  • 47. Databus – Multi-Data Center Guarantees • Emo does not put the entire document on the databus. Only the key and the “changeId” of the update delta go on the bus. • EmoDB checks the following before handing out the event: • Either: the document as it appears in the local DC contains the changeId of the update • Or: the changeId is older than FCT (Full Consistency Timestamp) • Applications can achieve global consistency without requiring a GLOBAL-level write! • EmoDB Databus replicates globally • And notifies you only when the changes are available with “local_quorum” consistency
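The two-part check above can be expressed as a single predicate. In this sketch an event is a dict with a `change_id` and an integer `timestamp` (standing in for the time embedded in the changeId), and `local_change_ids` is the set of change IDs visible in the local copy of the document; all of these names are assumptions.

```python
def ready_to_deliver(event, local_change_ids, fct):
    """True if the change is visible locally or is older than the FCT."""
    return event["change_id"] in local_change_ids or event["timestamp"] < fct


fct = 100
local_change_ids = {"c1"}  # changes already present in the local DC

replicated = {"key": "doc1", "change_id": "c1", "timestamp": 150}
in_flight = {"key": "doc2", "change_id": "c2", "timestamp": 160}
compacted_away = {"key": "doc3", "change_id": "c3", "timestamp": 40}
```

The second clause matters because a change older than the FCT may already have been compacted, so its change ID can no longer be found in the document even though the data is fully consistent.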
  • 48. EmoDB is: • Globally distributed JSON key/value store • which offers globally consistent snapshots • with a built-in streaming data platform.
  • 49. But wait … there’s more! • Queue Service • Features “Dedup” queue service • Blob Store – To store big files (photos, pdf, html files, etc.) • API Key Management system – out of the box • All based on Cassandra – Use your existing operational Cassandra skills and tools to maintain an Emo cluster
  • 50. EmoDB is open source! • https://bazaarvoice.github.io/emodb • https://github.com/bazaarvoice/emodb • EmoDB provides “out of the box” data infrastructure for a globally distributed service-oriented architecture • It abstracts away many implementation details and makes it easy for services to share data seamlessly, all the while making sure a consistent and comprehensive System of Record is available.
  • 51. EmoDB is open source!