Real-Time, Exactly-Once Data
Ingestion from Kafka to ClickHouse
Mohammad Roohitavaf, Jun Li
October 21, 2021
The Real-Time Analytics Processing Pipeline
ClickHouse as Real-Time Analytics Database
• ClickHouse: an open-source columnar database
to support OLAP
• Data insertion favors large blocks over individual
rows
• Kafka serves as a data buffer
• A Block Aggregator is a data loader that aggregates
Kafka messages into large blocks before loading them to
ClickHouse
Block Aggregator Failures
• With respect to the block aggregator:
• Kafka can fail
• Database backend can fail
• Network connections to Kafka and database can fail
• Block aggregator itself can crash
• Blindly retrying data loading can cause data loss or duplication in the data
persisted in the database
• Kafka's transaction mechanism cannot be applied here
Our Solution: Exactly-Once Message Delivery to ClickHouse
• To have the aggregator deterministically produce identical blocks to ClickHouse
• With existing runtime supports:
• Kafka metadata store to keep track of execution state, and
• ClickHouse’s block duplication detection
The Outline of the Talk
• The block aggregator developed for multi-DC deployment
• The deterministic message replay protocol in block aggregator
• The runtime verifier as a monitoring/debugging tool for block aggregator
• Issues and experiences in block aggregator’s implementation and deployment
• The block aggregator deployment in production
The Multi-DC Kafka/ClickHouse Deployment
• Each database shard has its own topic
• #partitions in topic = #replicas in shard
• Block aggregator co-located in each
replica (as two containers in a
Kubernetes pod)
• Block aggregator only inserts data to
local database replica (with ClickHouse
replication protocol to replicate data to
other replicas)
• Each block aggregator subscribes to
both Kafka clusters
The Multi-DC Kafka/ClickHouse Failure Scenario (1)
(Kafka DC Down)
The Multi-DC Kafka/ClickHouse Failure Scenario (2)
(DC Down)
(ClickHouse DC
Down)
• ClickHouse insert-quorum = 2
The Multi-DC Kafka/ClickHouse Failure Scenario (3)
(Kafka DC Down)
(ClickHouse
DC Down)
• ClickHouse insert-quorum = 2
Mappings of Topics, Tables, Rows, Messages
• One topic contains messages associated with multiple
tables in database
• One message contains multiple rows belonging to the
same table
• Each message is an opaque byte-array in Kafka based on
the protobuf-based encoding mechanism
• Block aggregator relies on ClickHouse table schema to
decode Kafka messages
• When a new table is added to database, no need to make
schema changes to Kafka clusters
• The number of topics does not grow as the tables continue
to be added
• Table rows constructed from Kafka messages in two Kafka
DCs get merged in database
The Block Aggregator Architecture
The Key Features of Block Aggregator
• Support multi-datacenter deployment model
• Multiple tables per topic/partition
• No data loss/duplication
• Monitoring with over a hundred metrics:
• Message processing rates
• Block insertion rate and failure rate
• Block size distribution
• Block loading time distribution
• Kafka metadata commit time and failure rate
• Whether abnormal message consumption behaviors happened (such as message
offsets being rewound or skipped)
The Outline of the Talk
• The block aggregator developed for multi-DC deployment
• The deterministic message replay protocol in block aggregator
• The runtime verifier as a monitoring/debugging tool for block aggregator
• Issues and experiences in block aggregator’s implementation and deployment
• The block aggregator deployment in production
A Naïve Way for Block Aggregator to Replay Messages (1)
A Naïve Way for Block Aggregator to Replay Messages (2)
Our Solution: Block-Level Deduplication in ClickHouse (1)
• ClickHouse relies on ZooKeeper to store metadata
• Each block stored contains a hash value
• New blocks are checked for hash uniqueness before insertion
• Blocks are identical if
• Having same block size
• Containing same rows
• And rows in same order
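The dedup check can be illustrated with a toy sketch (our own hashing, not ClickHouse's actual implementation, which keeps block hashes in ZooKeeper): hash the rows in order, and skip insertion when the same hash has been seen before.

```python
import hashlib

seen = set()  # stands in for the block hashes ClickHouse keeps in ZooKeeper

def block_hash(rows):
    """Hash a block's rows in order; identical blocks (same rows,
    same order, same size) yield the same digest."""
    h = hashlib.sha256()
    for row in rows:
        h.update(repr(row).encode())
        h.update(b"\x00")  # row separator
    return h.hexdigest()

def insert_block(rows):
    """Insert only if this exact block was not inserted before."""
    digest = block_hash(rows)
    if digest in seen:
        return False  # duplicate block: dropped, no double insert
    seen.add(digest)
    return True
```

Re-inserting the very same block is a no-op, while a block with the same rows in a different order hashes differently and would be inserted again, which is exactly why the aggregator must re-form blocks deterministically on replay.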
Our Solution: Guarantee to Form Identical Blocks (2)
• Store metadata back to Kafka which describes the latest blocks formed for
each table
• In case of failure, the next Block Aggregator that picks up the partition will
know exactly how to reconstruct the latest blocks formed for each table by
the previous Block Aggregator
• The Block Aggregators can be in two different ClickHouse replicas, if Kafka
partition rebalancing happens
The Metadata Structure
For each Kafka connector, the metadata persisted to Kafka, per partition, is:
replica_1,table1,0,29,20,table2,5,20,10
The last block for table1 decided to load to ClickHouse covers offsets [0, 29];
with starting offset min = 0, 20 messages have been consumed for table1.
The last block for table2 decided to load to ClickHouse covers offsets [5, 20];
with starting offset min = 0, 10 messages have been consumed for table2.
In total, all 30 messages from offset min = 0 to offset max = 29 have been consumed: 20 for table1 and 10 for table2.
replica-Id, [table-name, begin-msg-offset, end-msg-offset, count]+
Metadata.min = MIN (begin-msg-offset); Metadata.max = MAX(end-msg-offset)
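The metadata string above can be parsed and summarized as follows (a sketch; the field layout is from the slide, the helper names are ours):

```python
def parse_metadata(s):
    """Parse 'replica-Id,[table,begin,end,count]+' into a dict per table."""
    parts = s.split(",")
    replica, rest = parts[0], parts[1:]
    tables = {}
    for i in range(0, len(rest), 4):
        name = rest[i]
        begin, end, count = int(rest[i + 1]), int(rest[i + 2]), int(rest[i + 3])
        tables[name] = {"begin": begin, "end": end, "count": count}
    return replica, tables

def metadata_min_max(tables):
    """Metadata.min = MIN(begin-msg-offset); Metadata.max = MAX(end-msg-offset)."""
    return (min(t["begin"] for t in tables.values()),
            max(t["end"] for t in tables.values()))

def is_special(t):
    """Special block: begin-msg-offset = end-msg-offset + 1 (nothing to replay)."""
    return t["begin"] == t["end"] + 1
```

For the slide's example, `parse_metadata("replica_1,table1,0,29,20,table2,5,20,10")` yields min = 0 and max = 29, with 20 + 10 = 30 messages consumed in total.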
The Metadata Structure for Special Block
• Special block: when begin-msg-offset = end-msg-offset + 1
• Either there is no message for the table with offset less than begin-msg-offset
• Or every message for the table with offset less than begin-msg-offset has been
received and acknowledged by ClickHouse
• Example: replica_id,table1,30,29,20,table2,5,20,10
• All messages with offset less than 30 for table1 are acknowledged by
ClickHouse
Message Processing Sequence: Consume/Commit/Load
The message processing
shown here is per partition
Two Execution Modes:
• The aggregator starts from the message offset previously committed
• REPLAY: the aggregator retries sending the last blocks sent for each table, to avoid
data loss
• CONSUME: the aggregator is done with REPLAY and is in the normal state
• Mode Switching:
DetermineState (current_offset, saved_metadata) {
   begin = saved_metadata.min
   end = saved_metadata.max
   if (current_offset > end) state = CONSUME
   else state = REPLAY
}
The Top-Level Processing Loop of A Kafka Connector
• For each Kafka Connector:
while (running) { // outer loop
   wait for ClickHouse and Kafka to be healthy and connected
   while (running) { // inner loop
      batch = read a batch from Kafka; if error, break inner loop
      for (msg : batch.messages) {
         partitionHandlers[msg.partition].consume(msg); if error, break inner loop
      }
      for (ph : partitionHandlers) {
         if (ph.state == CONSUME) {
            ph.checkBuffers(); if error, break inner loop
         }
      }
   }
   disconnect from Kafka
   clear partitionHandlers
}
(Diagram annotations: the consume loop appends each message to its table's buffer; the check-buffers loop commits metadata to Kafka and flushes blocks to ClickHouse; each inner-loop iteration must complete within max_poll_interval.)
Some Clarifications
• Partition handlers can be dynamically created or deleted per the Kafka broker's decisions
• Under some failure conditions, one Kafka Connector can have more than one partition assigned
• Partition handler performs metadata commit on the corresponding partition
• Each partition handler can process multiple tables (because a Kafka partition can support
multiple tables)
• At any given time, each partition handler can only have one in-flight block, per table, to
be inserted to ClickHouse
• No new block can be submitted until the current in-flight block gets successful ACK from ClickHouse
• Thus, the metadata committed is just one block per table ahead, i.e., “Write Ahead Logging with One
Block”
• In other words, when replay happens, at most one block per table needs to be replayed
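The "Write Ahead Logging with One Block" invariant above can be sketched per table as a tiny gate (our naming, not the actual implementation): metadata is committed to Kafka first, then the block is flushed, and no new block may be submitted until the ACK arrives.

```python
class TableStream:
    """Per-table gate: commit metadata, flush the block, wait for the ACK."""

    def __init__(self):
        self.in_flight = None         # at most one pending block per table
        self.committed_metadata = None

    def submit(self, block, metadata, commit_to_kafka, flush_to_clickhouse):
        if self.in_flight is not None:
            raise RuntimeError("previous block not yet ACKed by ClickHouse")
        commit_to_kafka(metadata)     # write-ahead: metadata goes to Kafka first
        self.committed_metadata = metadata
        self.in_flight = block
        flush_to_clickhouse(block)    # then the block is loaded

    def on_ack(self):
        """ClickHouse acknowledged the insert; the next block may form."""
        self.in_flight = None
```

Because the metadata is always committed before the flush and at most one block is pending, a crash between commit and ACK leaves the committed state at most one block ahead, so replay only ever needs to re-form that single block per table.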
Some Clarifications (cont’d)
• If block insertion to ClickHouse fails,
• The outermost loop will disconnect the Kafka Connector from the Kafka Broker
• The Kafka consumer group rebalancing gets triggered automatically
• A different replica’s Kafka Connector will be assigned for the partition and block insertion
continues at this new replica
• Thus, rebalancing allows “Global Retries with Last Committed State” over multiple replicas
• The same failure handling mechanism can be applied, for example, when metadata
commit to Kafka fails
• Thus, Kafka consumer group rebalancing is an indicator that a failure cannot be
recovered by a single block aggregator
Example on Partition Rebalancing on Replicas
The following diagram shows two aggregators in one shard being killed (to simulate 1
datacenter down), and block insertion traffic gets picked up by the two remaining
aggregators in the same shard.
The Outline of the Talk
• The block aggregator developed for multi-DC deployment
• The deterministic message replay protocol in block aggregator
• The runtime verifier as a monitoring/debugging tool for block aggregator
• Issues and experiences in block aggregator’s implementation and deployment
• The block aggregator deployment in production
Runtime Verification
•Aggregator Verifier (AV): checks that the blocks flushed by all aggregators to
ClickHouse cause no data loss/duplication
•How can AV know which blocks the aggregators flushed?
• Each aggregator commits metadata to Kafka before flushing anything to ClickHouse, for each
partition
• All metadata records committed by the aggregators will be appended to an internal topic in
Kafka called __consumer_offsets
• Thus, AV needs to subscribe to this topic and learn about all blocks flushed to ClickHouse by all
aggregators
Runtime Verification Algorithm
Let M.t.start and M.t.end be the start offset
and end offset for table t in metadata M,
respectively
For any given metadata instances M and M',
where M was committed before M' in time:
•Backward Anomaly: For some table t,
M’.t.end < M.t.start
•Overlap Anomaly: For some table t,
M.t.start < M’.t.end AND M’.t.start <
M.t.end
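The two anomaly predicates translate directly into code (a sketch over (start, end) offset pairs for one table, where m was committed before m2):

```python
def backward_anomaly(m, m2):
    """Backward anomaly: the later metadata m2 ends before m even starts,
    i.e. offsets moved backward for this table."""
    return m2["end"] < m["start"]

def overlap_anomaly(m, m2):
    """Overlap anomaly: the two blocks' offset ranges intersect,
    so some messages would be loaded by both blocks."""
    return m["start"] < m2["end"] and m2["start"] < m["end"]
```

In a healthy run, consecutive blocks for a table cover strictly advancing, disjoint offset ranges, so neither predicate fires.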
Runtime Verifier Implementation
•The verifier reads metadata instances in their commit order to Kafka, stored in the
internal topic __consumer_offsets.
•__consumer_offsets is a partitioned topic, and Kafka does not guarantee ordering across
partitions.
•We order metadata instances with respect to their commit timestamp at the brokers. This
approach requires the clock of the Kafka brokers to be synchronized with an uncertainty
window less than the time between committing two metadata instances. Thus, we should
not commit metadata to Kafka too frequently.
•This is not a problem in block aggregator, as it commits metadata to Kafka for each block
every several seconds, which is not very frequent compared to the clock skew.
The Outline of the Talk
• The block aggregator developed for multi-DC deployment
• The deterministic message replay protocol in block aggregator
• The runtime verifier as a monitoring/debugging tool for block aggregator
• Issues and experiences in block aggregator’s implementation and deployment
• The block aggregator deployment in production
Compile and Link ClickHouse into Block Aggregator
• Instead of using the C++ client library from the ClickHouse repo, we compiled
and linked the entire ClickHouse codebase into the block aggregator
• It allows us to leverage the native ClickHouse implementation:
• Native TCP/IP communication protocol (with TLS and connection pooling)
• Select query capabilities just like ClickHouse-Client (for testing purpose)
• Table schema retrieval, and block header construction from schema
• Column construction from protobuf-based Kafka message deserialization
• Column default expression evaluation
• ZooKeeper client for distributed locking
Dynamic Table Schema Update
• To dynamically update a table schema:
• Step 1: Table schema is updated to each ClickHouse shard
• Step 2: Block aggregators in each shard are restarted, so they load the updated schema
from the co-located ClickHouse replica
• Step 3: With offline confirmation on schema update, the client application updates its
application logic to follow the updated schema to produce new Kafka messages
• Requirement: the block aggregator needs to be able to deserialize Kafka
messages into blocks, whether or not the messages follow the updated schema
• Solution: enforce that columns in a table schema can only be added and
cannot be deleted afterwards
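A sketch of why add-only columns keep old messages decodable (toy dict-based rows rather than the actual protobuf path; the schema and defaults are hypothetical): any column missing from an older message is filled from the table's default expression.

```python
# Current schema after an update: a new column "region" was appended.
# Each entry is (column name, default value from the table's default expression).
SCHEMA = [("id", 0), ("amount", 0.0), ("region", "unknown")]

def decode(message):
    """Build a full row from a message that may predate the newest columns."""
    return {col: message.get(col, default) for col, default in SCHEMA}

old_msg = {"id": 1, "amount": 9.5}                    # produced before the update
new_msg = {"id": 2, "amount": 3.0, "region": "eu"}    # produced after the update
```

Deleting a column would break this: a message written under the newer schema would carry data the aggregator could no longer map, which is why the schema is constrained to grow only.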
Multiple ZooKeeper Clusters for One ClickHouse Cluster
• ClickHouse relies on ZooKeeper as metadata store and replication coordination
• Each block insertion takes roughly 15 remote calls to ZooKeeper server cluster
• Block insertion is performed per table
• Our ZooKeeper cluster (version 3.5.8) is deployed across three datacenters with ~20 ms cross-
datacenter communication latency
• For a large ClickHouse cluster with 250 shards (with each shard having 4 replicas), a single
ZooKeeper deployment can introduce high ZooKeeper “hardware exception” rate
• The exceptions are due to frequently expired ZooKeeper sessions
• Multiple ZooKeeper clusters are deployed instead, with each allocated with a subset of the
ClickHouse shards
• In our deployment, 50 shards share one ZK cluster
• The right ratio depends on the block insertion rate per table and the total number of
tables involved in real-time insertion
Distributed Locking at Block Aggregator
• Before “insert_quorum_parallel” is introduced in ClickHouse,
• In each shard, for each table, only one replica is allowed to perform data insertion
• Distributed locking is used to coordinate block insertion at block aggregators
• The ZooKeeper locking implementation in ClickHouse is used
• More recent ClickHouse version has “insert_quorum_parallel” introduced
• The default value is true
• According to the Altinity blog article, current ClickHouse implementation breaks
sequential consistency and may have other side effects
• In our recent product release based on ClickHouse 21.8, we turned this option off
• And we still enforce distributed locking at block aggregator
Testing on Block Aggregator
• Resiliency Testing (in an 8-shard cluster with 32 replicas)
• Follow the “Chaos Monkey” approach
• Kill: individual processes and individual containers, across ZooKeeper, ClickHouse, Block Aggregator
• Kill: all processes and containers in one datacenter, across ZooKeeper, ClickHouse, Block Aggregator
• To validate whether data loading can recover and continue
• Smaller-scale integration testing
• The whole cluster runs on a single machine with multiple processes from ZooKeeper, ClickHouse and
Block Aggregators
• Programmatically control process start/stop, along with small table insertion
• In addition, to turn on fault injection at predefined points in Block Aggregator code
- For example, deliberately rejecting Kafka messages for 10 seconds
• Validate whether data loss and data duplication happens
ClickHouse Troubleshooting and Remediation
• The setting “insert_quorum = 2” is to guarantee high data reliability
• ClickHouse Exception (with error code = 286) can happen occasionally:
2021.04.10 16:26:38.896509 [ 59963 ] {8421e4d6-43f0-4792-8570-7ef2bf8f595a} <Error> executeQuery: Code: 286, e.displayText()
= DB::Exception: Quorum for previous write has not been satisfied yet. Status: version: 1
part_name: 20210410-0_990_990_0
required_number_of_replicas: 2
actual_number_of_replicas: 1
replicas: SLC-74137
Data insertion in the whole shard stops
when this exception happens!
ClickHouse Troubleshooting and Remediation (cont’d)
• An in-house tool is developed to:
• scan ZooKeeper subtree associated with log replication queues
• inspect why queued commands cannot be performed
• Once queued commands all get cleared, the quorum then automatically gets satisfied
• Afterwards, data insertion resumes in the shard
• Real-time alerts are defined:
• A shard having no block insertion for a long duration
• Block insertion experiencing a non-zero failure rate with error code = 286
• Some replicas' replication queues growing too large
The Outline of the Talk
• The block aggregator developed for multi-DC deployment
• The deterministic message replay protocol in block aggregator
• The runtime verifier as a monitoring/debugging tool for block aggregator
• Issues and experiences in block aggregator’s implementation and deployment
• The block aggregator deployment in production
Block Aggregator Deployment in Production
One Example Deployment
Kafka Clusters: 2 Datacenters
The ClickHouse Cluster:
*2 datacenters
*250 shards
*Each shard having 4 replicas (2 replica
per DC)
*Each aggregator co-located in each
replica
Metric | Measured Result
Total messages processed/sec (peak) | 280 K
Total message bytes processed/sec (peak) | 220 MB/sec
95%-tile block insertion time (quorum=2) | 3.8 sec (table 1), 1.1 sec (table 2), 4.0 sec (table 3)
95%-tile block size | 0.16 MB (table 1), 0.03 MB (table 2), 0.46 MB (table 3)
95%-tile number of rows in a block | 1358 rows (table 1), 1.8 rows (table 2), 1894 rows (table 3)
95%-tile Kafka commit time | 64 ms
End-to-end message consumption lag time | < 30 sec
Block Aggregator Deployment in Production
•The block insertion rate at the shard level in a 24-hour window
Block Aggregator Deployment in Production
•The message consumption LAG time at the shard level captured in a 24-hour window
Block Aggregator Deployment in Production
•The Kafka Group Rebalance Rate at the shard level in a 24-hour window (always 0)
Block Aggregator Deployment in Production
•The ZooKeeper hardware exception in a 24-hour window (close to 0)
Summary
•Using streaming platforms like Kafka is one standard way to transfer data across data
processing systems
•For Columnar DB, block loading is more efficient than loading individual records
•Under failure conditions, replaying Kafka messages may cause data loss or data duplication at
block loaders
•Our solution is to deterministically produce identical blocks under various failure conditions so
that the backend Columnar DB can detect and remove duplicated blocks
•The same solution allows us to verify that blocks are always produced correctly under failure
conditions
•This solution has been developed and deployed into production
Linked In Stream Processing Meetup - Apache Pulsar
Karthik Ramasamy
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streams
Yoni Farin
 
Stateful streaming and the challenge of state
Stateful streaming and the challenge of stateStateful streaming and the challenge of state
Stateful streaming and the challenge of state
Yoni Farin
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
Chhavi Parasher
 
OSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
OSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland HochmuthOSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
OSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
NETWAYS
 
OSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
OSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland HochmuthOSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
OSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
NETWAYS
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafka
confluent
 
Modern Distributed Messaging and RPC
Modern Distributed Messaging and RPCModern Distributed Messaging and RPC
Modern Distributed Messaging and RPC
Max Alexejev
 
Ad

More from Altinity Ltd (20)

Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxBuilding an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Altinity Ltd
 
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Altinity Ltd
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceBuilding an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Altinity Ltd
 
Fun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfFun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdf
Altinity Ltd
 
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfCloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Altinity Ltd
 
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Altinity Ltd
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Altinity Ltd
 
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfOwn your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Altinity Ltd
 
ClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom Apps
Altinity Ltd
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree Engine
Altinity Ltd
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Altinity Ltd
 
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Ltd
 
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
Altinity Ltd
 
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfOSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
Altinity Ltd
 
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
Altinity Ltd
 
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
Altinity Ltd
 
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
Altinity Ltd
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
Altinity Ltd
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
Altinity Ltd
 
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfOSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
Altinity Ltd
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxBuilding an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Altinity Ltd
 
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Altinity Ltd
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceBuilding an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Altinity Ltd
 
Fun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfFun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdf
Altinity Ltd
 
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfCloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Altinity Ltd
 
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Altinity Ltd
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Altinity Ltd
 
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfOwn your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Altinity Ltd
 
ClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom Apps
Altinity Ltd
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree Engine
Altinity Ltd
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Altinity Ltd
 
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Ltd
 
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
Altinity Ltd
 
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfOSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
Altinity Ltd
 
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
Altinity Ltd
 
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
Altinity Ltd
 
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
Altinity Ltd
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
Altinity Ltd
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
Altinity Ltd
 
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfOSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
Altinity Ltd
 
Ad

Recently uploaded (20)

F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& ConsiderationsDesigning AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Dinusha Kumarasiri
 
Expand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchangeExpand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchange
Fexle Services Pvt. Ltd.
 
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentSecure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Shubham Joshi
 
EASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License CodeEASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License Code
aneelaramzan63
 
Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025
kashifyounis067
 
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
Andre Hora
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
Egor Kaleynik
 
Douwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License codeDouwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License code
aneelaramzan63
 
Revolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptxRevolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptx
nidhisingh691197
 
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
steaveroggers
 
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Lionel Briand
 
How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage DashboardsAdobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
BradBedford3
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Societal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainabilitySocietal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainability
Jordi Cabot
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& ConsiderationsDesigning AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Dinusha Kumarasiri
 
Expand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchangeExpand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchange
Fexle Services Pvt. Ltd.
 
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentSecure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Shubham Joshi
 
EASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License CodeEASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License Code
aneelaramzan63
 
Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025
kashifyounis067
 
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
Andre Hora
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
Egor Kaleynik
 
Douwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License codeDouwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License code
aneelaramzan63
 
Revolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptxRevolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptx
nidhisingh691197
 
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
steaveroggers
 
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Lionel Briand
 
How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage DashboardsAdobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
BradBedford3
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Societal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainabilitySocietal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainability
Jordi Cabot
 

Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay

The Multi-DC Kafka/ClickHouse Failure Scenario (2)
(ClickHouse DC Down)
• ClickHouse insert-quorum = 2
The Multi-DC Kafka/ClickHouse Failure Scenario (3)
(Kafka DC Down, ClickHouse DC Down)
• ClickHouse insert-quorum = 2
Mappings of Topics, Tables, Rows, Messages
• One topic contains messages associated with multiple tables in the database
• One message contains multiple rows belonging to the same table
• Each message is an opaque byte array in Kafka, based on a protobuf-based encoding mechanism
• The block aggregator relies on the ClickHouse table schema to decode Kafka messages
• When a new table is added to the database, no schema changes are needed in the Kafka clusters
• The number of topics does not grow as tables continue to be added
• Table rows constructed from Kafka messages in the two Kafka DCs get merged in the database
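The topic/table/message mapping above can be sketched as follows. This is purely illustrative: the JSON envelope, field names, and `decode_message` function are invented stand-ins for the real protobuf-based encoding, which is decoded against the ClickHouse table schema.

```python
# Illustrative sketch of the topic/table/message mapping: one topic carries
# messages for many tables; each message names its table and packs several
# rows. The JSON envelope here is a hypothetical stand-in for the real
# protobuf-based encoding.
import json

def decode_message(raw: bytes):
    envelope = json.loads(raw)          # stand-in for protobuf decoding
    return envelope["table"], envelope["rows"]

table, rows = decode_message(b'{"table": "table1", "rows": [[1, "a"], [2, "b"]]}')
```

Because each message carries its own table name, adding a new table requires no change on the Kafka side; only the ClickHouse schema the aggregator decodes against grows.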
The Block Aggregator Architecture
The Key Features of the Block Aggregator
• Supports the multi-datacenter deployment model
• Multiple tables per topic/partition
• No data loss or duplication
• Monitoring with over a hundred metrics:
  • Message processing rates
  • Block insertion rate and failure rate
  • Block size distribution
  • Block loading time distribution
  • Kafka metadata commit time and failure rate
  • Whether abnormal message consumption behaviors occurred (such as message offsets rewound or skipped)
The Outline of the Talk
• The block aggregator developed for multi-DC deployment
• The deterministic message replay protocol in block aggregator
• The runtime verifier as a monitoring/debugging tool for block aggregator
• Issues and experiences in block aggregator’s implementation and deployment
• The block aggregator deployment in production
A Naïve Way for Block Aggregator to Replay Messages (1)
A Naïve Way for Block Aggregator to Replay Messages (2)
Our Solution: Block-Level Deduplication in ClickHouse (1)
• ClickHouse relies on ZooKeeper to store metadata
• Each block stored contains a hash value
• New blocks to be inserted have their hash checked for uniqueness
• Blocks are identical if:
  • They have the same block size
  • They contain the same rows
  • And the rows are in the same order
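To see why deterministic block formation makes deduplication work, consider a hash over the block size plus the rows in order. This is a rough illustration only; ClickHouse's internal block checksum is computed differently, and `block_hash` is a hypothetical name.

```python
# Illustrative only: a hash that depends on row count, row contents, and row
# order, so two block aggregators that form byte-identical blocks produce the
# same value. ClickHouse's internal block checksum differs from this sketch.
import hashlib

def block_hash(rows):
    h = hashlib.sha256()
    h.update(str(len(rows)).encode())   # same block size
    for row in rows:
        h.update(b"\x00")               # separator to keep row boundaries
        h.update(row.encode())          # same rows, in the same order
    return h.hexdigest()
```

Any difference in size, content, or row order yields a different hash, which is why the replay protocol must reconstruct blocks identically rather than merely re-send the same rows.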
Our Solution: Guarantee to Form Identical Blocks (2)
• Store metadata back to Kafka describing the latest blocks formed for each table
• On failure, the next block aggregator that picks up the partition knows exactly how to reconstruct the latest blocks formed for each table by the previous block aggregator
• The two block aggregators can be in different ClickHouse replicas, if Kafka partition rebalancing happens
The Metadata Structure
For each Kafka connector, the metadata persisted to Kafka, per partition, is:

replica-id, [table-name, begin-msg-offset, end-msg-offset, count]+
Metadata.min = MIN(begin-msg-offset); Metadata.max = MAX(end-msg-offset)

Example: replica_1,table1,0,29,20,table2,5,20,10
• The last block for table1 decided to load to ClickHouse: [0, 29]. Starting from offset min = 0, we have consumed 20 messages for table1.
• The last block for table2 decided to load to ClickHouse: [5, 20]. Starting from offset min = 0, we have consumed 10 messages for table2.
• In total, we have consumed all 30 messages from offset min = 0 to offset max = 29: 20 for table1 and 10 for table2.
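The example metadata string above can be parsed as follows. The field layout and the min/max rules follow the slide; the class and function names are hypothetical.

```python
# Parsing the per-partition metadata string committed to Kafka:
#   replica-id, [table-name, begin-msg-offset, end-msg-offset, count]+
# Names here (TableRange, parse_metadata) are hypothetical.
from dataclasses import dataclass

@dataclass
class TableRange:
    table: str
    begin: int   # begin-msg-offset of the last block formed for this table
    end: int     # end-msg-offset of the last block formed for this table
    count: int   # messages consumed for this table

def parse_metadata(s: str):
    parts = s.split(",")
    replica_id, rest = parts[0], parts[1:]
    ranges = [TableRange(rest[i], int(rest[i + 1]), int(rest[i + 2]), int(rest[i + 3]))
              for i in range(0, len(rest), 4)]
    meta_min = min(r.begin for r in ranges)   # Metadata.min
    meta_max = max(r.end for r in ranges)     # Metadata.max
    return replica_id, ranges, meta_min, meta_max

replica, ranges, lo, hi = parse_metadata("replica_1,table1,0,29,20,table2,5,20,10")
```

For the slide's example, this yields min = 0, max = 29, and a total of 30 consumed messages across the two tables.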
The Metadata Structure for a Special Block
• Special block: when begin-msg-offset = end-msg-offset + 1
  • Either there is no message for the table with offset less than begin-msg-offset
  • Or every message for the table with offset less than begin-msg-offset has been received and acknowledged by ClickHouse
• Example: replica_id,table1,30,29,20,table2,5,20,10
  • All messages with offset less than 30 for table1 are acknowledged by ClickHouse
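A small helper makes the special-block rule concrete (the function name is hypothetical; the rule is exactly the slide's begin = end + 1 condition):

```python
# A "special block" is encoded as begin-msg-offset = end-msg-offset + 1,
# meaning every message for the table below begin has either never existed
# or has been acknowledged by ClickHouse. Hypothetical helper name.
def is_special_block(begin: int, end: int) -> bool:
    return begin == end + 1

# In the slide's example "replica_id,table1,30,29,20", table1 has
# begin = 30 and end = 29, so all offsets below 30 are acknowledged.
```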
  • 21. Message Processing Sequence: Consume/Commit/Load The message processing shown here is per partition
• 22. Two Execution Modes
• The aggregator starts from the message offset previously committed
• REPLAY: the aggregator re-sends the last block formed for each table, to avoid data loss
• CONSUME: the aggregator is done with REPLAY and is in the normal state
• Mode switching:
DetermineState (current_offset, saved_metadata) {
  begin = saved_metadata.min
  end = saved_metadata.max
  if (current_offset > end)
    state = CONSUME
  else
    state = REPLAY
}
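The mode-switching pseudocode above can be written as a runnable sketch. The names are ours, not from the real code; the logic is exactly the slide's: replay until the partition's current offset moves past the end offset of the last committed metadata:

```python
REPLAY, CONSUME = "REPLAY", "CONSUME"

def determine_state(current_offset, metadata_min, metadata_max):
    """If the current offset is already beyond the last committed block's
    end offset, that block was fully acknowledged -- switch to CONSUME.
    Otherwise the last block may not have reached ClickHouse: REPLAY it."""
    if current_offset > metadata_max:
        return CONSUME
    return REPLAY

# Using the metadata example from earlier slides (min=0, max=29):
mode_after_ack = determine_state(30, 0, 29)   # past the last block
mode_on_restart = determine_state(15, 0, 29)  # inside the last block's range
```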
• 23. The Top-Level Processing Loop of A Kafka Connector
For each Kafka Connector:
while (running) {            // outer loop
  wait for ClickHouse and Kafka to be healthy and connected
  while (running) {          // inner loop: elapsed time <= max_poll_interval
    batch = read a batch from Kafka
    if error, break inner loop
    for (msg : batch.messages) {          // consume loop: append message to its table's buffer
      partitionHandlers[msg.partition].consume(msg)
      if error, break inner loop
    }
    for (ph : partitionHandlers) {        // check-buffers loop: commit to Kafka, flush to ClickHouse
      if (ph.state == CONSUME) {
        ph.checkBuffers()
        if error, break inner loop
      }
    }
  }
  disconnect from Kafka
  clear partitionHandlers
}
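The inner loop's two phases can be sketched with a minimal stand-in partition handler. This is a toy: Kafka polling, metadata commits, and the ClickHouse flush are all stubbed out, and the class and field names are ours:

```python
class PartitionHandler:
    """Stand-in for the per-partition handler: buffers messages per table
    and 'flushes' buffered blocks when checkBuffers runs in CONSUME mode."""

    def __init__(self):
        self.state = "CONSUME"
        self.buffers = {}   # table name -> list of buffered messages
        self.flushed = []   # (table, block) pairs handed to ClickHouse (stubbed)

    def consume(self, msg):
        # Consume loop: append each message to its table's buffer.
        self.buffers.setdefault(msg["table"], []).append(msg)

    def check_buffers(self):
        # Check-buffers loop: in the real system, commit metadata to Kafka
        # first, then flush the block to ClickHouse. Here we just record it.
        for table, msgs in self.buffers.items():
            if msgs:
                self.flushed.append((table, list(msgs)))
                msgs.clear()

# One simulated inner-loop iteration over a 3-message batch for partition 0:
handlers = {0: PartitionHandler()}
batch = [{"partition": 0, "table": "table1", "offset": o} for o in range(3)]
for msg in batch:                       # consume loop
    handlers[msg["partition"]].consume(msg)
for ph in handlers.values():            # check-buffers loop
    if ph.state == "CONSUME":
        ph.check_buffers()
```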
• 24. Some Clarifications
• Partition handlers can be dynamically created or deleted following the Kafka Broker's decisions
• Under some failure conditions, one Kafka Connector can have more than one partition assigned
• A partition handler performs metadata commits on its corresponding partition
• Each partition handler can process multiple tables (because one Kafka partition can carry messages for multiple tables)
• At any given time, each partition handler can have only one in-flight block, per table, being inserted to ClickHouse
• No new block can be submitted until the current in-flight block receives a successful ACK from ClickHouse
• Thus, the committed metadata is at most one block per table ahead, i.e., "Write-Ahead Logging with One Block"
• In other words, when replay happens, at most one block per table needs to be replayed
• 25. Some Clarifications (cont’d)
• If block insertion to ClickHouse fails,
• the outermost loop disconnects the Kafka Connector from the Kafka Broker
• Kafka consumer group rebalancing is triggered automatically
• a different replica’s Kafka Connector is assigned the partition, and block insertion continues at this new replica
• Thus, rebalancing allows “Global Retries with Last Committed State” over multiple replicas
• The same failure-handling mechanism applies, for example, when a metadata commit to Kafka fails
• Thus, Kafka consumer group rebalancing is an indicator of a failure that a single block aggregator cannot recover from on its own
  • 26. Example on Partition Rebalancing on Replicas The following diagram shows two aggregators in one shard being killed (to simulate 1 datacenter down), and block insertion traffic gets picked up by the two remaining aggregators in the same shard.
  • 27. The Outline of the Talk • The block aggregator developed for multi-DC deployment • The deterministic message replay protocol in block aggregator • The runtime verifier as a monitoring/debugging tool for block aggregator • Issues and experiences in block aggregator’s implementation and deployment • The block aggregator deployment in production
• 28. Runtime Verification
• Aggregator Verifier (AV): checks that the blocks flushed by all aggregators to ClickHouse cause no data loss or data duplication
• How can AV know which blocks were flushed by the aggregators?
• Each aggregator commits metadata to Kafka before flushing anything to ClickHouse, for each partition
• All metadata records committed by the aggregators are appended to an internal Kafka topic called __consumer_offsets
• Thus, AV subscribes to this topic and learns about all blocks flushed to ClickHouse by all aggregators
• 29. Runtime Verification Algorithm
Let M.t.start and M.t.end be the start offset and end offset for table t in metadata M, respectively.
For any given metadata instances M and M', where M was committed before M' in time:
• Backward Anomaly: for some table t, M'.t.end < M.t.start
• Overlap Anomaly: for some table t, M.t.start < M'.t.end AND M'.t.start < M.t.end
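The two anomaly predicates translate directly into code. A minimal sketch (our own representation: each metadata instance is a dict mapping a table name to its (start, end) offset interval):

```python
def backward_anomaly(m, m_prime):
    """m was committed before m_prime. Anomaly: a later commit covers
    offsets entirely before an earlier commit's interval (lost progress)."""
    return any(m_prime[t][1] < m[t][0] for t in m if t in m_prime)

def overlap_anomaly(m, m_prime):
    """Anomaly: two commits' offset intervals for the same table intersect."""
    return any(m[t][0] < m_prime[t][1] and m_prime[t][0] < m[t][1]
               for t in m if t in m_prime)

# Healthy succession: [0, 29] then [30, 40] -- no anomalies.
ok = {"table1": (0, 29)}, {"table1": (30, 40)}
# Buggy successions for comparison:
backward = {"table1": (10, 20)}, {"table1": (0, 5)}
overlap = {"table1": (0, 29)}, {"table1": (25, 40)}
```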
• 30. Runtime Verifier Implementation
• The verifier reads metadata instances in their commit order to Kafka, stored in the internal topic __consumer_offsets.
• __consumer_offsets is a partitioned topic, and Kafka does not guarantee ordering across partitions.
• We therefore order metadata instances by their commit timestamps at the brokers. This requires the Kafka brokers’ clocks to be synchronized with an uncertainty window smaller than the time between two metadata commits; thus, we should not commit metadata to Kafka too frequently.
• This is not a problem for the block aggregator: it commits metadata once per block, every several seconds, which is infrequent compared to the clock skew.
  • 31. The Outline of the Talk • The block aggregator developed for multi-DC deployment • The deterministic message replay protocol in block aggregator • The runtime verifier as a monitoring/debugging tool for block aggregator • Issues and experiences in block aggregator’s implementation and deployment • The block aggregator deployment in production
  • 32. Compile and Link ClickHouse into Block Aggregator • Instead of using the C++ client library at the ClickHouse repo, we compiled and linked the entire ClickHouse codebase to block aggregator • It allows us to leverage the native ClickHouse implementation: • Native TCP/IP communication protocol (with TLS and connection pooling) • Select query capabilities just like ClickHouse-Client (for testing purpose) • Table schema retrieval, and block header construction from schema • Column construction from protobuf-based Kafka message deserialization • Column default expression evaluation • ZooKeeper client for distributed locking
• 33. Dynamic Table Schema Update
• To dynamically update a table schema:
• Step 1: The table schema is updated on each ClickHouse shard
• Step 2: The block aggregators in each shard are restarted, so they load the updated schema from the co-located ClickHouse replica
• Step 3: After offline confirmation of the schema update, the client application updates its logic to produce new Kafka messages following the updated schema
• Requirement: the block aggregator must be able to deserialize Kafka messages into blocks, for messages with or without the updated schema
• Solution: enforce that columns in a table schema can only be appended and never deleted
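The append-only rule makes schema compatibility a simple prefix check. A hypothetical sketch (function name and column representation are ours): a new column list is compatible only if it starts with the old column list unchanged:

```python
def schema_compatible(old_columns, new_columns):
    """Append-only evolution: every existing column must survive, in the
    same position; new columns may only be appended at the end. Messages
    produced under the old schema then still deserialize under the new one."""
    return new_columns[:len(old_columns)] == old_columns

# Adding a column at the end is allowed:
add_ok = schema_compatible(["id", "name"], ["id", "name", "email"])
# Deleting or reordering columns is rejected:
delete_bad = schema_compatible(["id", "name"], ["id"])
reorder_bad = schema_compatible(["id", "name"], ["name", "id", "email"])
```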
• 34. Multiple ZooKeeper Clusters for One ClickHouse Cluster
• ClickHouse relies on ZooKeeper as its metadata store and for replication coordination
• Each block insertion takes roughly 15 remote calls to the ZooKeeper server cluster
• Block insertion is performed per table
• Our ZooKeeper cluster (version 3.5.8) is deployed across three datacenters with ~20 ms cross-datacenter communication latency
• For a large ClickHouse cluster with 250 shards (each shard having 4 replicas), a single ZooKeeper deployment can introduce a high ZooKeeper “hardware exception” rate
• These exceptions are due to frequent ZooKeeper session expirations
• Multiple ZooKeeper clusters are deployed instead, each allocated a subset of the ClickHouse shards
• In our deployment, 50 shards share one ZK cluster
• The right ratio depends on the block insertion rate per table and the total number of tables involved in real-time insertion
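One way to picture the allocation is a static shard-to-cluster mapping. This is a hypothetical sketch of the idea only; the slide does not say how shards are actually assigned to ZooKeeper clusters:

```python
SHARDS_PER_ZK = 50  # from the deployment described above: 50 shards per ZK cluster

def zk_cluster_for_shard(shard_id, num_zk_clusters):
    """Hypothetical static assignment: contiguous ranges of 50 shards map
    to successive ZooKeeper clusters, capping at the last cluster."""
    return min(shard_id // SHARDS_PER_ZK, num_zk_clusters - 1)

# A 250-shard cluster spread over 5 ZooKeeper clusters:
first = zk_cluster_for_shard(0, 5)
last = zk_cluster_for_shard(249, 5)
```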
• 35. Distributed Locking at Block Aggregator
• Before “insert_quorum_parallel” was introduced in ClickHouse,
• in each shard, for each table, only one replica was allowed to perform data insertion
• distributed locking was used to coordinate block insertion among block aggregators
• the ZooKeeper locking implementation in ClickHouse was used
• More recent ClickHouse versions have “insert_quorum_parallel” (default value: true)
• According to an Altinity blog article, the current ClickHouse implementation breaks sequential consistency and may have other side effects
• In our recent product release based on ClickHouse 21.8, we turned this option off
• And we still enforce distributed locking at the block aggregator
• 36. Testing on Block Aggregator
• Resiliency testing (in an 8-shard cluster with 32 replicas)
• Follow the “Chaos Monkey” approach
• Kill individual processes and individual containers, across ZooKeeper, ClickHouse, and Block Aggregator
• Kill all processes and containers in one datacenter, across ZooKeeper, ClickHouse, and Block Aggregator
• Validate that data loading can recover and continue
• Smaller-scale integration testing
• The whole cluster runs on a single machine, with multiple processes for ZooKeeper, ClickHouse, and Block Aggregators
• Programmatically control process start/stop, along with small table insertions
• In addition, turn on fault injection at predefined points in the Block Aggregator code - for example, deliberately refusing Kafka messages for 10 seconds
• Validate that no data loss or data duplication happens
  • 37. ClickHouse Troubleshooting and Remediation • The setting “insert_quorum = 2” is to guarantee high data reliability • ClickHouse Exception (with error code = 286) can happen occasionally: 2021.04.10 16:26:38.896509 [ 59963 ] {8421e4d6-43f0-4792-8570-7ef2bf8f595a} <Error> executeQuery: Code: 286, e.displayText() = DB::Exception: Quorum for previous write has not been satisfied yet. Status: version: 1 part_name: 20210410-0_990_990_0 required_number_of_replicas: 2 actual_number_of_replicas: 1 replicas: SLC-74137 Data insertion in the whole shard stops when this exception happens!
  • 38. ClickHouse Troubleshooting and Remediation (cont’d) • An inhouse tool is developed to: • scan ZooKeeper subtree associated with log replication queues • inspect why queued commands cannot be performed • Once queued commands all get cleared, the quorum then automatically gets satisfied • Afterwards, data insertion resumes in the shard • Real-time alerts are defined: • Long duration time that a shard does not have block insertion • Block insertion experiences non-zero failure rate with error code = 286 • Some replicas have their replication queues too large
  • 39. The Outline of the Talk • The block aggregator developed for multi-DC deployment • The deterministic message replay protocol in block aggregator • The runtime verifier as a monitoring/debugging tool for block aggregator • Issues and experiences in block aggregator’s implementation and deployment • The block aggregator deployment in production
• 40. Block Aggregator Deployment in Production
One Example Deployment
• Kafka clusters: 2 datacenters
• The ClickHouse cluster: 2 datacenters, 250 shards, each shard having 4 replicas (2 replicas per DC), with an aggregator co-located in each replica

Metric | Measured Result
Total messages processed/sec (peak) | 280 K
Total message bytes processed/sec (peak) | 220 MB/sec
95%-tile block insertion time (quorum=2) | 3.8 sec (table 1), 1.1 sec (table 2), 4.0 sec (table 3)
95%-tile block size | 0.16 MB (table 1), 0.03 MB (table 2), 0.46 MB (table 3)
95%-tile number of rows in a block | 1358 rows (table 1), 1.8 rows (table 2), 1894 rows (table 3)
95%-tile Kafka commit time | 64 ms
End-to-end message consumption lag time | < 30 sec
  • 41. Block Aggregator Deployment in Production •The block insertion rate at the shard level in a 24-hour window
  • 42. Block Aggregator Deployment in Production •The message consumption LAG time at the shard level captured in a 24-hour window
  • 43. Block Aggregator Deployment in Production •The Kafka Group Rebalance Rate at the shard level in a 24-hour window (always 0)
  • 44. Block Aggregator Deployment in Production •The ZooKeeper hardware exception in a 24-hour window (close to 0)
  • 45. Summary •Using streaming platforms like Kafka is one standard way to transfer data across data processing systems •For Columnar DB, block loading is more efficient than loading individual records •Under failure conditions, replaying Kafka messages may cause data loss or data duplication at block loaders •Our solution is to deterministically produce identical blocks under various failure conditions so that the backend Columnar DB can detect and remove duplicated blocks •The same solution allows us to verify that blocks are always produced correctly under failure conditions •This solution has been developed and deployed into production