© 2022 Ververica
Apache Kafka’s Transactions in the Wild! Developing an exactly-once KafkaSink in Apache Flink
Fabian Paul, Ververica - Kafka Summit London 2022
About Ververica
Original creators of Apache Flink®
Complete Stream Processing Infrastructure
Motivation
● Apache Kafka is one of the most widely used tools to support stream processing use cases with Apache Flink
● Processing frameworks offer different delivery guarantees:
○ At most once
○ At least once
○ Exactly once
● Demand for streaming applications with stronger guarantees is constantly increasing, e.g. financial data processing
Recap Apache Flink
[two image-only overview slides]
Apache Flink, Flink®, Apache®, the squirrel logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation.
Apache Flink: Fault Tolerance
● Many functions are stateful
○ Streaming data arrives over time
○ Functions need to remember records or temporary results
● Any variable that lives across function invocations is state
○ State must not be lost in case of a failure
● Flink periodically injects checkpoint barriers into the data stream
● Each task saves its state independently, but at the same consistent point in the stream
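A minimal sketch, assuming nothing beyond Flink's public DataStream API (the interval is an illustrative value, not from the talk), of how these barriers are switched on in a job:

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointingExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Inject a checkpoint barrier into the sources every 10 seconds;
        // every task snapshots its state consistently at that barrier.
        env.enableCheckpointing(10_000L, CheckpointingMode.EXACTLY_ONCE);
    }
}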
Apache Flink: Fault Tolerance
[six image-only slides stepping through the checkpointing mechanism]
Apache Flink: Unified Sink Framework
● Supports writing data to external systems for streaming and batch applications without dedicated implementations
● Offers different mixin interfaces for stateless and stateful sinks
● Uses a two-phase commit protocol between a Writer and a Committer operator to ensure exactly-once guarantees
Apache Flink: Unified Sink Framework
The Writer’s interface:

void write(InputT element, Context context) throws IOException, InterruptedException;
void flush(boolean endOfInput) throws IOException, InterruptedException;
Collection<CommT> prepareCommit() throws IOException, InterruptedException;
List<WriterStateT> snapshotState(long checkpointId) throws IOException;

● write: triggered for every incoming element; supposed to write it to the external system.
● flush: triggered before receiving the checkpoint barrier; after the method completes, all records need to be persisted in the external system.
● prepareCommit: triggered before the checkpoint barrier; the returned collection (i.e. open transactions) is forwarded to the committer operator.
● snapshotState: triggered before the checkpoint barrier; the returned collection is persisted into the state.
The Committer’s interface:

void commit(Collection<CommitRequest<CommT>> committables) throws IOException, InterruptedException;

● commit: triggered once all operators in the current job have successfully checkpointed; the implementation decides on potential retries.
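To make the contract concrete, here is a minimal, self-contained sketch (not Flink’s real SinkWriter API; the Txn type is hypothetical) of how a transactional writer could satisfy it:

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.UUID;

class TransactionalWriterSketch {
    // Hypothetical transaction handle: an id plus the records staged in it.
    static final class Txn {
        final String id = UUID.randomUUID().toString();
        final List<String> staged = new ArrayList<>();
    }

    private Txn current = new Txn();

    void write(String element) {
        current.staged.add(element); // stage the record in the open transaction
    }

    void flush(boolean endOfInput) {
        // In a real sink: block until the external system has acknowledged
        // every staged record.
    }

    Collection<String> prepareCommit() {
        String handle = current.id; // committable handed to the Committer
        current = new Txn();        // open a fresh transaction for the next checkpoint
        return List.of(handle);
    }
}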
Apache Kafka Transactions Recap
Transactions in Apache Kafka
● Atomic writes to multiple topics/partitions
○ All messages are part of a transaction, and either all or none of them are written
● Transactional consumers only read messages that are part of a committed transaction
● Users need to set a dedicated transaction identifier: transactional.id
Transactions in Apache Kafka: API

Properties producerProps = new Properties();
producerProps.put("bootstrap.servers", "localhost:9092");
producerProps.put("transactional.id", "prod-1");
// serializers are required to instantiate the producer
producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps);
producer.initTransactions();   // registers the transactional.id with the broker
producer.beginTransaction();
producer.send(new ProducerRecord<>("counts", "value"));
producer.commitTransaction();
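The consumer side is a single setting. A hedged counterpart to the snippet above (not from the talk): messages only become visible once their transaction commits when isolation.level is read_committed; the Kafka default is read_uncommitted.

import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties consumerProps = new Properties();
consumerProps.put("bootstrap.servers", "localhost:9092");
consumerProps.put("group.id", "counts-reader");
consumerProps.put("isolation.level", "read_committed"); // hide open/aborted transactions
consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);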
Apache Flink’s KafkaSink
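For orientation, a usage sketch of the sink this section dissects, roughly as shipped in Flink 1.14 (builder method names recalled from that release; treat them as approximate and check the release docs):

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

public class KafkaSinkExample {
    static KafkaSink<String> build() {
        return KafkaSink.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setRecordSerializer(
                        KafkaRecordSerializationSchema.builder()
                                .setTopic("counts")
                                .setValueSerializationSchema(new SimpleStringSchema())
                                .build())
                // exactly-once switches on the transactional machinery below
                .setDeliverGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                // prefix for the per-subtask, per-checkpoint transactional.ids
                .setTransactionalIdPrefix("my-app")
                .build();
    }
}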
Apache Flink’s KafkaSink
The exactly-once protocol between the Writer, the Committer, and Kafka:
1. The Writer opens a transaction during creation.
2. The Writer writes records continuously into the open transaction.
3. On checkpoint, the Writer flushes all outstanding messages and forwards the transaction handle (i.e. the transactional.id) as a committable.
4. The Committer writes the received committable into its state.
5. Once all tasks have successfully checkpointed, the Committer finishes all transactions based on the received committables.
Apache Flink’s KafkaSink
How do we choose the transactional.id so that parallel writers (subtasks 0, 1, 2, 3, …) do not fence each other?
{transactionalIdPrefix}-{subtaskId}-{checkpointId}
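A hedged sketch of that scheme (names are illustrative, not Flink’s actual helper):

// Unique per writer subtask and per checkpoint: a new producer only ever
// fences its own predecessors, never a sibling subtask.
static String transactionalId(String prefix, int subtaskId, long checkpointId) {
    return prefix + "-" + subtaskId + "-" + checkpointId;
}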
Apache Flink’s KafkaSink
Consider two producers, A and B, writing to the same topic:
1. A opens a transaction and writes records
2. B opens a transaction and writes records
3. A fails
4. B commits its transaction
Transactions opened earlier on a topic need to be finished before later committed transactions become visible: B’s committed records stay hidden from read_committed consumers until A’s transaction is aborted or times out.
Apache Flink’s KafkaSink
How can we make new transactions visible faster than the transaction timeout? On recovery, abort every transaction a previous execution may have left open (see the compilable sketch below):

for (int i = currentSubtaskId; ; i += parallelism) {
    int abortedTransactions = 0;
    for (int j = currentCheckpointId; ; j++, abortedTransactions++) {
        // stops at the first id with no open transaction (condition elided on the slide)
        abortTransaction({transactionalIdPrefix}-{i}-{j})
    }
    if (abortedTransactions == 0) {
        return;
    }
}
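A hedged, compilable rendering of that pseudocode; the two helpers are stand-ins for the reflective producer plumbing described in the caveats below:

class TransactionAborter {
    static boolean hasOpenTransaction(String transactionalId) { return false; } // stub
    static void abortTransaction(String transactionalId) {}                     // stub

    static void abortLingeringTransactions(
            String prefix, int subtaskId, int parallelism, long checkpointId) {
        // Scan subtask offsets beyond the current parallelism so that a
        // previous run executed with a higher parallelism is also covered.
        for (int i = subtaskId; ; i += parallelism) {
            int aborted = 0;
            // Walk forward through checkpoint ids until an id with no open
            // transaction is found (epoch == 0, see the caveats section).
            for (long j = checkpointId; ; j++) {
                String id = prefix + "-" + i + "-" + j;
                if (!hasOpenTransaction(id)) {
                    break;
                }
                abortTransaction(id);
                aborted++;
            }
            if (aborted == 0) {
                return; // nothing open at this or any higher subtask offset
            }
        }
    }
}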
Conclusion
Caveats with Apache Kafka’s Transaction API
● The transactional.id is bound to the lifetime of the KafkaProducer
○ Recycling a KafkaProducer is not possible because every checkpoint changes the transactional.id
○ Aborting transactions on recovery can initially be very slow because a new KafkaProducer has to be created every time
We added a new method setTransactionalId using reflection to reuse the existing KafkaProducer with a different transactional.id (see the sketch below). [1]
[1] https://github.com/apache/flink/blob/1d347e66eb799646b28100430b0afa65a56d844b/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/sink/FlinkKafkaInternalProducer.java#L153
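A rough sketch of the trick, assuming the private field names of kafka-clients at the time (transactionManager on KafkaProducer, transactionalId on TransactionManager); the linked FlinkKafkaInternalProducer is the authoritative version and also resets the transaction manager’s state machine:

import java.lang.reflect.Field;
import org.apache.kafka.clients.producer.KafkaProducer;

class ProducerReuse {
    // Hedged sketch: swap the transactional.id of a live producer via
    // reflection instead of building a new KafkaProducer per checkpoint.
    static void setTransactionalId(KafkaProducer<?, ?> producer, String newId) throws Exception {
        Field tmField = KafkaProducer.class.getDeclaredField("transactionManager");
        tmField.setAccessible(true);
        Object transactionManager = tmField.get(producer);

        // Note: writing a final field via reflection may need extra hoops on newer JDKs.
        Field idField = transactionManager.getClass().getDeclaredField("transactionalId");
        idField.setAccessible(true);
        idField.set(transactionManager, newId);
    }
}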
Caveats with Apache Kafka’s Transaction API
● No official way to list all currently open transactions (i.e. to query the internal transaction topic)
○ After a job recovery, new records only become visible after the transaction timeout
○ A KafkaProducer can overwrite transactions but cannot report whether a transaction already exists
We expose the KafkaProducer epoch to determine whether a transaction is currently open (epoch == 0 means no open transaction).
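In the same reflective spirit, a hedged sketch of reading that epoch (the field names producerIdAndEpoch and epoch are assumptions about the kafka-clients internals Flink taps into):

import java.lang.reflect.Field;
import org.apache.kafka.clients.producer.KafkaProducer;

class EpochProbe {
    // Epoch 0 means the transactional.id has no open transaction yet, which
    // lets the recovery loop above terminate before the transaction timeout.
    static short getEpoch(KafkaProducer<?, ?> producer) throws Exception {
        Field tmField = KafkaProducer.class.getDeclaredField("transactionManager");
        tmField.setAccessible(true);
        Object transactionManager = tmField.get(producer);

        Field peField = transactionManager.getClass().getDeclaredField("producerIdAndEpoch");
        peField.setAccessible(true);
        Object producerIdAndEpoch = peField.get(transactionManager);

        Field epochField = producerIdAndEpoch.getClass().getDeclaredField("epoch");
        epochField.setAccessible(true);
        return epochField.getShort(producerIdAndEpoch);
    }
}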
Summary
● Apache Kafka offers a good way to build exactly-once applications requiring high throughput and low latency
● Its transaction system is not always comparable to transactions known from traditional databases
● The KafkaSink ships with Apache Flink 1.14, supporting none, at-least-once, and exactly-once delivery guarantees
Thank You
www.ververica.com
fabianpaul@ververica.com