Kafka Connect Best Practices:
Advice from the Field
Randall Hauch, Software Engineer
@rhauch
Agenda
1. Kafka Connect basics
2. Choosing and using connectors
3. Planning Kafka Connect deployments
Apache Kafka™
[Diagram: a Kafka cluster]
Kafka Connect basics
Kafka Connect: Connectors
Connectors are reusable components that
know how to talk to specific sources and sinks.
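Connectors are configured rather than coded. The minimal sketch below builds, but does not send, a request that would create the FileStreamSource connector from the Apache Kafka quickstart through a distributed worker's REST API. It assumes a worker listening on localhost at the default REST port 8083; the connector name, file path, and helper name `build_create_request` are illustrative.

```python
import json
import urllib.request

# Assumption: a Connect worker in distributed mode on localhost:8083 (the default REST port).
CONNECT_URL = "http://localhost:8083/connectors"


def build_create_request(name: str, config: dict) -> urllib.request.Request:
    """Build a POST /connectors request whose body is {"name": ..., "config": {...}}."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        CONNECT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Illustrative config for the FileStreamSource connector that ships with Apache Kafka.
# Values are written as strings, just as they would be in a .properties file.
file_source_config = {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/test.txt",
    "topic": "connect-test",
}

request = build_create_request("local-file-source", file_source_config)
# To actually create the connector against a running worker:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode())
```

The same connector code can be reused for any file and topic; only the configuration changes.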
Kafka Connect: Converters
Convert between the source and sink record objects
and the binary format used to persist them in Kafka.
JSON, Avro, and others
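To make the converter's job concrete, here is a minimal Python sketch of the envelope that Kafka's JsonConverter writes when `schemas.enable=true`: a `schema` object describing the record and a `payload` holding its values. It is an illustration, not the converter itself. In particular, the hypothetical helper `to_json_envelope` infers a flat struct schema from Python types, whereas a real connector supplies the schema explicitly.

```python
import json

# Simplified mapping from Python types to Kafka Connect schema type names.
# Assumption: flat records with only these value types; anything else raises KeyError.
_CONNECT_TYPES = {bool: "boolean", int: "int64", float: "float64", str: "string"}


def to_json_envelope(record: dict, name: str) -> bytes:
    """Serialize a flat record into a JsonConverter-style {schema, payload} envelope."""
    fields = [
        {"type": _CONNECT_TYPES[type(value)], "optional": False, "field": key}
        for key, value in record.items()
    ]
    envelope = {
        "schema": {"type": "struct", "fields": fields, "optional": False, "name": name},
        "payload": record,
    }
    return json.dumps(envelope).encode("utf-8")


def from_json_envelope(data: bytes) -> dict:
    """Deserialize the bytes back into a record, checking the payload against the schema."""
    envelope = json.loads(data.decode("utf-8"))
    declared = {f["field"] for f in envelope["schema"]["fields"]}
    payload = envelope["payload"]
    if set(payload) != declared:
        raise ValueError("payload fields do not match the schema")
    return payload
```

Embedding the schema in every message is simple but verbose; Avro with Schema Registry stores the schema once and writes only a schema ID with each message.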
Kafka Connect: Single Message Transforms (SMTs)
Modify the structure of keys and values, and change the topic
and partition, of source and sink record objects.
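SMTs are applied one at a time, in the configured order, to each record. The Python sketch below mimics that chaining with three hypothetical transforms loosely modeled on Kafka's built-in `ReplaceField` (renames), `InsertField` (static field), and `RegexRouter` (topic rewrite). The real transforms are Java classes configured through `transforms=...` properties, and `RegexRouter` uses Java's `$1` group syntax rather than the Python `\1` used here.

```python
import re
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Record:
    """Minimal stand-in for a Connect record: topic, partition, key, and a dict value."""
    topic: str
    partition: Optional[int]
    key: object
    value: dict


Transform = Callable[[Record], Record]


def rename_field(old: str, new: str) -> Transform:
    """Rename one field in the value, keeping field order."""
    def apply(record: Record) -> Record:
        value = {(new if k == old else k): v for k, v in record.value.items()}
        return replace(record, value=value)
    return apply


def insert_static_field(name: str, constant: object) -> Transform:
    """Add a field with a constant value to every record."""
    def apply(record: Record) -> Record:
        return replace(record, value={**record.value, name: constant})
    return apply


def route_topic(pattern: str, replacement: str) -> Transform:
    """Rewrite the topic when the whole topic name matches the regex."""
    regex = re.compile(pattern)

    def apply(record: Record) -> Record:
        match = regex.fullmatch(record.topic)
        if match is None:
            return record  # non-matching topics pass through unchanged
        return replace(record, topic=match.expand(replacement))
    return apply


def apply_chain(record: Record, transforms: list[Transform]) -> Record:
    """Apply each transform in order; each one sees the previous transform's output."""
    for transform in transforms:
        record = transform(record)
    return record


original = Record("dbserver1.inventory.customers", 0, 1001, {"id": 1001, "fname": "Ann"})
chain = [
    rename_field("fname", "first_name"),
    insert_static_field("source", "crm"),
    route_topic(r"dbserver1\.inventory\.(.*)", r"\1"),
]
result = apply_chain(original, chain)
```

Each transform returns a new record instead of mutating its input, which keeps the chain order explicit and easy to test.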
Kafka Connect: Plugins
Connectors, transforms, and converters
are pluggable components.
Kafka Connect: Classpath Isolation
A plugin is a directory containing the JARs for connectors,
transforms, and/or converters, plus sample configs and other files.
my-plugins/  (include on the plugin.path)
    kafka-connect-foo-connector/  (JAR files, sample configs, etc.)
    kafka-connect-bar-connector/  (JAR files, sample configs, etc.)
The plugin.path worker configuration property
lists the directories that contain plugins.
Workers isolate the JARs for each connector, transform,
and converter to prevent conflicts.
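The plugin directory layout can be illustrated with a small Python sketch. The hypothetical `discover_plugins` helper treats each immediate subdirectory of each comma-separated `plugin.path` entry as one plugin and lists the JARs directly inside it; a worker would load each such group through its own class loader. This models only the layout, not the worker's actual scanning logic, which also accepts other forms such as uber JARs placed directly on the path.

```python
from pathlib import Path


def discover_plugins(plugin_path: str) -> dict[str, list[str]]:
    """Map each plugin directory name to the JAR files directly inside it.

    plugin_path is a comma-separated list of directories, like the worker's
    plugin.path property. Each immediate subdirectory is treated as one plugin
    whose JARs would share a single isolated class loader.
    """
    plugins: dict[str, list[str]] = {}
    for entry in plugin_path.split(","):
        root = Path(entry.strip())
        if not root.is_dir():
            continue  # skip missing entries rather than failing
        for plugin_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            plugins[plugin_dir.name] = sorted(j.name for j in plugin_dir.glob("*.jar"))
    return plugins
```

With isolation, two plugins can each bundle a different version of the same dependency (say, two Guava releases) without one overriding the other.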
Kafka Connect: Offsets
Kafka Connect automatically and periodically commits
the progress of connectors.
Connectors restart at the last committed position.
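The commit-and-resume cycle can be modeled in a few lines of Python. In this sketch, `OffsetStore` and `LineSourceTask` are hypothetical stand-ins: the task emits lines from a list, tracks its position as a source offset, and only the committed position survives a restart. In a real worker, commits happen periodically (the `offset.flush.interval.ms` worker setting controls how often), so records emitted after the last commit are emitted again after a failure.

```python
from dataclasses import dataclass, field


@dataclass
class OffsetStore:
    """In-memory stand-in for Connect's offset storage (e.g., the offsets topic)."""
    committed: dict = field(default_factory=dict)


class LineSourceTask:
    """Hypothetical source task that emits lines and tracks a line-number offset."""

    def __init__(self, lines: list[str], store: OffsetStore, source_partition: str = "file-1"):
        self.lines = lines
        self.store = store
        self.source_partition = source_partition
        # Resume from the last committed position, or start at the beginning.
        self.position = store.committed.get(source_partition, 0)

    def poll(self, max_records: int) -> list[str]:
        """Return the next batch and advance the in-memory (uncommitted) position."""
        batch = self.lines[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def commit(self) -> None:
        """Persist the current position, as the worker does periodically."""
        self.store.committed[self.source_partition] = self.position


store = OffsetStore()
task = LineSourceTask(["a", "b", "c", "d"], store)
task.poll(2)       # ["a", "b"]
task.commit()      # committed position is now 2
task.poll(2)       # ["c", "d"], but the worker crashes before the next commit
restarted = LineSourceTask(["a", "b", "c", "d"], store)
restarted.poll(2)  # ["c", "d"] again: at-least-once delivery
```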
Kafka Connect: Delivery Guarantees
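Kafka Connect processes each record once
under normal operations.
But if a worker fails after processing records and before committing their offsets,
those records are processed again after the restart,
so it can only guarantee at-least-once delivery.
Sink connectors can achieve exactly-once delivery
when they store offsets in the sink system itself.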
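Here is a minimal sketch of that idea, using SQLite as the sink system. The hypothetical `ExactlyOnceSink` writes each batch together with the next Kafka offset for its topic partition in one transaction, so either both are stored or neither is; on redelivery it skips records whose offsets are already covered. A real Java sink task would also read the stored offsets on startup and rewind its consumer to them through `SinkTaskContext.offset(...)`. The table and column names are assumptions for illustration.

```python
import sqlite3


class ExactlyOnceSink:
    """Hypothetical sink that stores records and Kafka offsets in the same database."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS events "
                "(topic TEXT, kafka_partition INTEGER, kafka_offset INTEGER, payload TEXT)"
            )
            conn.execute(
                "CREATE TABLE IF NOT EXISTS kafka_offsets (topic TEXT, kafka_partition INTEGER, "
                "next_offset INTEGER, PRIMARY KEY (topic, kafka_partition))"
            )

    def next_offset(self, topic: str, partition: int) -> int:
        """Offset of the first record not yet stored; a real task would seek here on startup."""
        row = self.conn.execute(
            "SELECT next_offset FROM kafka_offsets WHERE topic = ? AND kafka_partition = ?",
            (topic, partition),
        ).fetchone()
        return row[0] if row else 0

    def put(self, topic: str, partition: int, records: list[tuple[int, str]]) -> int:
        """Store (offset, payload) records not yet written; return how many were written."""
        start = self.next_offset(topic, partition)
        fresh = [(off, payload) for off, payload in records if off >= start]
        if not fresh:
            return 0
        # One transaction: the data and the offset commit succeed or fail together.
        with self.conn:
            self.conn.executemany(
                "INSERT INTO events VALUES (?, ?, ?, ?)",
                [(topic, partition, off, payload) for off, payload in fresh],
            )
            self.conn.execute(
                "INSERT OR REPLACE INTO kafka_offsets VALUES (?, ?, ?)",
                (topic, partition, max(off for off, _ in fresh) + 1),
            )
        return len(fresh)
```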
Agenda
1. Kafka Connect basics
2. Choosing and using connectors
3. Planning Kafka Connect deployments
Source Connectors
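The JDBC source connector works with many DBMSes that provide a JDBC driver.
Access data in tables, views, or custom queries.
Incremental mode requires columns that track row creation
(e.g., an auto-incrementing ID) and/or modification (e.g., a last-updated timestamp).
Detects soft deletes only, not rows that were physically removed.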
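The limitation on deletes follows from how incremental polling works. Below is a simplified Python/SQLite sketch of a timestamp-plus-incrementing poll loop; it resembles, but is not, the query the JDBC source connector issues in its `timestamp+incrementing` mode. The `customers` table, its `deleted` flag, the integer `updated_at` column, and the `poll_changes` helper are assumptions for illustration. A soft delete updates the row and bumps its timestamp, so the next poll sees it; a physically deleted row simply stops appearing in results.

```python
import sqlite3


def poll_changes(conn: sqlite3.Connection, last_ts: int, last_id: int):
    """Return rows changed since (last_ts, last_id) plus the new high-water mark.

    Rows are ordered by (updated_at, id); the id tie-breaker lets the poll resume
    correctly when several rows share the same timestamp.
    """
    rows = conn.execute(
        "SELECT id, name, deleted, updated_at FROM customers "
        "WHERE (updated_at = ? AND id > ?) OR updated_at > ? "
        "ORDER BY updated_at, id",
        (last_ts, last_id, last_ts),
    ).fetchall()
    if rows:
        last_id, last_ts = rows[-1][0], rows[-1][3]
    return rows, last_ts, last_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, "
    "deleted INTEGER NOT NULL DEFAULT 0, updated_at INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO customers (id, name, updated_at) VALUES (?, ?, ?)",
    [(1, "Ann", 100), (2, "Bob", 100)],
)
first_rows, ts, last_id = poll_changes(conn, -1, -1)  # initial poll sees both rows

conn.execute("UPDATE customers SET deleted = 1, updated_at = 200 WHERE id = 1")  # soft delete
conn.execute("DELETE FROM customers WHERE id = 2")  # hard delete
second_rows, ts, last_id = poll_changes(conn, ts, last_id)  # only the soft delete shows up
```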
Source Connectors
Change Data Capture (CDC) connectors
monitor the source system for all changes, including deleted rows.
CDC connectors often require
non-standard, source-specific APIs.
They typically detect changes only in physical tables.
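By contrast, a CDC connector emits an event for every insert, update, and delete it reads from the database's change log. The sketch below applies such events to an in-memory replica keyed by `id`. The event shape (an `op` of `c`, `r`, `u`, or `d`, with `before` and `after` row images) is modeled on the envelope used by Debezium connectors and is an assumption here; other CDC connectors use different formats, and `apply_change` is a hypothetical helper.

```python
def apply_change(replica: dict, event: dict) -> None:
    """Apply one Debezium-style change event to a replica keyed by the row's id."""
    op = event["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update: upsert the new row image
        row = event["after"]
        replica[row["id"]] = row
    elif op == "d":  # delete: the old row image identifies the row to remove
        replica.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


events = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "Ann"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "Bob"}},
    {"op": "u", "before": {"id": 1, "name": "Ann"}, "after": {"id": 1, "name": "Anne"}},
    {"op": "d", "before": {"id": 2, "name": "Bob"}, "after": None},
]
replica: dict = {}
for event in events:
    apply_change(replica, event)
# The deleted row is gone from the replica, which a timestamp-based poll could not detect.
```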
Sink Connectors
How are topics and partitions mapped to the external system?
Some sink connectors are more flexible than others.
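As one example of such a mapping, Confluent's S3 sink connector by default writes each topic partition under its own path and names each file after the topic, partition, and first offset it contains. The sketch below reproduces that naming scheme under stated assumptions (a `topics/` prefix, 10-digit zero-padded offsets, contiguous offsets, and files cut at a fixed record count); check the connector documentation for the exact defaults and the partitioner options. The helper names are hypothetical.

```python
def object_key(topic: str, partition: int, start_offset: int,
               extension: str = "avro", topics_dir: str = "topics") -> str:
    """Build a key like topics/<topic>/partition=<p>/<topic>+<p>+<start offset>.<ext>."""
    return (f"{topics_dir}/{topic}/partition={partition}/"
            f"{topic}+{partition}+{start_offset:010d}.{extension}")


def plan_files(topic: str, partition: int, offsets: list[int], flush_size: int) -> list[str]:
    """Name one file per complete group of flush_size consecutive offsets.

    A trailing partial group is not written yet; it waits for more records.
    """
    return [object_key(topic, partition, offsets[i])
            for i in range(0, len(offsets) - flush_size + 1, flush_size)]
```

Because a file's name depends only on the offsets it holds, replaying the same offsets after a restart yields the same object names, so a rewrite overwrites a file instead of adding a duplicate.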
Sink Connectors
Is the sink connector at least once or exactly once?
Confluent’s HDFS and S3 sink connectors are exactly once.
But many at-least-once sink connectors are idempotent,
so records that are redelivered do not create duplicates.
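An idempotent sink makes redelivery harmless by writing each record under its key, so writing it again overwrites rather than duplicates. The minimal SQLite sketch below assumes each record carries an `id` key; `INSERT OR REPLACE` stands in for the kind of upsert a real connector might issue (for example, the JDBC sink connector's upsert insert mode), and `write_batch` is a hypothetical helper.

```python
import sqlite3


def write_batch(conn: sqlite3.Connection, records: list[dict]) -> None:
    """Upsert records by primary key, so replaying a batch leaves the table unchanged."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO customers (id, name) VALUES (:id, :name)",
            records,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
batch = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
write_batch(conn, batch)
write_batch(conn, batch)  # redelivered after a failure: no duplicate rows
```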
Choosing and using connectors
Use a playground that is easy to clean up and restart.
Confluent CLI is perfect for this!
Confluent Command Line Interface (CLI)
Utility to easily operate Kafka-related services
on your local machine.
$ confluent start
$ confluent current
$ confluent help
$ confluent logs
$ confluent top
$ confluent status
$ confluent status connectors
$ confluent config <connector_name>
$ confluent load <connector_name>
$ confluent status <connector_name>
$ confluent unload <connector_name>
$ confluent stop connect
$ confluent stop
$ confluent destroy
Uses configuration files in
`etc/kafka` and
`etc/schema-registry`
Stores data in a temporary directory
(set its location with $CONFLUENT_CURRENT;
print it with `confluent current`)
Starts ZooKeeper, Kafka, Schema Registry,
Kafka REST, and Connect (distributed mode)
Confluent Command Line Interface (CLI)
Demo
Agenda
1. Kafka Connect basics
2. Choosing and using connectors
3. Planning Kafka Connect deployments
Planning Kafka Connect deployments
Understand the schemas of your records.
Confluent Schema Registry and Avro Converters
make schema evolution possible.
Producers and consumers can
adapt to new schemas at different times.
Enforce forward and/or backward compatibility.
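The compatibility rules can be sketched for the simplest case. The Python below checks flat record schemas, written as hypothetical `{field: {"type": ..., "default": ...}}` dictionaries, using a simplified form of Avro's schema resolution: a reader can decode data from a writer if every reader field is either present in the writer with the same type or has a default. Backward compatibility means consumers on the new schema can read data written with the old one; forward compatibility is the reverse. Real Avro resolution also handles type promotion, unions, aliases, and nested records, which this sketch ignores.

```python
def can_read(reader: dict, writer: dict) -> bool:
    """True if data written with `writer` can be decoded using `reader` (simplified rules)."""
    for name, spec in reader.items():
        if name in writer:
            if writer[name]["type"] != spec["type"]:
                return False  # no type promotion in this sketch
        elif "default" not in spec:
            return False  # reader expects a field the writer never wrote
    return True  # writer-only fields are simply ignored by the reader


def is_backward_compatible(new: dict, old: dict) -> bool:
    """Consumers using `new` can read data produced with `old`."""
    return can_read(reader=new, writer=old)


def is_forward_compatible(new: dict, old: dict) -> bool:
    """Consumers still using `old` can read data produced with `new`."""
    return can_read(reader=old, writer=new)


v1 = {"id": {"type": "long"}, "name": {"type": "string"}}
v2 = {**v1, "email": {"type": "string", "default": ""}}  # added field with a default
v3 = {**v1, "email": {"type": "string"}}                 # added field without a default
```

Adding fields with defaults keeps both directions working, which is why it is the usual way to evolve a schema safely.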
Planning Kafka Connect deployments
Install connectors on all workers in the cluster,
not just one of the workers.
Planning Kafka Connect deployments
Kafka Connect is a simple Java application,
so it can be deployed on many systems,
including Kubernetes, Mesos, EC2, etc.
Planning Kafka Connect deployments
Don’t overload your workers.
If they are too busy, they may skip heartbeats
and drop out of the cluster, causing a rebalance.
(improvement: KAFKA-5741)
Planning Kafka Connect deployments
Tune Kafka Connect and its clients:
• producer and consumer settings
• offset commit intervals
• poll intervals, batch sizes, number of tasks
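On the number of tasks: for a sink connector, each task runs its own consumer in the connector's consumer group, so tasks beyond the number of topic partitions receive no partitions and sit idle. The round-robin sketch below illustrates that arithmetic; the actual assignment depends on the configured consumer partition assignor, and `assign_round_robin` is a hypothetical helper.

```python
def assign_round_robin(partitions: list[str], num_tasks: int) -> list[list[str]]:
    """Spread partitions over tasks in turn; extra tasks get empty assignments."""
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    assignments: list[list[str]] = [[] for _ in range(num_tasks)]
    for index, partition in enumerate(partitions):
        assignments[index % num_tasks].append(partition)
    return assignments


partitions = [f"orders-{p}" for p in range(4)]
for tasks_max in (2, 4, 6):
    assignments = assign_round_robin(partitions, tasks_max)
    idle = sum(1 for a in assignments if not a)
    print(f"tasks.max={tasks_max}: {idle} idle task(s)")
```

Raising `tasks.max` past the partition count therefore adds load to the workers without adding throughput.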
Planning Kafka Connect deployments
Minimize rebalances.
If you need more isolation and control,
use separate worker clusters.
Summary
• Use Kafka Connect for sharing
and moving data
• Choose the right connectors
• Learn how to use the connectors
• Plan your deployments
Thank You!

Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field