Kafka Connect - debezium

1 | Kafka Connect /Debezium - Stream MySQL events to Kafka
Kafka Connect - Debezium
Stream MySQL events to Kafka

About me
Kasun Don
Software Engineer - London
AWIN AG | Eichhornstraße 3 | 10785 Berlin
Telephone +49 (0)30 5096910 | info@awin.com | www.awin.com
• Automation & DevOps enthusiastic
• Hands on Big Data Engineering
• Open Source Contributor

Why Streaming MySQL events (CDC) ?
• Integrations with Legacy Applications
Avoid dual writes when Integrating with legacy systems.
• Smart Cache Invalidation
Automatically invalidate entries in a cache as soon as the record(s) for entries change or are removed.
• Monitoring Data Changes
Immediately react to data changes committed by application/user.
• Data Warehousing
Atomic operation synchronizations for ETL-type solutions.
• Event Sourcing (CQRS)
Totally ordered collection of events to asynchronously update the read-only views while writes can be recorded as normal.

Apache Kafka
Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable,
and durable.
Producer
Consumer Consumer Consumer
Producer Producer
Kafka

Kafka Connect
Connectors – A logical process responsible for managing the copying of data between Kafka and
another system.
There are two types of connectors,
• Source Connectors import data from another system
• Sink Connectors export data from Kafka
Workers – Unit of work that schedules connectors and tasks in a
process.
There are two main type of workers: standalone and distributed
Tasks - Unit of process that handles assigned set of work load by connectors.
Connector configuration allows set to maximum number of tasks can be run by a
connector.

Kafka Connect - Overview
Data
Source
Data
Sink
KafkaConnect
KAFKA
KafkaConnect

Kafka Connect – Configuration
Common Connector Configuration
• name - Unique name for the connector. Attempting to register again with the same name will
fail.
• connector.class - The Java class for the connector
• tasks.max - The maximum number of tasks that should be created for this connector. The
connector may create fewer tasks if it cannot achieve this level of parallelism.
Please note that connector configuration might vary, see specific connector documentation for
more information.
Distributed Mode - Worker Configuration
bootstrap.servers - A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.
group.id - A unique string that identifies the Connect cluster group this worker belongs to.
config.storage.topic - The topic to store connector and task configuration data in. This must be the same for all
workers with the same group.id.
offset.storage.topic - The topic to store offset data for connectors in. This must be the same for all workers with the
same group.id
status.storage.topic - The name of the topic where connector and task configuration status updates are stored.
For more distributed mode worker configuration : http://docs.confluent.io/current/connect/userguide.html#configuring-workers

Kafka Connect – Running A Instance
It is recommended to run Kafka Connect on containerized environments such as Kubernetes, Mesos, Docker Swarm, or
YARN.
Kafka Connect distributed mode exposes port 8083 by default to serve management REST interface.
Kafka Connect does not automatically handle restarting or scaling workers which means your existing clustering solutions can continue to be used transparently. –
Confluent.io
$ docker run -d
> --name=kafka-connect
> --net=host
> -e CONNECT_BOOTSTRAP_SERVERS="kafka-broker:9092"
> -e CONNECT_GROUP_ID="group_1"
> -e CONNECT_CONFIG_STORAGE_TOPIC="kafka-connect-config"
> -e CONNECT_OFFSET_STORAGE_TOPIC="kafka-connect-offset"
> -e CONNECT_STATUS_STORAGE_TOPIC="kafka-connect-status"
> -e CONNECT_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter"
> -e CONNECT_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter"
> -e CONNECT_INTERNAL_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter"
> -e CONNECT_INTERNAL_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter"
> -e CONNECT_LOG4J_LOGGERS="io.debezium.connector.mysql=INFO"
> -v /opt/kafka-connect/jars:/etc/kafka-connect/jars
> --restart always
> confluentinc/cp-kafka-connect:3.3.0

Debezium Connector
What is Debezium ?
Debezium is an open source distributed platform for change data capture using MySQL row-level binary
logs. Debezium built on top of Kafka Connect API Framework to support fault tolerance and high availability
using Apache Kafka eco system. Debezium records in a transaction log all row-level changes committed to
each database table.
Supported Databases
Debezium currently able to support following list of database software.
• MySQL
• MongoDB
• PostgreSQL
For more Information : http://debezium.io/docs/connectors/

Debezium Connector – MySQL Configuration
Enable binary logs
server-id = 1000001
log_bin = mysql-bin
binlog_format = row
binlog_row_image = full
expire_logs_days = 5
or
Enable GTIDs
gtid_mode = on
enforce_gtid_consistency = on
MySQL user with sufficient privileges
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION
CLIENT ON *.* TO 'debezium' IDENTIFIED BY password';
Supported MySQL topologies
• MySQL standalone
• MySQL master and slave
• Highly Available MySQL clusters
• Multi-Master MySQL
• Hosted MySQL eg: Amazon RDS and Amazon Aurora

Debezium Connector – MySQL Connector
Configuration
Example Configuration
{
"name": "example-connector",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"database.hostname": "127.0.0.1",
"database.port": "3306",
"database.user": "debezium",
"database.password": "dbz",
"database.server.id": "184054",
"database.server.name": "mysql-example",
"database.whitelist": "db1",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "dbhistory.mysql-example"
}
}
For more configuration : http://debezium.io/docs/connectors/mysql/

Debezium Connector – Add Connector to Kafka
Connect
For more configuration : http://debezium.io/docs/connectors/mysql/
More REST Endpoints : https://docs.confluent.io/current/connect/managing.html#using-the-rest-interface
List Available Connector plugins
$ curl -s http://kafka-connect:8083/connector-plugins
[
{
"class": "io.confluent.connect.jdbc.JdbcSinkConnector"
},
{
"class": "io.confluent.connect.jdbc.JdbcSourceConnector"
},
{
"class": "io.debezium.connector.mysql.MySqlConnector"
},
{
"class": "org.apache.kafka.connect.file.FileStreamSinkConnector"
},
{
"class": "org.apache.kafka.connect.file.FileStreamSourceConnector"
}
]
Add connector
$ curl -s -X POST -H "Content-Type: application/json" --data @connector-config.json http://kafka-connect:8083/conn
Remove connector
$ curl -X DELETE -H "Content-Type: application/json” http://kafka-connect:8083/connectors

Debezium Connector – Sample CDC Event
{
"schema": {},
"payload": {
"before": null,
"after": {
"id": 1004,
"first_name": "Anne Marie",
"last_name": "Kretchmar",
"email": "annek@noanswer.org"
},
"source": {
"name": "mysql-server-1",
"server_id": 223344,
"ts_sec": 1465581,
"gtid": null,
"file": "mysql-bin.000003",
"pos": 805,
"row": 0,
"snapshot": null
},
"op": "d",
"ts_ms": 1465581902461
}
}
{
"schema": {},
"payload": {
"before": {
"id": 1004,
"first_name": "Anne Marie",
"last_name": "Kretchmar",
"email": "annek@noanswer.org"
},
"after": null,
"source": {
"name": "mysql-server-1",
"server_id": 223344,
"ts_sec": 1465889,
"gtid": null,
"file": "mysql-bin.000003",
"pos": 806,
"row": 0,
"snapshot": null
},
"op": "d",
"ts_ms": 1465581902500
}
}
INSERT DELETE

Useful Links
Kafka Connect – User Guide
http://docs.confluent.io/2.0.0/connect/userguide.
html
Debezium – Interactive tutorial
http://debezium.io/docs/tutorial/
Debezium – MySQL connector
http://debezium.io/docs/connectors/mysql/
Kafka Connect – REST Endpoints
http://docs.confluent.io/2.0.0/connect/userguide.html#rest-
interface
Debezium Support/User Group
User ::
https://gitter.im/debezium/user
Dev :: https://gitter.im/debezium/dev
Kafka Connect – Connectors
https://www.confluent.io/product/connectors/

Q & A

Thank you
http://linkedin.com/in/kasundon

Kafka Connect - debezium

Recommended

More Related Content

What's hot (20)

Similar to Kafka Connect - debezium (20)

Recently uploaded (20)

Kafka Connect - debezium