© 2015, Conversant, Inc. All rights reserved.
Kafka: data loss and duplication
PRESENTED BY Jayesh Thakrar
September 22, 2016
Topics
• Kafka Overview
• Data Loss
• Data Duplication
• Audit & Monitoring
Kafka Overview
Kafka As A Log Abstraction
Client: Producer
Client: Consumer A
Client: Consumer B
Kafka Server = Kafka Broker
Topic: app_events
Source: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
Topic Partitioning . . .
Kafka Broker
Client: Producer or Consumer
• Log sharded into partitions
• Messages ordered within each partition
• Message offset = absolute position in partition
• Partitions stored on filesystem as ordered sequence of log segments (files)
• Producers and consumers can explicitly control messages for/from a partition or let the API handle it
Source: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
Topic: app_events
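The partitioning ideas on this slide can be illustrated with a small toy model. This is a sketch only: `PartitionedLog` and `choose_partition` are hypothetical names, and `zlib.crc32` is a stand-in hash chosen for the example, not the hash the Kafka client uses. It shows that messages with the same key land in the same partition and that the offset is simply the position within that partition.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition index (stand-in hash, not Kafka's own)."""
    return zlib.crc32(key) % num_partitions


class PartitionedLog:
    """Toy in-memory topic: one append-only list per partition."""

    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: bytes, value: str):
        """Append a message and return (partition, offset)."""
        partition = choose_partition(key, len(self.partitions))
        self.partitions[partition].append(value)
        # The offset is the absolute position of the message in its partition.
        offset = len(self.partitions[partition]) - 1
        return partition, offset


log = PartitionedLog(num_partitions=3)
first = log.append(b"user-1", "login")
second = log.append(b"user-1", "click")
```

Here `first` and `second` share a partition, and the second offset is one higher than the first, which is the ordering guarantee the slide describes for a single partition.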
Topic Partitioning . . . Scalability (Horizontal)
[Diagram: Client (Producer, Consumer) and three sets of partitions, each labelled Leader, Replica, Replica, spread across Kafka Broker 0, Kafka Broker 1 and Kafka Broker 2]
Topic Partitioning . . . Scalability (Horizontal)
[Diagram: three sets of partitions, each labelled Leader, Replica, Replica, spread across Kafka Broker 0, Kafka Broker 1 and Kafka Broker 2]
Pull-based inter-broker replication
Other Key Concepts
• Cluster = collection of brokers (and loosely, zookeeper ensemble)
• Broker-id = a unique id (integer) assigned to each broker
• Controller = functionality within each broker responsible for leader assignment and management, with one being the active controller
• Replica = partition copy, represented by the broker-id
• Assigned replicas = set of all replicas (broker-ids) for a partition
• ISR = In-Sync Replicas = subset of assigned replicas (brokers) that are “in-sync/caught-up” with the leader*, including the leader
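The relationship between assigned replicas and the ISR can be sketched as a small function. The slide does not say how "caught-up" is measured, so this example assumes a time-based rule: a follower counts as in sync if it last caught up with the leader within `max_lag_ms`. The function name, the parameters and the threshold value are all hypothetical.

```python
def in_sync_replicas(leader, assigned, last_caught_up_ms, now_ms, max_lag_ms):
    """Return the ISR as a list of broker-ids, leader first.

    assigned          -- all broker-ids holding a copy of the partition
    last_caught_up_ms -- follower broker-id -> time it last caught up
    max_lag_ms        -- assumed threshold for still counting as in sync
    """
    isr = [leader]  # the leader is always part of the ISR
    for broker_id in assigned:
        if broker_id == leader:
            continue
        if now_ms - last_caught_up_ms[broker_id] <= max_lag_ms:
            isr.append(broker_id)
    return isr


# Broker 1 caught up 500 ms ago, broker 2 fell behind 9 seconds ago.
isr = in_sync_replicas(
    leader=0,
    assigned=[0, 1, 2],
    last_caught_up_ms={1: 9_500, 2: 1_000},
    now_ms=10_000,
    max_lag_ms=2_000,
)
```

With these inputs the ISR is brokers 0 and 1, a strict subset of the assigned replicas 0, 1 and 2.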
Data Loss
Data Loss : Inevitable
Up to 0.01% data loss: for 700 billion messages per day, that's up to 70 million lost messages
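As a quick check of the scale involved, the loss figure can be computed from the daily volume and the loss rate. This is a minimal sketch; `max_lost_messages` is a hypothetical helper, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def max_lost_messages(messages_per_day: int, loss_percent: Fraction) -> int:
    """Upper bound on lost messages per day for a given loss rate in percent."""
    return int(messages_per_day * loss_percent / 100)


# 0.01% expressed exactly as the fraction 1/100 of a percent.
lost = max_lost_messages(700_000_000_000, Fraction(1, 100))
print(lost)  # 70000000
```

At 700 billion messages per day, a 0.01% loss rate corresponds to 70 million messages.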
PRODUCER – INNER WORKINGS & DATA LOSS
Kafka Producer API
API Call-tree
kafkaProducer.send()
…. accumulator.append() // buffer
…. sender.send() // network I/O
• Messages accumulate in buffer in batches
• Batched by partition, retry at batch level
• Expired batches dropped after retries
• Error count and other metrics via JMX
Data Loss at Producer
• Failure to close / flush producer on termination
• Dropped batches due to communication or other errors when acks = 0 or retry exhaustion
• Data produced faster than delivery, causing BufferExhaustedException (deprecated in 0.10+)
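The first loss scenario above (terminating without closing or flushing) can be illustrated with a toy model of a buffering producer. This is not the Kafka client API: `ToyProducer` and `run` are hypothetical names, and the "broker" is just a list. The point is only that messages accepted by `send()` sit in a buffer until a batch is shipped.

```python
class ToyProducer:
    """Toy buffering producer: send() only appends to an in-memory batch."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.buffer = []     # accepted by send() but not yet delivered
        self.delivered = []  # what the simulated broker has received

    def send(self, message):
        self.buffer.append(message)
        if len(self.buffer) >= self.batch_size:
            self.flush()  # a full batch is shipped

    def flush(self):
        self.delivered.extend(self.buffer)
        self.buffer.clear()

    def close(self):
        self.flush()  # closing drains whatever is still buffered


def run(messages, close_on_exit: bool):
    """Send all messages, optionally close, and return what was delivered."""
    producer = ToyProducer(batch_size=3)
    for message in messages:
        producer.send(message)
    if close_on_exit:
        producer.close()
    return producer.delivered


without_close = run(list(range(5)), close_on_exit=False)
with_close = run(list(range(5)), close_on_exit=True)
```

Without the final `close()`, the last partial batch (messages 3 and 4) never leaves the buffer, which is the silent loss the slide warns about.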
DATA LOSS BY CLUSTER / BROKERS
[Flowchart: Broker Crashes → Detected by Controller via zookeeper, followed by Y/N decisions: Was it a leader? Was it in ISR? Other replicas in ISR? Allow unclean election? Other replicas available? ISR >= min.insync.replicas? Outcomes: Elect another leader; Relax, everything will be fine; Partition unavailable !!; Potential data-loss depending upon acks setting at producer]
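The decision flow on this slide can be approximated in code. The exact arrows of the flowchart are not recoverable from the text, so the branching below is one plausible reading based on general Kafka behaviour, not a transcription of the diagram. The function name, parameter names and outcome strings are hypothetical.

```python
FINE = "everything will be fine"
UNAVAILABLE = "partition unavailable"
POTENTIAL_LOSS = "potential data loss depending on producer acks"


def outcome_after_broker_crash(was_leader, was_in_isr, remaining_isr,
                               other_replicas_available,
                               allow_unclean_election, min_insync_replicas):
    """Classify the effect of one broker crash on one partition.

    remaining_isr -- number of in-sync replicas left once the crashed
                     broker has been removed from the ISR
    """
    if was_leader:
        if remaining_isr == 0:
            # No in-sync replica can take over leadership.
            if other_replicas_available and allow_unclean_election:
                return POTENTIAL_LOSS  # an out-of-sync replica becomes leader
            return UNAVAILABLE
        # Otherwise another in-sync replica is elected leader.
    elif not was_in_isr:
        return FINE  # an out-of-sync follower was lost; the ISR is unchanged
    if remaining_isr >= min_insync_replicas:
        return FINE
    return POTENTIAL_LOSS


clean_failover = outcome_after_broker_crash(True, True, 2, True, False, 2)
unclean_failover = outcome_after_broker_crash(True, True, 0, True, True, 2)
no_failover = outcome_after_broker_crash(True, True, 0, True, False, 2)
```

Under this reading, a leader crash with two in-sync replicas left is harmless, while a leader crash with an empty ISR either risks data loss (unclean election allowed) or makes the partition unavailable (unclean election disabled).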
Config for Data Durability and Consistency
• Producer config
- acks = -1 (or all)
- max.block.ms (blocking on buffer full, default = 60000) and retries
- request.timeout.ms (default = 30000) – it triggers retries
• Topic config
- min.insync.replicas = 2 (or higher)
• Broker config
- unclean.leader.election.enable = false
- timeout.ms (default = 30000) – inter-broker timeout for acks
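The durability-oriented settings above can be collected into plain dictionaries and rendered as `key=value` lines. This is a sketch only: the property names and most values come from the slide, while `retries = 3` is an assumed value (the slide gives no number) and `to_properties` is a hypothetical helper.

```python
DURABLE_PRODUCER = {
    "acks": "all",                # same meaning as -1
    "retries": 3,                 # assumed value; the slide gives none
    "max.block.ms": 60000,        # default, blocks when the buffer is full
    "request.timeout.ms": 30000,  # default, expiry triggers retries
}
DURABLE_TOPIC = {"min.insync.replicas": 2}
DURABLE_BROKER = {"unclean.leader.election.enable": "false"}


def to_properties(config: dict) -> str:
    """Render a config dict as newline-separated key=value pairs."""
    return "\n".join(f"{key}={value}" for key, value in config.items())


print(to_properties(DURABLE_PRODUCER))
```

Keeping the three scopes in separate dictionaries mirrors the slide, since each group is applied in a different place (producer client, topic, broker).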
Config for Availability and Throughput
• Producer config
- acks = 0 (or 1)
- buffer.memory, batch.size, linger.ms (default = 0)
- request.timeout.ms, max.block.ms (default = 60,000), retries
- max.in.flight.requests.per.connection
• Topic config
- min.insync.replicas = 1 (default)
• Broker config
- unclean.leader.election.enable = true
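One way to make the trade-off between the two configuration profiles visible is a small check that flags settings favouring throughput over durability. This is a hypothetical helper, not part of Kafka; the three rules simply restate the contrast between this slide and the durability slide.

```python
def durability_warnings(acks, min_insync_replicas, unclean_election_enabled):
    """Return a list of settings that trade durability for throughput."""
    warnings = []
    if str(acks) not in ("all", "-1"):
        warnings.append("acks: leader may acknowledge before replicas have the data")
    if int(min_insync_replicas) < 2:
        warnings.append("min.insync.replicas: a single replica can accept writes")
    if unclean_election_enabled:
        warnings.append("unclean.leader.election.enable: out-of-sync replica may lead")
    return warnings


throughput_profile = durability_warnings(
    acks=0, min_insync_replicas=1, unclean_election_enabled=True)
durable_profile = durability_warnings(
    acks="all", min_insync_replicas=2, unclean_election_enabled=False)
```

The throughput profile from this slide triggers all three warnings, while the durability profile triggers none.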
Data Duplication
Data Duplication: How it occurs
Client: Producer
Client: Consumer A
Client: Consumer B
Kafka Broker
Topic: app_events
Producer (API) retries = messages resent after timeout when retries > 0
Consumer consumes messages more than once after restart from unclean shutdown / crash
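The consumer-side duplication described above can be reproduced with a toy simulation. It assumes a consumer that processes a message first and commits its offset afterwards; the `consume` function and its parameters are hypothetical, and the log is just a list indexed by offset. Producer-side retries are not modelled here.

```python
def consume(log, committed_offset, crash_before_commit_at=None):
    """Process messages from committed_offset onward.

    Returns (processed_messages, new_committed_offset). If a crash is
    simulated, the offset of the message being handled is never committed.
    """
    processed = []
    offset = committed_offset
    while offset < len(log):
        processed.append(log[offset])  # side effect happens first
        if crash_before_commit_at == offset:
            return processed, committed_offset  # crash: commit is lost
        offset += 1
        committed_offset = offset  # commit only after processing
    return processed, committed_offset


log = ["m0", "m1", "m2", "m3"]
first_run, committed = consume(log, 0, crash_before_commit_at=2)
second_run, committed = consume(log, committed)
all_processed = first_run + second_run
```

After the restart, the consumer resumes from the last committed offset (2), so `m2` is processed a second time even though its side effect already happened before the crash.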
Data Duplication: How to minimize
Client: Producer
Client: Consumer A
Client: Consumer B
Kafka Broker
Topic: app_events
Supplement Kafka API offset management with:
- lookup last processed offset in destination at startup
- off-process, low-latency, durable datastore to lookup processed messages by key or topic+partition+offset
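Both suggestions above can be sketched with a small deduplicating sink. In this example an in-memory `set` stands in for the off-process durable datastore, which is an assumption made so the sketch is self-contained; `DedupingSink` and its methods are hypothetical names.

```python
class DedupingSink:
    """Skips messages whose topic+partition+offset was already processed."""

    def __init__(self):
        self.seen = set()  # stand-in for a durable, off-process datastore
        self.output = []   # stand-in for the destination system

    def handle(self, topic, partition, offset, value) -> bool:
        """Process a message once; return False if it was a duplicate."""
        key = (topic, partition, offset)
        if key in self.seen:
            return False
        self.output.append(value)
        self.seen.add(key)
        return True

    def last_processed_offset(self, topic, partition):
        """Highest processed offset for a partition, or None if none yet."""
        offsets = [o for (t, p, o) in self.seen if t == topic and p == partition]
        return max(offsets) if offsets else None


sink = DedupingSink()
for offset, value in [(0, "a"), (1, "b"), (2, "c"), (2, "c"), (3, "d")]:
    sink.handle("app_events", 0, offset, value)
resume_from = sink.last_processed_offset("app_events", 0)
```

The redelivered offset 2 is ignored, and `last_processed_offset` gives the position to resume from at startup. In a real system the write to the destination and the record in the datastore would need to be atomic, or the same gap reappears between those two steps.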
Audit & Monitoring
Auditing: detect dropped and duplicate messages
• At producer, use call-back in Kafka producer API to:
- persist mapping of message-key to Kafka key (Kafka key = topic + partition + offset mapping)
- persist message count per time interval/window
• At consumer check for:
- dropped/lost messages by verifying Kafka key to message key
- message count per time interval for duplicate data
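The audit described above amounts to reconciling what the producer recorded with what the consumer saw. A minimal sketch follows, assuming the producer callback has stored a mapping from Kafka key (topic, partition, offset) to message key, and the consumer has logged the same pair for each message it handled. The `audit` function and the sample data are hypothetical.

```python
from collections import Counter


def audit(produced, consumed):
    """Compare producer-side and consumer-side records for one time window.

    produced -- dict: (topic, partition, offset) -> message key
    consumed -- list of ((topic, partition, offset), message key) pairs
    Returns (lost, duplicated, surplus), where surplus is the difference
    between consumed and produced message counts.
    """
    seen = Counter(kafka_key for kafka_key, _ in consumed)
    lost = sorted(k for k in produced if k not in seen)
    duplicated = sorted(k for k, count in seen.items() if count > 1)
    surplus = len(consumed) - len(produced)
    return lost, duplicated, surplus


produced = {
    ("app_events", 0, 0): "order-17",
    ("app_events", 0, 1): "order-18",
    ("app_events", 0, 2): "order-19",
}
consumed = [
    (("app_events", 0, 0), "order-17"),
    (("app_events", 0, 0), "order-17"),  # processed twice
    (("app_events", 0, 2), "order-19"),  # offset 1 never arrived
]
lost, duplicated, surplus = audit(produced, consumed)
```

In this window one message is lost and one is duplicated, so the counts match (`surplus` is 0) even though the data does not. That is why the key-level comparison is worth keeping alongside the per-interval counts.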
Monitoring and Operations
• JMX metrics reporter in producer, consumer and broker
• https://cwiki.apache.org/confluence/display/KAFKA/Operations (outdated, but still relevant)
• SignalFX: http://www.confluent.io/blog/how-we-monitor-and-run-kafka-at-scale-signalfx/
• Netflix: http://techblog.netflix.com/2016/04/kafka-inside-keystone-pipeline.html
Questions?
Jayesh Thakrar
jthakrar@conversantmedia.com
BDW16 London - John Belchamber, Telefonica - New Data, New Strategies, New Op...
Big Data Week
 
BDW16 London - Deenar Toraskar, Think Reactive - Fast Data Key to Efficient C...
BDW16 London - Deenar Toraskar, Think Reactive - Fast Data Key to Efficient C...BDW16 London - Deenar Toraskar, Think Reactive - Fast Data Key to Efficient C...
BDW16 London - Deenar Toraskar, Think Reactive - Fast Data Key to Efficient C...
Big Data Week
 
BDW16 London - Jonny Voon, Innovate UK - Smart Cities and the Buzz Word Bingo
BDW16 London - Jonny Voon, Innovate UK - Smart Cities and the Buzz Word BingoBDW16 London - Jonny Voon, Innovate UK - Smart Cities and the Buzz Word Bingo
BDW16 London - Jonny Voon, Innovate UK - Smart Cities and the Buzz Word Bingo
Big Data Week
 
BDW16 London - Marius Boeru, Bigstep - How to Automate Big Data with Ansible
BDW16 London -  Marius Boeru, Bigstep - How to Automate Big Data with AnsibleBDW16 London -  Marius Boeru, Bigstep - How to Automate Big Data with Ansible
BDW16 London - Marius Boeru, Bigstep - How to Automate Big Data with Ansible
Big Data Week
 
BDW16 London - Josh Partridge, Shazam - How Labels, Radio Stations and Brand...
BDW16 London - Josh Partridge, Shazam -  How Labels, Radio Stations and Brand...BDW16 London - Josh Partridge, Shazam -  How Labels, Radio Stations and Brand...
BDW16 London - Josh Partridge, Shazam - How Labels, Radio Stations and Brand...
Big Data Week
 
BDW17 London - Edward Kibardin - Mitie PLC - Learning and Topological Data A...
 BDW17 London - Edward Kibardin - Mitie PLC - Learning and Topological Data A... BDW17 London - Edward Kibardin - Mitie PLC - Learning and Topological Data A...
BDW17 London - Edward Kibardin - Mitie PLC - Learning and Topological Data A...
Big Data Week
 
BDWW17 London - Steve Bradbury, GRSC - Big Data to the Rescue: A Fraud Case S...
BDWW17 London - Steve Bradbury, GRSC - Big Data to the Rescue: A Fraud Case S...BDWW17 London - Steve Bradbury, GRSC - Big Data to the Rescue: A Fraud Case S...
BDWW17 London - Steve Bradbury, GRSC - Big Data to the Rescue: A Fraud Case S...
Big Data Week
 
BDW17 London - Totte Harinen, Uber - Why Big Data Didn’t End Causal Inference
BDW17 London - Totte Harinen, Uber - Why Big Data Didn’t End Causal InferenceBDW17 London - Totte Harinen, Uber - Why Big Data Didn’t End Causal Inference
BDW17 London - Totte Harinen, Uber - Why Big Data Didn’t End Causal Inference
Big Data Week
 
BDW17 London - Rita Simoes, Boehringer Ingelheim - Big Data in Pharma: Sittin...
BDW17 London - Rita Simoes, Boehringer Ingelheim - Big Data in Pharma: Sittin...BDW17 London - Rita Simoes, Boehringer Ingelheim - Big Data in Pharma: Sittin...
BDW17 London - Rita Simoes, Boehringer Ingelheim - Big Data in Pharma: Sittin...
Big Data Week
 
BDW17 London - Mick Ridley, Exterion Media & Dale Campbell , TfL - Transformi...
BDW17 London - Mick Ridley, Exterion Media & Dale Campbell , TfL - Transformi...BDW17 London - Mick Ridley, Exterion Media & Dale Campbell , TfL - Transformi...
BDW17 London - Mick Ridley, Exterion Media & Dale Campbell , TfL - Transformi...
Big Data Week
 
BDW17 London - Abed Ajraou - First Utility - Putting Data Science in your Bus...
BDW17 London - Abed Ajraou - First Utility - Putting Data Science in your Bus...BDW17 London - Abed Ajraou - First Utility - Putting Data Science in your Bus...
BDW17 London - Abed Ajraou - First Utility - Putting Data Science in your Bus...
Big Data Week
 
BDW17 London - Steve Bradbury - GRSC - Making Sense of the Chaos of Data
BDW17 London - Steve Bradbury - GRSC - Making Sense of the Chaos of DataBDW17 London - Steve Bradbury - GRSC - Making Sense of the Chaos of Data
BDW17 London - Steve Bradbury - GRSC - Making Sense of the Chaos of Data
Big Data Week
 
BDW17 London - Andy Boura - Thomson Reuters - Does Big Data Have to Mean Big ...
BDW17 London - Andy Boura - Thomson Reuters - Does Big Data Have to Mean Big ...BDW17 London - Andy Boura - Thomson Reuters - Does Big Data Have to Mean Big ...
BDW17 London - Andy Boura - Thomson Reuters - Does Big Data Have to Mean Big ...
Big Data Week
 
BDW17 London - Tom Woolrich, Financial Times - What Does Big Data Mean for th...
BDW17 London - Tom Woolrich, Financial Times - What Does Big Data Mean for th...BDW17 London - Tom Woolrich, Financial Times - What Does Big Data Mean for th...
BDW17 London - Tom Woolrich, Financial Times - What Does Big Data Mean for th...
Big Data Week
 
BDW17 London - Andrew Fryer, Microsoft - Everybody Needs a Bit of Science in ...
BDW17 London - Andrew Fryer, Microsoft - Everybody Needs a Bit of Science in ...BDW17 London - Andrew Fryer, Microsoft - Everybody Needs a Bit of Science in ...
BDW17 London - Andrew Fryer, Microsoft - Everybody Needs a Bit of Science in ...
Big Data Week
 
BDW16 London - Alex Bordei, Bigstep - Building Data Labs in the Cloud
BDW16 London - Alex Bordei, Bigstep - Building Data Labs in the CloudBDW16 London - Alex Bordei, Bigstep - Building Data Labs in the Cloud
BDW16 London - Alex Bordei, Bigstep - Building Data Labs in the Cloud
Big Data Week
 
BDW16 London - William Vambenepe, Google - 3rd Generation Data Platform
BDW16 London - William Vambenepe, Google - 3rd Generation Data PlatformBDW16 London - William Vambenepe, Google - 3rd Generation Data Platform
BDW16 London - William Vambenepe, Google - 3rd Generation Data Platform
Big Data Week
 
BDW16 London - Scott Krueger, skyscanner - Does More Data Mean Better Decisio...
BDW16 London - Scott Krueger, skyscanner - Does More Data Mean Better Decisio...BDW16 London - Scott Krueger, skyscanner - Does More Data Mean Better Decisio...
BDW16 London - Scott Krueger, skyscanner - Does More Data Mean Better Decisio...
Big Data Week
 
BDW16 London - Nondas Sourlas, Bupa - Big Data in Healthcare
BDW16 London  - Nondas Sourlas, Bupa - Big Data in HealthcareBDW16 London  - Nondas Sourlas, Bupa - Big Data in Healthcare
BDW16 London - Nondas Sourlas, Bupa - Big Data in Healthcare
Big Data Week
 
BDW16 London - John Callan, Boxever - Data and Analytics - The Fuel Your Bran...
BDW16 London - John Callan, Boxever - Data and Analytics - The Fuel Your Bran...BDW16 London - John Callan, Boxever - Data and Analytics - The Fuel Your Bran...
BDW16 London - John Callan, Boxever - Data and Analytics - The Fuel Your Bran...
Big Data Week
 
BDW16 London - John Belchamber, Telefonica - New Data, New Strategies, New Op...
BDW16 London - John Belchamber, Telefonica - New Data, New Strategies, New Op...BDW16 London - John Belchamber, Telefonica - New Data, New Strategies, New Op...
BDW16 London - John Belchamber, Telefonica - New Data, New Strategies, New Op...
Big Data Week
 
BDW16 London - Deenar Toraskar, Think Reactive - Fast Data Key to Efficient C...
BDW16 London - Deenar Toraskar, Think Reactive - Fast Data Key to Efficient C...BDW16 London - Deenar Toraskar, Think Reactive - Fast Data Key to Efficient C...
BDW16 London - Deenar Toraskar, Think Reactive - Fast Data Key to Efficient C...
Big Data Week
 
BDW16 London - Jonny Voon, Innovate UK - Smart Cities and the Buzz Word Bingo
BDW16 London - Jonny Voon, Innovate UK - Smart Cities and the Buzz Word BingoBDW16 London - Jonny Voon, Innovate UK - Smart Cities and the Buzz Word Bingo
BDW16 London - Jonny Voon, Innovate UK - Smart Cities and the Buzz Word Bingo
Big Data Week
 
BDW16 London - Marius Boeru, Bigstep - How to Automate Big Data with Ansible
BDW16 London -  Marius Boeru, Bigstep - How to Automate Big Data with AnsibleBDW16 London -  Marius Boeru, Bigstep - How to Automate Big Data with Ansible
BDW16 London - Marius Boeru, Bigstep - How to Automate Big Data with Ansible
Big Data Week
 
BDW16 London - Josh Partridge, Shazam - How Labels, Radio Stations and Brand...
BDW16 London - Josh Partridge, Shazam -  How Labels, Radio Stations and Brand...BDW16 London - Josh Partridge, Shazam -  How Labels, Radio Stations and Brand...
BDW16 London - Josh Partridge, Shazam - How Labels, Radio Stations and Brand...
Big Data Week
 

BDW Chicago 2016 - Jayesh Thakrar, Sr. Software Engineer, Conversant - Data Loss and Data Duplication in Kafka

  • 1. © 2015, Conversant, Inc. All rights reserved. PRESENTED BY September 22, 2016 Kafka: data loss and duplication Jayesh Thakrar
  • 2. Topics
      - Kafka Overview
      - Data Loss
      - Data Duplication
      - Audit & Monitoring
  • 3. Kafka Overview
  • 4. Kafka As A Log Abstraction. Diagram: a producer client and two consumer clients (A and B) attached to topic app_events on a Kafka server (= Kafka broker). Source: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
  • 5. Topic Partitioning. Diagram: producer or consumer client, Kafka broker, topic app_events.
      - Log sharded into partitions
      - Messages ordered within each partition
      - Message offset = absolute position in partition
      - Partitions stored on filesystem as ordered sequence of log segments (files)
      - Producers and consumers can explicitly control messages for/from a partition or let the API handle it
      - Source: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
  • 6. Topic Partitioning: Scalability (Horizontal). Diagram: producer and consumer clients; Kafka Brokers 0, 1 and 2 hosting leader and follower replicas.
  • 7. Topic Partitioning: Scalability (Horizontal). Diagram: Kafka Brokers 0, 1 and 2 hosting leader and follower replicas, with pull-based inter-broker replication.
  • 8. Other Key Concepts
      - Cluster = collection of brokers (and, loosely, the ZooKeeper ensemble)
      - Broker-id = a unique id (integer) assigned to each broker
      - Controller = functionality within each broker responsible for leader assignment and management, with one broker being the active controller
      - Replica = partition copy, represented by the broker-id
      - Assigned replicas = set of all replicas (broker-ids) for a partition
      - ISR = In-Sync Replicas = subset of assigned replicas (brokers) that are "in-sync/caught-up" with the leader*, including the leader
  • 9. Data Loss
  • 10. Data Loss: Inevitable. Up to 0.01% data loss: for 700 billion messages per day, that's up to 70 million lost messages
  • 11. Producer: Inner Workings & Data Loss
      - Kafka Producer API call tree: kafkaProducer.send() … accumulator.append() // buffer … sender.send() // network I/O
          - Messages accumulate in buffer in batches
          - Batched by partition, retry at batch level
          - Expired batches dropped after retries
          - Error count and other metrics via JMX
      - Data Loss at Producer
          - Failure to close / flush producer on termination
          - Dropped batches due to communication or other errors when acks = 0 or on retry exhaustion
          - Data produced faster than delivery, causing BufferExhaustedException (deprecated in 0.10+)
  • 12. Data Loss by Cluster / Brokers. Flowchart (branch order not recoverable from the transcript):
      - Trigger: broker crashes, detected by the controller via ZooKeeper
      - Decisions: Was it a leader? Was it in ISR? Other replicas in ISR? Other replicas available? Allow unclean election? ISR >= min.insync.replicas?
      - Outcomes: elect another leader; relax, everything will be fine; partition unavailable; potential data loss depending upon the acks setting at the producer
  • 13. Config for Data Durability and Consistency
      - Producer config
          - acks = -1 (or all)
          - max.block.ms (blocking on buffer full, default = 60000) and retries
          - request.timeout.ms (default = 30000): it triggers retries
      - Topic config
          - min.insync.replicas = 2 (or higher)
      - Broker config
          - unclean.leader.election.enable = false
          - timeout.ms (default = 30000): inter-broker timeout for acks
  • 14. Config for Availability and Throughput
      - Producer config
          - acks = 0 (or 1)
          - buffer.memory, batch.size, linger.ms (default = 0)
          - request.timeout.ms, max.block.ms (default = 60000), retries
          - max.in.flight.requests.per.connection
      - Topic config
          - min.insync.replicas = 1 (default)
      - Broker config
          - unclean.leader.election.enable = true
  • 15. Data Duplication
  • 16. Data Duplication: How it occurs. Diagram: a producer client and two consumer clients (A and B) attached to topic app_events on a Kafka broker.
      - Producer (API) retries: messages are resent after a timeout when retries > 0
      - Consumer consumes messages more than once after restarting from an unclean shutdown / crash
  • 17. Data Duplication: How to minimize. Diagram: a producer client and two consumer clients (A and B) attached to topic app_events on a Kafka broker.
      - Supplement Kafka API offset management with:
          - a lookup of the last processed offset in the destination at startup
          - an off-process, low-latency, durable datastore to look up processed messages by key or topic+partition+offset
  • 18. Audit & Monitoring
  • 19. Auditing: detect dropped and duplicate messages
      - At the producer, use the callback in the Kafka producer API to:
          - persist the mapping of message key to Kafka key (Kafka key = topic + partition + offset)
          - persist message count per time interval/window
      - At the consumer, check for:
          - dropped/lost messages, by verifying Kafka key against message key
          - duplicate data, by comparing message count per time interval
  • 20. Monitoring and Operations
      - JMX metrics reporter in producer, consumer and broker
      - https://cwiki.apache.org/confluence/display/KAFKA/Operations (outdated, but still relevant)
      - SignalFX: http://www.confluent.io/blog/how-we-monitor-and-run-kafka-at-scale-signalfx/
      - Netflix: http://techblog.netflix.com/2016/04/kafka-inside-keystone-pipeline.html
  • 21. Questions?
  • 22. Jayesh Thakrar [email protected]
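
The lookup by topic+partition+offset suggested on slide 17 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the presenter's implementation: `MessageId`, `ProcessedStore` and `handle` are hypothetical names, and an in-memory set stands in for the off-process, low-latency, durable datastore the slide calls for.

```python
from typing import Callable, NamedTuple, Set


class MessageId(NamedTuple):
    """Kafka coordinates that identify one stored message."""
    topic: str
    partition: int
    offset: int


class ProcessedStore:
    """Hypothetical stand-in for a durable, off-process datastore."""

    def __init__(self) -> None:
        self._seen: Set[MessageId] = set()

    def contains(self, msg_id: MessageId) -> bool:
        return msg_id in self._seen

    def add(self, msg_id: MessageId) -> None:
        self._seen.add(msg_id)


def handle(store: ProcessedStore, msg_id: MessageId, value: bytes,
           process: Callable[[bytes], None]) -> bool:
    """Process a message unless its coordinates were already recorded."""
    if store.contains(msg_id):
        return False  # replayed delivery, e.g. after a consumer crash
    process(value)
    # Recorded only after processing succeeds, so a crash in between
    # leads to a reprocess rather than a silently skipped message.
    store.add(msg_id)
    return True
```

Keying on the offset only catches consumer-side replays. A producer retry writes the message again at a new offset, so catching those would need the lookup by message key that the slide also mentions.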

Editor's Notes

  • #3: Kafka is a popular messaging system, used by everyone from startups to some of the largest internet companies. Unlike some other messaging systems, Kafka can lose messages or introduce duplicates. This presentation looks into how that can happen, the tuning knobs that minimize it, and operational tips. It starts with an overview that lays the foundation for understanding data loss and data duplication, then covers the remaining key topics: data loss, data duplication, and auditing and monitoring. Among other things, the auditing and monitoring section has pointers on how to collect metrics to quantify any data loss or duplication.
  • #5: Kafka started out as a log abstraction, in which message producers generate messages and append them to an infinite log. The producer is agnostic of how or where the messages are stored or consumed. Messages in turn are retrieved by consumers, which are agnostic of message producers. Messages are stored in the log in the order they are presented to Kafka. A message can be stored as just a value or as a key-value pair. Consumers retrieve messages in the same order; however, they can have filters to select messages with specific keys, or retrieve from a specific starting point in the log. The Kafka consumer API provides a checkpoint mechanism to store the position of the most recently read message.
  • #6: If the log is implemented as a single, append-only log file, then it becomes a scalability bottleneck. So Kafka has the notion of partitioning a log or topic: the log is logically sharded into partitions. You can determine the sharding rule by supplying your own partitioner, or you can use the built-in hash-based sharding, which is applied to the key. Note that if you don't need or have a key, use a null value for the key, not an empty string or a constant; otherwise sharding will not work correctly and there will be skew.
  • #7: Partitioning allows data, or partitions, to be spread across multiple servers or Kafka brokers. Also, for durability reasons, partitions may be replicated across servers. All replicas of a partition are deemed equal, but one of them is designated the primary or leader and the others follower replicas. Producers and consumers work only with the leader replicas, i.e. they communicate with the brokers hosting the leader replica.
  • #8: For each partition, the follower replica brokers fetch the data from the leader broker, and this is all taken care of by the brokers using a pull mechanism. So note that a leader partition or broker serves producer clients, consumer clients and partition replicas.
  • #11: In spite of partitioning and replication, there is the potential for data loss. And who better to get that admission of potential data loss from than Netflix, which has one of the largest Kafka environments in terms of data flow.
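
The durability-oriented settings listed on slide 13 can be collected into plain dictionaries, with a small check that flags settings which trade durability for availability. This is only a sketch: the setting names and values come from the slide where given, while `retries = 3` is an assumed value and `durability_warnings` is a hypothetical helper, not part of any Kafka client.

```python
# Setting names from slide 13; values are the slide's where it gives one.
DURABLE_PRODUCER = {
    "acks": "all",                # same as -1: wait for all in-sync replicas
    "retries": 3,                 # assumed value; the slide gives none
    "max.block.ms": 60000,
    "request.timeout.ms": 30000,
}
DURABLE_TOPIC = {"min.insync.replicas": 2}
DURABLE_BROKER = {"unclean.leader.election.enable": False}


def durability_warnings(producer: dict, topic: dict, broker: dict) -> list:
    """Hypothetical lint: list settings that favor availability over durability."""
    warnings = []
    if str(producer.get("acks")) not in ("all", "-1"):
        warnings.append("acks is not all/-1")
    if int(producer.get("retries", 0)) < 1:
        warnings.append("retries disabled")
    if int(topic.get("min.insync.replicas", 1)) < 2:
        warnings.append("min.insync.replicas < 2")
    # Assumption: treated as enabled when the key is absent.
    if broker.get("unclean.leader.election.enable", True):
        warnings.append("unclean leader election enabled")
    return warnings
```

Passing the three dictionaries above returns an empty list, while the availability-oriented values from slide 14 (`acks = 0`, `min.insync.replicas = 1`, unclean election enabled) would each be flagged.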