101* mistakes FINN.no
has made with Kafka
Audun Fauchald Strand
@audunstrand
bio: gof, mq, ejb,
mda, wli, eda, soa,
esb, ddd
Henning Spjelkavik
@spjelkavik
bio: ejb, aq, eda,
soa, gis, ddd, iof,
aws, sql
agenda
introduction to kafka
kafka @ finn.no
101* mistakes
questions
“From a certain point onward
there is no longer any turning
back. That is the point that
must be reached.”
― Franz Kafka, The Trial
FINN.no
2nd largest website in Norway
60 million pageviews per day
80 microservices
130 developers
900 deploys per week
6 minutes from commit to deploy
(median)
101 mistakes FINN.no has made with Kafka (Baksida meetup)
introduction to
kafka
why use kafka
#notAnESB
what is a log
terminology
components
giant leap
“A First Sign of the Beginning of
Understanding is the Wish to Die.”
― Franz Kafka
https://commons.wikimedia.org/wiki/File:Kafka.jpg
Why use Kafka?
“Apache Kafka is publish-subscribe messaging
rethought as a distributed commit log.”
● Fast
● Scalable
● Durable
● Distributed by design
Sweet spot: High volume, low latency
Quora:
“Use Kafka if you have a fire hose of events (100k+/sec)
you need delivered in partitioned order 'at least once' with
a mix of online and batch consumers, you want to be able
to re-read messages”
“Use Rabbit if you have messages (20k+/sec) that need to
be routed in complex ways to consumers, you want per-
message delivery guarantees, you don't care about ordered
delivery”
#NotAnESB
“Based on conversations with the project
sponsors I began to suspect that at least the
introduction of the ESB was a case of RDD, ie.
Resume-Driven Development, development in
which key choices are made with only one
question in mind: how good does it look on my
CV?
Talking to the developers I learned that the ESB
had introduced “nothing but pain.”
Was this really another case of architect’s dream,
developer’s nightmare?”
1. Are you integrating 3 or more applications/services? If
you only need to communicate between 2 applications,
using point-to-point integration is going to be easier.
2. Do you need to use more than one type of
communication protocol? If you are just using
HTTP/Web Services or just JMS, you’re not going to get
any of the benefits of cross protocol messaging and
transformation that Mule provides.
3. Do you need message routing capabilities such as
forking and aggregating message flows, or content-
based routing? Many applications do not need these
capabilities
Mule ESB
What is a log?
A log is perhaps the simplest possible storage abstraction.
It is an append-only, totally-ordered sequence of records ordered by time.
Records are appended to the end of the log, and reads proceed left-to-right.
Each entry is assigned a unique sequential log entry number.
The ordering of records defines a notion of "time" since entries to the left are
defined to be older than entries to the right.
This is a data log, not an application log (i.e. not log4j)
The two problems a log solves—ordering changes and distributing data—are even
more important in distributed data systems.
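A minimal Python sketch of the log described above, assuming an in-memory list stands in for storage. The names `AppendOnlyLog`, `append` and `read_from` are made up for illustration and are not Kafka APIs.

```python
class AppendOnlyLog:
    """In-memory, append-only sequence of records (illustration only)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record at the end and return its sequential entry number."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, entry_number):
        """Return all records from entry_number onward, oldest first."""
        return list(self._records[entry_number:])


log = AppendOnlyLog()
first = log.append("ad created")
second = log.append("ad updated")
third = log.append("ad deleted")

# Each reader tracks its own position, so a slow reader
# does not hold back a fast one.
fast_reader = log.read_from(third)
slow_reader = log.read_from(first)
```

Existing entries are never modified or reordered; the only write operation is an append at the end.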
Changelog 101: Tables and Events are Dual
Duality: a log of changes and a table.
Accounting
log: credit and debit (events per key)
table: all current balances (i.e. state per key)
In a sense the log is the more fundamental data structure: in addition to creating the
original table you can also transform it to create all kinds of derived tables.
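The accounting example can be sketched in a few lines of Python. This assumes each event is an (account, amount) pair with credits positive and debits negative; the account names and amounts are invented.

```python
from collections import defaultdict


def table_from_log(events):
    """Replay the log from the start to build the table of current balances."""
    balances = defaultdict(int)
    for account, amount in events:
        balances[account] += amount
    return dict(balances)


# The log: every credit and debit, in order.
events = [("alice", 100), ("bob", 50), ("alice", -30), ("bob", 20)]

# The table: current state per key.
balances = table_from_log(events)

# A different derived table from the same log: transactions per account.
transaction_counts = {}
for account, _ in events:
    transaction_counts[account] = transaction_counts.get(account, 0) + 1
```

The balances cannot be turned back into the list of events, but the events can always be replayed into the balances or into any other derived view.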
producers write to brokers
consumers read from brokers
everything is distributed
data is stored in topics
topics are split into partitions
which are replicated
[diagram: a kafka cluster with several producers writing to it and several consumers reading from it]
terminology
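[diagram: a producer writes to the topic ad.new; a consumer with a group.id reads from ad.new; the partitions P0, P1 and P2 are replicated across broker1, broker2 and broker3; zookeeper sits alongside the brokers]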
components
[diagram: a log with entries 1 to 7, old messages to the left and newer messages to the right; consumer.group.1 and consumer.group.2 each read at their own position; producer a1 appends at the end]
Giant leap?
In fact, persistent replicated messaging is such a giant leap in messaging architecture that it may be worthwhile to point out a few side
effects:
a. Per-message acknowledgments have disappeared
b. ordered delivery
c. The problem of mismatched consumer speed has disappeared. A slow consumer can peacefully co-exist with a fast
consumer now
d. Need for difficult messaging semantics like delayed delivery, re-delivery etc. has disappeared. Now it is all up to the
consumer to read whatever message whenever - onus has shifted from broker to consumer
e. The holy grail of message delivery guarantee: at-least-once is the new reality - both Kafka and Azure Event Hub
provides this guarantee. You still have to make your consumers and downstream systems idempotent so that recovering
from a failure and processing the same message twice does not upset it too much, but hey - that has always been the
case
http://blogs.msdn.com/b/opensourcemsft/archive/2015/08/08/choose-between-azure-event-hub-and-kafka-_2d00_-what-you-need-to-know.aspx
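Point (e) above says consumers must be idempotent under at-least-once delivery. A rough Python sketch of one way to do that, assuming every message carries a unique id; `IdempotentConsumer` and `handle` are hypothetical names, and a real service would persist the processed ids together with the state change instead of keeping them in memory.

```python
class IdempotentConsumer:
    """Applies each message at most once, even if it is delivered twice."""

    def __init__(self):
        self.processed_ids = set()
        self.total = 0

    def handle(self, message_id, amount):
        """Apply the message; return False if it was a redelivery."""
        if message_id in self.processed_ids:
            return False
        self.total += amount
        self.processed_ids.add(message_id)
        return True


consumer = IdempotentConsumer()
# Message 2 is delivered twice, as can happen after a consumer restart.
deliveries = [(1, 10), (2, 5), (2, 5), (3, 1)]
results = [consumer.handle(message_id, amount) for message_id, amount in deliveries]
```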
Confluent platform
kafka @ finn.no
timeline
architecture
tools
use cases
timeline
2012 Decided to use RabbitMQ as message queue. Kafka was installed for a test
2013 Feb - Kafka PoC (“strømmen”, the stream)
Ad matching (“lagret søk”, saved search)
Ad indexed
2014
Product lifecycle - product paid, etc
2015 Feb -> May - 0.8.2. Dedicated cluster
architecture
[diagram: five nodes, broker01 to broker05, spread across two data centers (dcx and dcy); each node runs both zk and kafka]
messages as thrift map - single
IDL
common java library
+ schema
+ produce messages
+ consume messages
node.js
ruby
python
tooling
Grafana dashboard visualizing jmx stats
kafka-manager
kafka-cat
kafka offset monitor
alerts with sensu
uses
counting
+ pageviews
+ clicks
communication between services
+ domain events (e.g. new ad)
+ notifications
+ build pipeline
data replication
+ log compaction
tooling
+ slack notifications
+ (zipkin)
cache invalidation
+ on update/delete (updated profile)
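Log compaction, listed under data replication above, retains at least the latest record for each key. A rough Python sketch of the idea, assuming records are (key, value) pairs; `compact` is a hypothetical helper, and it ignores details such as delete markers and the fact that Kafka compacts in the background.

```python
def compact(records):
    """Keep only the most recent record per key, preserving log order."""
    latest_index = {}
    for index, (key, _) in enumerate(records):
        latest_index[key] = index  # later records overwrite earlier ones
    return [records[i] for i in sorted(latest_index.values())]


records = [
    ("user1", "profile v1"),
    ("user2", "profile v1"),
    ("user1", "profile v2"),
]
compacted = compact(records)
```

A new consumer that reads the compacted topic from the start still ends up with the current state for every key, which is what makes it usable for data replication.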
101* mistakes
“God gives the
nuts, but he
does not crack
them.”
― Franz Kafka
* 10
1. pre 1.0
2. inside/outside
3. schemas
4. kafka is a database
5. client side rebalance
6. mixed load/one configuration
7. autocreate of topics
8. 8 zk nodes
9. unclean leader election
10. 128 partitions
why is it a mistake
what is the consequence
what is the correct solution
what has finn.no done
mistake?
started using kafka pre 1.0 release
why is it a mistake
0.7 -> 0.8: not backwards compatible
0.7 client does not work well with 0.8 cluster
0.8 -> 0.9: not backwards compatible
0.8 consumer does not work with 0.9 cluster
0.9 -> 1.0: ???
what is the consequence
kafka is a critical component for communication between applications
coordination of 10-15 teams with 30 services
migration process of 6-8 months
from decision to old cluster turned off
what is the correct solution
evaluate the maturity of
critical architecture
components before
everyone starts using it
what has finn.no done
0.7 -> 0.8
1) create additional 0.8 cluster
2) all clients consume from both clusters (0.7 and 0.8)
3) critical services (payment) migrate consumers and producers during nighttime, with downtime
4) the rest of the services migrate their producers to 0.8 (the last mile takes a long time)
5) stop consuming from 0.7
6) turn off 0.7 cluster
7) read blogpost stating that 0.9 is not backwards compatible with 0.8
mistake:
not considering the coupling
you get when everyone can
see all your data
https://flic.kr/p/6MjhUR
why is it a mistake
everything that is published on Kafka is visible to any client that can access the cluster
what is the consequence
direct reads across services/domains are quite normal in legacy and/or enterprise systems
this coupling makes it hard to make changes
Kafka has no security per topic - you must add that yourself
what is the correct solution
Data on the inside versus data on the outside
At least decide on a convention for what is private data and what is public data
mistake:
not properly defined schema
why is it a mistake
schemas change at a different pace than the code producing and consuming messages
data needs versioning
defining schema in a java library makes it more difficult to access data from non-jvm
languages
code repository with java is not the easiest way to figure out the data on a topic
what is the consequence
development speed outside jvm has been slow
change of data needs coordinated deployment
difficult to create tooling that needs to know data format, like data lake
what is the correct solution
confluent.io platform has a separate schema registry
rest interface
apache avro
multiple compatibility settings and evolution strategies
connect
what has finn.no done
still using java library
confluent platform 2.0 is planned for the next step, not kafka 0.9
mistake:
Kafka is like a database -
treat it like one
https://flic.kr/p/2xQ9VT
why is it a mistake
We used our normal Ops scripts for kafka - if a config changes, restart automatically
If shutdown does not work within 5 seconds, kill -9
A database needs to finish what it is doing, before shutting down
A distributed database even more so
what is the consequence
At least need for recovery at startup
Data loss
No convergence - you won’t come back up
what is the correct solution
Do not play with stored data - understand how and when to apply changes
what has finn.no done
stop using the kill -9 script
mistake:
not properly understood
client side rebalance
why is it a mistake
kafka has a clear algorithm for handling increase or decrease in clients to be able
to keep everything balanced.
all consumers are reconnected
This algorithm creates a lot of noise in logs when you deploy all the time
common java-library had 4 consumer-threads as default per application
what is the consequence
developers did not understand what happened
during a deploy
“kafka is unstable”
most service-instances did not receive messages
each deploy of a service (typically 4 instances)
triggered 4 rebalances.
if rebalance takes too long, the (at least our)
consumer dies.
“kafka is down”
what is the correct solution
1. For each topic T that C_i subscribes to
2. let P_T be all partitions producing topic T
3. let C_G be all consumers in the same group as C_i that consume topic T
4. sort P_T (so partitions on the same broker are clustered together)
5. sort C_G
6. let i be the index position of C_i in C_G and let N = size(P_T)/size(C_G)
7. assign partitions from i*N to (i+1)*N - 1 to consumer C_i
8. remove current entries owned by C_i from the partition owner registry
9. add newly assigned partitions to the partition owner registry
(we may need to re-try this until the original partition owner releases its ownership)
all consumers in a group rebalance when a consumer arrives or departs from the
group
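Steps 4 to 7 of the algorithm above can be sketched in Python for a single topic. This follows the steps literally, so it uses integer division and does not hand out leftover partitions when the count does not divide evenly; `assign_partitions` and the consumer ids are hypothetical.

```python
def assign_partitions(partitions, consumers, consumer_id):
    """Return the partitions one consumer owns after a rebalance."""
    sorted_partitions = sorted(partitions)               # step 4
    sorted_consumers = sorted(consumers)                 # step 5
    i = sorted_consumers.index(consumer_id)              # step 6
    n = len(sorted_partitions) // len(sorted_consumers)  # step 6
    return sorted_partitions[i * n:(i + 1) * n]          # step 7


partitions = list(range(8))

# Four consumers in the group: two partitions each.
group_of_four = ["c0", "c1", "c2", "c3"]
before = {c: assign_partitions(partitions, group_of_four, c) for c in group_of_four}

# Two consumers leave: the remaining ones get new ranges.
group_of_two = ["c0", "c1"]
after = {c: assign_partitions(partitions, group_of_two, c) for c in group_of_two}
```

Every consumer recomputes its range from the full group membership, so in this example c1's partitions move from 2 and 3 to 4 through 7 even though c1 itself was never restarted. That is why frequent deploys produce so much rebalance activity.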
what has finn.no done
some consumers use 1 thread per instance
planning to rewrite consumer library
read kafka documentation
mistake:
running mixed load with a
single configuration
https://flic.kr/p/qbarDR
why is it a mistake
Historically - One Big Database with Expensive License => One, Single Server
Database world - OLTP and OLAP
Changes with Open Source software and Cloud
Tried to simplify the developer's day with a single config
Kafka supports both very high throughput and high reliability
what is the consequence
Trade off between throughput and degree of reliability
With a single configuration - the last commit wins (remember the 128 partitions?)
Either high throughput, and risk of loss - or potentially too slow
what is the correct solution
understand your use cases and their needs!
Defaults that are quite reliable
Exposing configuration variables in the client
Ask the questions:
● at least once delivery
● ordering - if you partition, what must have strict ordering
● 99% delivery - is that enough?
● which level of throughput is needed
what has finn.no done
Configuration
Configuration for production
● Partitions
● Replicas (default.replication.factor)
● Minimum ISR (min.insync.replicas)
● Wait for acknowledge when producing messages (request.required.acks, block.on.buffer.full)
● Retries
● Leader election
Configuration for consumer
● Number of threads
● When to commit (autocommit.enable vs consumer.commitOffsets)
Gwen Shapira recommends...
● acks = all
● block.on.buffer.full = true
● retries = MAX_INT
● max.in.flight.requests.per.connection = 1
● Producer.close()
● replication-factor >= 3
● min.insync.replicas = 2
● unclean.leader.election = false
● auto.offset.commit = false
● commit after processing
● monitor!
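One way to keep a shared client library from silently drifting away from these recommendations is to check its settings at startup. A minimal Python sketch, assuming the settings are held in a plain dictionary keyed by the names used on the slide (which are not necessarily the exact Kafka property names); `reliability_warnings` is hypothetical, and a missing key is treated as not meeting the recommendation.

```python
def reliability_warnings(config, replication_factor):
    """Return a list of settings that fall short of the recommendations."""
    warnings = []
    if config.get("acks") != "all":
        warnings.append("acks should be all")
    if replication_factor < 3:
        warnings.append("replication-factor should be >= 3")
    if config.get("min.insync.replicas", 0) < 2:
        warnings.append("min.insync.replicas should be 2")
    if config.get("unclean.leader.election", True):
        warnings.append("unclean.leader.election should be false")
    return warnings


# A throughput-oriented configuration (values invented for illustration).
fast = {"acks": "all", "min.insync.replicas": 1, "unclean.leader.election": True}
fast_warnings = reliability_warnings(fast, replication_factor=3)

# A configuration that follows the list above.
safe = {"acks": "all", "min.insync.replicas": 2, "unclean.leader.election": False}
safe_warnings = reliability_warnings(safe, replication_factor=3)
```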
mistake:
topics are autocreated
why is it a mistake
topics are created every time someone tries to consume from or produce to a
topic name
what is the consequence
we are not able to control the number of topics
too many topics give too many partitions. ZooKeeper gets slow when handling
this.
no place to put topic config
topic names from production:
Event.USER.blabla, testing42, testing2,
Event.GO_CLICK.asdf4133, Event.GO_CLICK.asdf7392, Event.GO_CLICK.asdf7532,
what is the correct solution
small number of partitions as default
increase number of partitions for selected topics
what has finn.no done
5 partitions as default
2 topics have more than 5 partitions
topics with lots of traffic
mistake:
deploying a proof of concept
hack in production, i.e.
why we had 8 zk nodes
https://flic.kr/p/6eoSgT
why is it a mistake
Kafka was set up by Ops for a test - not for hardened production use
By coincidence we had 8 nodes for kafka, the same 8 nodes for zookeeper
Zookeeper is dependent on a majority quorum, low latency between nodes
what is the consequence
Zookeeper recommends 3 nodes for normal usage, 5 for high, and any more is
questionable
More nodes lead to longer time for finding consensus, more communication
If we get a split between data centers, there will be 4 in each
You should not run Zk between data centers, due to latency and outage
possibilities
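The arithmetic behind the majority quorum can be written out in Python. This assumes the usual rule that a quorum is strictly more than half of the ensemble; the function names are made up for illustration.

```python
def quorum_size(ensemble_size):
    """Smallest number of nodes that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many nodes can be lost while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(reachable_nodes, ensemble_size):
    """True if this side of a network split can still make progress."""
    return reachable_nodes >= quorum_size(ensemble_size)


failures_with_8 = tolerated_failures(8)
failures_with_7 = tolerated_failures(7)

# 8 nodes split evenly between two data centers: 4 on each side.
side_a_ok = has_quorum(4, 8)
side_b_ok = has_quorum(4, 8)
```

The eighth node buys no extra fault tolerance over seven, and with a 4/4 split neither data center holds the 5 nodes needed, so both sides stop.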
what is the correct solution
Have an odd number of Zookeeper nodes - preferably 3, at most 5
Don’t cross data centers
Check the documentation before deploying serious production load
Don’t run a sensitive service (Zookeeper) on a server with 50 services, 300%
overcommitted on RAM
Not treating Kafka as a database. kill -9
mistake:
unclean.leader.election = true
for reliable messaging
http://media-2.web.britannica.com/eb-media/40/126140-050-523CFDE4.jpg
why is it a mistake
in certain conditions unclean.leader.election=true can lose messages
replication.factor = 3
in.sync.replicas = 1
start: replica1 has 100, 101; replica2 has 100, 101; replica3 has 100
replica3 dies: replica1 (leader) has 100, 101; replica2 has 100, 101; replica3 has 100
replica2 dies: replica1 has 100, 101, 102, 103, 104; replica2 has 100, 101; replica3 has 100
replica1 dies: replica1 has 100, 101, 102, 103, 104; replica2 has 100, 101; replica3 has 100
which replica comes online first? replica1 has 100, 101, 102, 103, 104; replica2 has 100, 101; replica3 has 100
what is the consequence
messages might be lost forever
without errors in the client
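The scenario above can be replayed in a small Python simulation. This is a simplified model, not Kafka's election code: it assumes all three replicas are down, that replica1 was the only in-sync replica when it died, and that whichever replica is elected leader defines the surviving log. The function names are hypothetical.

```python
def elect_after_full_outage(in_sync, first_online, unclean_election):
    """Return the new leader, or None if the partition stays offline."""
    if first_online in in_sync or unclean_election:
        return first_online
    return None


def lost_offsets(logs, in_sync, first_online, unclean_election):
    """Offsets held by the last in-sync replicas but missing on the new leader."""
    leader = elect_after_full_outage(in_sync, first_online, unclean_election)
    if leader is None:
        return None  # partition is unavailable, but nothing is discarded
    acknowledged = set()
    for replica in in_sync:
        acknowledged.update(logs[replica])
    return sorted(acknowledged - set(logs[leader]))


logs = {
    "replica1": [100, 101, 102, 103, 104],
    "replica2": [100, 101],
    "replica3": [100],
}
in_sync = {"replica1"}

unclean_replica3_first = lost_offsets(logs, in_sync, "replica3", True)
unclean_replica2_first = lost_offsets(logs, in_sync, "replica2", True)
clean_replica3_first = lost_offsets(logs, in_sync, "replica3", False)
replica1_first = lost_offsets(logs, in_sync, "replica1", True)
```

With unclean election enabled, the outcome depends only on which replica happens to start first. With it disabled, the partition waits for replica1, trading availability for durability.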
https://upload.wikimedia.org/wikipedia/commons/d/d4/George-W-Bush.jpeg
what is the correct solution
replication.factor = 3
in.sync.replicas = 2
unclean.leader.election=false
(unless you are worried about what
happens when replica1 (leader) is dead for
a long time)
what has finn.no done
replication.factor = 3
in.sync.replicas = 1 (2 for selected topics)
unclean.leader.election=true
mistake:
default configuration of 128 partitions
for each topic
why is it a mistake
partitions are Kafka's way of scaling consumers; 128 partitions can handle 128
consumer processes
0.8 clusters could not reduce the number of partitions without deleting data
highest number of consumers today is 20
what is the consequence
0.8 cluster was configured with 128 partitions as default, for all topics.
many partitions and many topics create many datapoints that must be coordinated
zookeeper must coordinate all this
rebalance must balance all clients on all partitions
zookeeper and kafka went down (may 2015)
(500 topics * 128 partitions)
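To get a feel for the numbers, the multiplication can be spelled out in Python. The topic and partition counts come from the slide; the replication factor of 3 is the value quoted elsewhere in this deck, and applying the new default of 5 partitions to the same 500 topics is a hypothetical comparison.

```python
def partition_replicas(topics, partitions_per_topic, replication_factor):
    """Total number of partition replicas the cluster has to track."""
    return topics * partitions_per_topic * replication_factor


partitions_with_128 = 500 * 128
replicas_with_128 = partition_replicas(500, 128, 3)
replicas_with_5 = partition_replicas(500, 5, 3)
```

That is 64,000 partitions and 192,000 replicas with the old default, against 7,500 replicas with 5 partitions per topic.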
what is the correct solution
small number of partitions as default
increase number of partitions for selected topics
understand your use case
reduce length of transactions on consumer side
what has finn.no done
5 partitions as default
2 topics have more than 5 partitions
topics with lots of traffic
Should you do
this at home?
“They say ignorance is
bliss.... they're wrong ”
― Franz Kafka
questions?
“It's only because
of their stupidity
that they're able
to be so sure of
themselves.”
― Franz Kafka,
The Trial
Audun Fauchald Strand
@audunstrand
Henning Spjelkavik
@spjelkavik
http://www.finn.no/apply-here
Ad

More Related Content

What's hot (20)

Consumer offset management in Kafka
Consumer offset management in KafkaConsumer offset management in Kafka
Consumer offset management in Kafka
Joel Koshy
 
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 PeopleKafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
confluent
 
Developing with the Go client for Apache Kafka
Developing with the Go client for Apache KafkaDeveloping with the Go client for Apache Kafka
Developing with the Go client for Apache Kafka
Joe Stein
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Shiao-An Yuan
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Guozhang Wang
 
The Foundations of Multi-DC Kafka (Jakub Korab, Solutions Architect, Confluen...
The Foundations of Multi-DC Kafka (Jakub Korab, Solutions Architect, Confluen...The Foundations of Multi-DC Kafka (Jakub Korab, Solutions Architect, Confluen...
The Foundations of Multi-DC Kafka (Jakub Korab, Solutions Architect, Confluen...
confluent
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Jeff Holoman
 
Troubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolutionTroubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolution
Joel Koshy
 
Have your cake and eat it too
Have your cake and eat it tooHave your cake and eat it too
Have your cake and eat it too
Gwen (Chen) Shapira
 
Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache Kafka
Grant Henke
 
Help, my Kafka is broken! (Emma Humber, IBM) Kafka Summit SF 2019
Help, my Kafka is broken! (Emma Humber, IBM) Kafka Summit SF 2019Help, my Kafka is broken! (Emma Humber, IBM) Kafka Summit SF 2019
Help, my Kafka is broken! (Emma Humber, IBM) Kafka Summit SF 2019
confluent
 
Kafka Reliability Guarantees ATL Kafka User Group
Kafka Reliability Guarantees ATL Kafka User GroupKafka Reliability Guarantees ATL Kafka User Group
Kafka Reliability Guarantees ATL Kafka User Group
Jeff Holoman
 
Reliability Guarantees for Apache Kafka
Reliability Guarantees for Apache KafkaReliability Guarantees for Apache Kafka
Reliability Guarantees for Apache Kafka
confluent
 
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache KafkaKafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
confluent
 
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
confluent
 
Data Loss and Duplication in Kafka
Data Loss and Duplication in KafkaData Loss and Duplication in Kafka
Data Loss and Duplication in Kafka
Jayesh Thakrar
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Michael Noll
 
Kafka aws
Kafka awsKafka aws
Kafka aws
Ariel Moskovich
 
... No it's Apache Kafka!
... No it's Apache Kafka!... No it's Apache Kafka!
... No it's Apache Kafka!
makker_nl
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
Amir Sedighi
 
Consumer offset management in Kafka
Consumer offset management in KafkaConsumer offset management in Kafka
Consumer offset management in Kafka
Joel Koshy
 
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 PeopleKafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
confluent
 
Developing with the Go client for Apache Kafka
Developing with the Go client for Apache KafkaDeveloping with the Go client for Apache Kafka
Developing with the Go client for Apache Kafka
Joe Stein
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Shiao-An Yuan
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Guozhang Wang
 
The Foundations of Multi-DC Kafka (Jakub Korab, Solutions Architect, Confluen...
The Foundations of Multi-DC Kafka (Jakub Korab, Solutions Architect, Confluen...The Foundations of Multi-DC Kafka (Jakub Korab, Solutions Architect, Confluen...
The Foundations of Multi-DC Kafka (Jakub Korab, Solutions Architect, Confluen...
confluent
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Jeff Holoman
 
Troubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolutionTroubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolution
Joel Koshy
 
Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache Kafka
Grant Henke
 
Help, my Kafka is broken! (Emma Humber, IBM) Kafka Summit SF 2019
Help, my Kafka is broken! (Emma Humber, IBM) Kafka Summit SF 2019Help, my Kafka is broken! (Emma Humber, IBM) Kafka Summit SF 2019
Help, my Kafka is broken! (Emma Humber, IBM) Kafka Summit SF 2019
confluent
 
Kafka Reliability Guarantees ATL Kafka User Group
Kafka Reliability Guarantees ATL Kafka User GroupKafka Reliability Guarantees ATL Kafka User Group
Kafka Reliability Guarantees ATL Kafka User Group
Jeff Holoman
 
Reliability Guarantees for Apache Kafka
Reliability Guarantees for Apache KafkaReliability Guarantees for Apache Kafka
Reliability Guarantees for Apache Kafka
confluent
 
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache KafkaKafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
confluent
 
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
confluent
 
Data Loss and Duplication in Kafka
Data Loss and Duplication in KafkaData Loss and Duplication in Kafka
Data Loss and Duplication in Kafka
Jayesh Thakrar
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Michael Noll
 
... No it's Apache Kafka!
... No it's Apache Kafka!... No it's Apache Kafka!
... No it's Apache Kafka!
makker_nl
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
Amir Sedighi
 

Viewers also liked (20)

101 ways to configure kafka - badly (Kafka Summit)
101 ways to configure kafka - badly (Kafka Summit)101 ways to configure kafka - badly (Kafka Summit)
101 ways to configure kafka - badly (Kafka Summit)
Henning Spjelkavik
 
Kafka 0.9, Things you should know
Kafka 0.9, Things you should knowKafka 0.9, Things you should know
Kafka 0.9, Things you should know
Ratish Ravindran
 
Embedded Mirror Maker
Embedded Mirror MakerEmbedded Mirror Maker
Embedded Mirror Maker
Simon Suo
 
Spoilt for Choice: How to Choose the Right Enterprise Service Bus (ESB)?
Spoilt for Choice: How to Choose the Right Enterprise Service Bus (ESB)?Spoilt for Choice: How to Choose the Right Enterprise Service Bus (ESB)?
Spoilt for Choice: How to Choose the Right Enterprise Service Bus (ESB)?
Kai Wähner
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
Kafka, the "DialTone for Data": Building a self-service, scalable, streaming ...
Kafka, the "DialTone for Data": Building a self-service, scalable, streaming ...Kafka, the "DialTone for Data": Building a self-service, scalable, streaming ...
Kafka, the "DialTone for Data": Building a self-service, scalable, streaming ...
confluent
 
Multi tier, multi-tenant, multi-problem kafka
Multi tier, multi-tenant, multi-problem kafkaMulti tier, multi-tenant, multi-problem kafka
Multi tier, multi-tenant, multi-problem kafka
Todd Palino
 
Kinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-diveKinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-dive
Yifeng Jiang
 
Securing Kafka
Securing Kafka Securing Kafka
Securing Kafka
confluent
 
Reliable and Scalable Data Ingestion at Airbnb
Reliable and Scalable Data Ingestion at AirbnbReliable and Scalable Data Ingestion at Airbnb
Reliable and Scalable Data Ingestion at Airbnb
DataWorks Summit/Hadoop Summit
 
Kafka at Scale: Multi-Tier Architectures
Kafka at Scale: Multi-Tier ArchitecturesKafka at Scale: Multi-Tier Architectures
Kafka at Scale: Multi-Tier Architectures
Todd Palino
 
Introduction To Confluence
Introduction To ConfluenceIntroduction To Confluence
Introduction To Confluence
Hua Soon Sim
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
DataWorks Summit/Hadoop Summit
 
Building a Replicated Logging System with Apache Kafka
Building a Replicated Logging System with Apache KafkaBuilding a Replicated Logging System with Apache Kafka
Building a Replicated Logging System with Apache Kafka
Guozhang Wang
 
No data loss pipeline with apache kafka
No data loss pipeline with apache kafkaNo data loss pipeline with apache kafka
No data loss pipeline with apache kafka
Jiangjie Qin
 
Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Production
confluent
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
Allen (Xiaozhong) Wang
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
Jiangjie Qin
 
Handle Large Messages In Apache Kafka
Handle Large Messages In Apache KafkaHandle Large Messages In Apache Kafka
Handle Large Messages In Apache Kafka
Jiangjie Qin
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
Joe Stein
 
101 ways to configure kafka - badly (Kafka Summit)
101 ways to configure kafka - badly (Kafka Summit)101 ways to configure kafka - badly (Kafka Summit)
101 mistakes FINN.no has made with Kafka (Baksida meetup)

  • 1. 101* mistakes FINN.no has made with Kafka Audun Fauchald Strand @audunstrand bio: gof, mq, ejb, mda, wli, eda, soa, esb, ddd Henning Spjelkavik @spjelkavik bio: ejb, aq, eda, soa, gis, ddd, iof, aws, sql
  • 2. agenda introduction to kafka kafka @ finn.no 101* mistakes questions “From a certain point onward there is no longer any turning back. That is the point that must be reached.” ― Franz Kafka, The Trial
  • 3. FINN.no 2nd largest website in Norway 60 million pageviews per day 80 microservices 130 developers 900 deploys per week 6 minutes from commit to deploy (median)
  • 6. why use kafka #notAnESB what is a log terminology components giant leap “A First Sign of the Beginning of Understanding is the Wish to Die.” ― Franz Kafka https://commons.wikimedia.org/wiki/File:Kafka.jpg
  • 7. Why use Kafka? “Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.” ● Fast ● Scalable ● Durable ● Distributed by design Sweet spot: High volume, low latency Quora: “Use Kafka if you have a fire hose of events (100k+/sec) you need delivered in partitioned order 'at least once' with a mix of online and batch consumers, you want to be able to re-read messages” “Use Rabbit if you have messages (20k+/sec) that need to be routed in complex ways to consumers, you want per-message delivery guarantees, you don't care about ordered delivery”
  • 8. #NotAnESB “Based on conversations with the project sponsors I began to suspect that at least the introduction of the ESB was a case of RDD, i.e. Resume-Driven Development, development in which key choices are made with only one question in mind: how good does it look on my CV? Talking to the developers I learned that the ESB had introduced “nothing but pain.” Was this really another case of architect’s dream, developer’s nightmare?” 1. Are you integrating 3 or more applications/services? If you only need to communicate between 2 applications, using point-to-point integration is going to be easier. 2. Do you need to use more than one type of communication protocol? If you are just using HTTP/Web Services or just JMS, you’re not going to get any of the benefits of cross-protocol messaging and transformation that Mule provides. 3. Do you need message routing capabilities such as forking and aggregating message flows, or content-based routing? Many applications do not need these capabilities. (Mule ESB)
  • 9. What is a log? A log is perhaps the simplest possible storage abstraction. It is an append-only, totally-ordered sequence of records ordered by time. Records are appended to the end of the log, and reads proceed left to right. Each entry is assigned a unique sequential log entry number. The ordering of records defines a notion of "time" since entries to the left are defined to be older than entries to the right. This is a data log, not an application log (i.e. not log4j). The two problems a log solves, ordering changes and distributing data, are even more important in distributed data systems.
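A minimal sketch of the log described on this slide, assuming an in-memory list as storage. The `Log` class and its `append` and `read` methods are hypothetical names chosen for illustration; they are not Kafka's API.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class Log:
    """Append-only sequence of records; an entry's number is its position."""
    _records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Add a record at the end and return its sequential entry number."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, start: int = 0) -> List[Tuple[int, Any]]:
        """Read left to right (oldest first), beginning at entry `start`."""
        return list(enumerate(self._records[start:], start=start))


log = Log()
first = log.append("ad created")   # entry number 0
second = log.append("ad paid")     # entry number 1
entries = log.read(start=0)        # [(0, "ad created"), (1, "ad paid")]
```

There is deliberately no update or delete method: the only write is an append, which is what gives the entry numbers their meaning as "time".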
  • 10. Changelog 101: Tables and Events are Dual Duality: a log of changes and a table. Accounting log: credit and debit (events per key) table: all current balances (i.e. state per key) In a sense the log is the more fundamental data structure: in addition to creating the original table you can also transform it to create all kinds of derived tables.
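The accounting example can be sketched as a fold over the changelog. This is an illustration only: the `materialize` function, the event shape, and the account names are assumptions, not something taken from the talk.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (account key, signed amount): a credit is positive, a debit is negative
Event = Tuple[str, int]


def materialize(changelog: Iterable[Event]) -> Dict[str, int]:
    """Fold the log of changes into a table of current balances per key."""
    balances: Dict[str, int] = defaultdict(int)
    for account, amount in changelog:
        balances[account] += amount
    return dict(balances)


changelog = [("alice", 100), ("bob", 50), ("alice", -30)]
table = materialize(changelog)               # state per key after all events
table_earlier = materialize(changelog[:2])   # replaying a prefix gives an older table
```

The table can always be rebuilt from the log, but the log cannot be rebuilt from the table, which is the sense in which the log is the more fundamental structure.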
  • 11. terminology: producers write to brokers; consumers read from brokers; everything is distributed; data is stored in topics; topics are split into partitions, which are replicated [diagram: producers and consumers around a kafka cluster]
  • 13. [diagram: log entries 1 to 7, from old messages to newer messages; producer; consumer.group.1, consumer.group.2]
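A rough sketch of how two consumer groups can read the same partition at different speeds, assuming each group simply tracks the next position it wants to read. The `Partition` class is hypothetical and much simpler than Kafka itself, where committing an offset is a separate step from polling.

```python
from typing import Dict, List


class Partition:
    """One ordered partition plus a read position per consumer group."""

    def __init__(self) -> None:
        self.messages: List[str] = []
        self.offsets: Dict[str, int] = {}  # group name -> next offset to read

    def produce(self, message: str) -> None:
        """Producers only ever append at the newest end."""
        self.messages.append(message)

    def poll(self, group: str, max_records: int = 10) -> List[str]:
        """Return unread messages for this group and advance its offset."""
        start = self.offsets.get(group, 0)
        batch = self.messages[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch


p = Partition()
for i in range(1, 8):
    p.produce(f"message {i}")

fast = p.poll("consumer.group.1")                 # reads all seven messages
slow = p.poll("consumer.group.2", max_records=2)  # reads only the first two
```

Because the messages stay in the partition after being read, the slow group does not hold back the fast one; it just has a smaller offset.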
  • 14. Giant leap? In fact, persistent replicated messaging is such a giant leap in messaging architecture that it may be worthwhile to point out a few side effects: a. Per-message acknowledgments have disappeared b. ordered delivery c. The problem of mismatched consumer speed has disappeared. A slow consumer can peacefully co-exist with a fast consumer now d. Need for difficult messaging semantics like delayed delivery, re-delivery etc. has disappeared. Now it is all up to the consumer to read whatever message whenever - onus has shifted from broker to consumer e. The holy grail of message delivery guarantee: at-least-once is the new reality - both Kafka and Azure Event Hub provide this guarantee. You still have to make your consumers and downstream systems idempotent so that recovering from a failure and processing the same message twice does not upset it too much, but hey - that has always been the case http://blogs.msdn.com/b/opensourcemsft/archive/2015/08/08/choose-between-azure-event-hub-and-kafka-_2d00_-what-you-need-to-know.aspx
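One common way to make a consumer idempotent under at-least-once delivery is to remember which messages have already been applied. The sketch below assumes every message carries a unique id; that id, the `IdempotentConsumer` class, and the message shape are hypothetical and not described in the talk.

```python
from typing import Dict, Set, Tuple

# (message id, account, amount); the unique message id is an assumption
Message = Tuple[str, str, int]


class IdempotentConsumer:
    """Applies each message at most once, even if it is delivered twice."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()
        self.balances: Dict[str, int] = {}

    def handle(self, message: Message) -> bool:
        """Apply the message; return False if it was a redelivery."""
        message_id, account, amount = message
        if message_id in self.seen:
            return False
        self.balances[account] = self.balances.get(account, 0) + amount
        self.seen.add(message_id)
        return True


consumer = IdempotentConsumer()
# "m2" arrives twice, as can happen after a failure and restart
deliveries = [("m1", "alice", 100), ("m2", "alice", -30), ("m2", "alice", -30)]
applied = [consumer.handle(m) for m in deliveries]
```

In a real service the set of seen ids and the resulting state would need to be stored together durably, otherwise a crash between the two updates reintroduces the duplicate.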
  • 18. timeline 2012 Decided to use RabbitMQ as message queue. Kafka was installed for a test 2013 Feb - Kafka PoC (“strømmen”) Ad matching (“lagret søk”) Ad indexed 2014 Product lifecycle - product paid, etc 2015 Feb -> May - 0.8.2. Dedicated cluster
  • 19. architecture: two data centers (dcx, dcy); five nodes (broker01 to broker05), each running zk and kafka. messages as thrift map - single IDL. common java library + schema + produce messages + consume messages. node.js, ruby, python
  • 20. tooling Grafana dashboard visualizing jmx stats kafka-manager kafka-cat kafka offset monitor alerts with sensu
  • 22. uses counting + pageviews + clicks communication between services + domain events (eg. new ad) + notifications + build pipeline data replication + log compaction tooling + slack notifications + (zipkin) cache invalidation + on update/delete (updated profile)
  • 23. 101* mistakes “God gives the nuts, but he does not crack them.” ― Franz Kafka
  • 24. * 10 1. pre 1.0 2. inside/outside 3. schemas 4. kafka is a database 5. client side rebalance 6. mixed load/one configuration 7. autocreate of topics 8. 8 zk nodes 9. unclean leader election 10. 128 partitions
  • 25. why is it a mistake what is the consequence what is the correct solution what has finn.no done
  • 26. mistake? started using kafka pre 1.0 release
  • 27. why is it a mistake 0.7 -> 0.8: not backwards compatible; a 0.7 client does not work well with a 0.8 cluster. 0.8 -> 0.9: not backwards compatible; a 0.8 consumer does not work with a 0.9 cluster. 0.9 -> 1.0: ???
  • 28. what is the consequence kafka is a critical component for communication between applications coordination of 10-15 teams with 30 services migration process of 6-8 months from decision to old cluster turned off
  • 29. what is the correct solution evaluate the maturity of critical architecture components before everyone starts using it
  • 30. what has finn.no done 0.7 -> 0.8 1) create additional 0.8 cluster 2) all clients consume from both clusters (0.7 and 0.8) 3) critical services (payment) migrate consumers and producers during nighttime with downtime 4) rest of services migrate their producers to 0.8 (the last mile takes a long time) 5) stop consuming from 0.7 6) turn off 0.7 cluster
  • 31. what has finn.no done 0.7 -> 0.8 1) create additional 0.8 cluster 2) all clients consume from both clusters (0.7 and 0.8) 3) critical services migrate consumers and producers during nighttime with downtime 4) rest of services migrate their producers to 0.8 (the last mile takes a long time) 5) stop consuming from 0.7 6) turn off 0.7 cluster 7) read blogpost stating that 0.9 is not backwards compatible with 0.8
  • 32. mistake: not considering the coupling you get when everyone can see all your data https://flic.kr/p/6MjhUR
  • 33. why is it a mistake everything that is published on Kafka is visible to any client that can access the cluster
  • 34. what is the consequence direct reads across services/domains are quite normal in legacy and/or enterprise systems. this coupling makes it hard to make changes. Kafka has no security per topic - you must add that yourself
  • 35. what is the correct solution Data on the inside versus data on the outside At least decide on a convention for what is private data and what is public data
  • 37. why is it a mistake schemas change differently from the code producing and consuming messages. data needs versioning. defining the schema in a java library makes it more difficult to access data from non-jvm languages. a code repository with java is not the easiest way to figure out the data on a topic
  • 38. what is the consequence development speed outside jvm has been slow change of data needs coordinated deployment difficult to create tooling that needs to know data format, like data lake
  • 39. what is the correct solution confluent.io platform has a separate schema registry: rest interface, apache avro, multiple compatibility settings and evolution strategies, connect
  • 40. what has finn.no done still using java library confluent platform 2.0 is planned for the next step, not kafka 0.9
  • 41. mistake: Kafka is like a database - treat it like one https://flic.kr/p/2xQ9VT
  • 42. why is it a mistake We used our normal Ops scripts for kafka - if a config changes, restart automatically If shutdown does not work within 5 seconds, kill -9 A database needs to finish what it is doing, before shutting down A distributed database even more so
  • 43. what is the consequence At least need for recovery at startup. Data loss. No convergence - you won’t come back up
  • 44. what is the correct solution Do not play with stored data - understand how and when to apply changes
  • 45. what has finn.no done stop using the kill -9 script
  • 47. why is it a mistake kafka has a clear algorithm for handling increase or decrease in clients to be able to keep everything balanced. all consumers are reconnected. This algorithm creates a lot of noise in logs when you deploy all the time. common java-library had 4 consumer-threads as default per application
  • 48. what is the consequence developers did not understand what happened during a deploy: “kafka is unstable”. most service-instances did not receive messages. each deploy of a service (typically 4 instances) triggered 4 rebalances. if rebalance takes too long, the (at least our) consumer dies: “kafka is down”
  • 49. what is the correct solution
    1. For each topic T that Ci subscribes to
    2. let PT be all partitions producing topic T
    3. let CG be all consumers in the same group as Ci that consume topic T
    4. sort PT (so partitions on the same broker are clustered together)
    5. sort CG
    6. let i be the index position of Ci in CG and let N = size(PT)/size(CG)
    7. assign partitions from i*N to (i+1)*N - 1 to consumer Ci
    8. remove current entries owned by Ci from the partition owner registry
    9. add newly assigned partitions to the partition owner registry (we may need to re-try this until the original partition owner releases its ownership)
    all consumers in a group rebalance when a consumer arrives or departs from the group
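Steps 2 to 7 of the algorithm above can be sketched as a pure function. The slide does not say what happens when the number of partitions is not a multiple of the number of consumers; this sketch assumes the first consumers in sorted order each take one extra partition. The function name and the consumer ids are hypothetical, and the ZooKeeper ownership registry (steps 8 and 9) is left out.

```python
def assign_partitions(partitions, consumers, me):
    """Return the partitions of one topic that consumer `me` should own."""
    pt = sorted(partitions)   # step 4: sort PT
    cg = sorted(consumers)    # step 5: sort CG
    i = cg.index(me)          # step 6: index position of Ci in CG
    n, extra = divmod(len(pt), len(cg))
    # Assumption: the first `extra` consumers take one partition more each.
    start = i * n + min(i, extra)
    count = n + (1 if i < extra else 0)
    return pt[start:start + count]  # step 7: a contiguous range


# Five partitions shared by two consumers in the same group.
owned_by_c1 = assign_partitions(range(5), ["c1", "c2"], "c1")
owned_by_c2 = assign_partitions(range(5), ["c1", "c2"], "c2")
```

Because every consumer computes its own slice from the same sorted lists, any change in group membership shifts the slices, which is why each arriving or departing consumer triggers a rebalance for the whole group.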
  • 50. what has finn.no done some consumers use 1 thread pr instance planning to rewrite consumer library read kafka documentation
  • 51. mistake: running mixed load with a single configuration https://flic.kr/p/qbarDR
  • 52. why is it a mistake Historically - One Big Database with Expensive License => One, Single Server. Database world - OLTP and OLAP. Changes with Open Source software and Cloud. Tried to simplify the developer's day with a single config. Kafka supports both very high throughput and high reliability
  • 53. what is the consequence Trade off between throughput and degree of reliability With a single configuration - the last commit wins (remember the 128 partitions?) Either high throughput, and risk of loss - or potentially too slow
  • 54. what is the correct solution understand your use cases and their needs!
  • 55. what has finn.no done Defaults that are quite reliable. Exposing configuration variables in the client. Ask the questions: ● at least once delivery? ● ordering - if you partition, what must have strict ordering? ● is 99% delivery enough? ● which level of throughput is needed?
  • 56. Configuration Configuration for production ● Partitions ● Replicas (default.replication.factor) ● Minimum ISR (min.insync.replicas) ● Wait for acknowledge when producing messages (request.required.acks, block.on.buffer.full) ● Retries ● Leader election Configuration for consumer ● Number of threads ● When to commit (autocommit.enable vs consumer.commitOffsets)
  • 57. Gwen Shapira recommends... ● acks = all ● block.on.buffer.full = true ● retries = MAX_INT ● max.in.flight.requests.per.connection = 1 ● Producer.close() ● replication-factor >= 3 ● min.insync.replicas = 2 ● unclean.leader.election = false ● auto.offset.commit = false ● commit after processing ● monitor!
  • 59. why is it a mistake topics are created every time someone tries to consume from or produce to a topic name
  • 60. what is the consequence we are not able to control the number of topics. too many topics give too many partitions; ZooKeeper gets slow when handling this. no place to put topic config. topic names from production: Event.USER.blabla, testing42, testing2, Event.GO_CLICK.asdf4133, Event.GO_CLICK.asdf7392, Event.GO_CLICK.asdf7532
  • 61. what is the correct solution small number of partitions as default. increase number of partitions for selected topics
  • 62. what has finn.no done 5 partitions as default. 2 topics have more than 5 partitions (topics with lots of traffic)
  • 63. mistake: deploy a proof of concept hack - in production; i.e. why we had 8 zk nodes https://flic.kr/p/6eoSgT
  • 64. why is it a mistake Kafka was set up by Ops for a test - not for hardened production use By coincidence we had 8 nodes for kafka, the same 8 nodes for zookeeper Zookeeper is dependent on a majority quorum, low latency between nodes
  • 65. what is the consequence Zookeeper recommends 3 nodes for normal usage, 5 for high, and any more is questionable. More nodes lead to longer time for finding consensus and more communication. If we get a split between data centers, there will be 4 in each. You should not run Zk between data centers, due to latency and outage possibilities
  • 66. what is the correct solution Have an odd number of Zookeeper nodes - preferably 3, at most 5. Don’t cross data centers. Check the documentation before deploying serious production load. Don’t run a sensitive service (Zookeeper) on a server with 50 services, 300% over committed on RAM
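The arithmetic behind the majority quorum shows why 8 nodes buy nothing over 7 and why an even split is fatal. This is a sketch of the majority rule only; the function names are hypothetical and nothing here talks to ZooKeeper.

```python
def quorum(nodes):
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """How many nodes can be lost while a majority is still reachable."""
    return nodes - quorum(nodes)


def has_quorum(reachable, nodes):
    """True if one side of a network split can still make decisions."""
    return reachable >= quorum(nodes)


eight_node_quorum = quorum(8)                        # 5
same_tolerance = tolerated_failures(8) == tolerated_failures(7)
split_side_ok = has_quorum(4, 8)                     # 4 nodes in each data center
```

With 8 nodes split 4 and 4 between two data centers, neither side reaches the 5 nodes needed, so the whole ensemble stops serving writes.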
  • 67. Not treating Kafka as a database. kill -9
  • 68. mistake: unclean.leader.election = true for reliable messaging http://media-2.web.britannica.com/eb-media/40/126140-050-523CFDE4.jpg
  • 69. why is it a mistake in certain conditions unclean.leader.election=true can lose messages. replication.factor = 3, in.sync.replicas = 1. replica1: 100 101; replica2: 100 101; replica3: 100
  • 70. why is it a mistake in certain conditions unclean.leader.election=true can lose messages. replication.factor = 3, in.sync.replicas = 1. replica3 dies. replica1 (leader): 100 101; replica2: 100 101; replica3: 100
  • 71. why is it a mistake in certain conditions unclean.leader.election=true can lose messages. replication.factor = 3, in.sync.replicas = 1. replica2 dies. replica1: 100 101 102 103 104; replica2: 100 101; replica3: 100
  • 72. why is it a mistake in certain conditions unclean.leader.election=true can lose messages. replication.factor = 3, in.sync.replicas = 1. replica1 dies. replica1: 100 101 102 103 104; replica2: 100 101; replica3: 100
  • 73. why is it a mistake in certain conditions unclean.leader.election=true can lose messages. replication.factor = 3, in.sync.replicas = 1. which replica comes online first? replica1: 100 101 102 103 104; replica3: 100; replica2: 100 101
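The scenario on these slides can be replayed with plain lists. The sketch assumes that with unclean leader election enabled, whichever replica comes online first becomes leader and its log becomes the truth for the partition; `lost_messages` is a hypothetical helper and the offsets are the ones shown on the slides.

```python
def lost_messages(old_leader_log, new_leader_log):
    """Messages the old leader accepted that the new leader never received."""
    return [m for m in old_leader_log if m not in new_leader_log]


# State when all three replicas are down (last slide of the sequence).
logs = {
    "replica1": [100, 101, 102, 103, 104],  # the leader that died last
    "replica2": [100, 101],
    "replica3": [100],
}

if_replica3_first = lost_messages(logs["replica1"], logs["replica3"])
if_replica2_first = lost_messages(logs["replica1"], logs["replica2"])
if_replica1_first = lost_messages(logs["replica1"], logs["replica1"])
```

Only the case where replica1 returns first loses nothing, and producers that wrote 102 to 104 were already told those writes succeeded.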
  • 74. what is the consequence messages might be lost forever without errors in the client https://upload.wikimedia.org/wikipedia/commons/d/d4/George-W-Bush.jpeg
  • 75. what is the correct solution replication.factor = 3 in.sync.replicas = 2 unclean.leader.election=false (unless you are worried about what happens when replica1 (leader) is dead for a long time)
  • 76. what has finn.no done replication.factor = 3 in.sync.replicas = 1 (2 for selected topics) unclean.leader.election=true
  • 77. mistake: default configuration of 128 partitions for each topic
  • 78. why is it a mistake partitions are Kafka's way of scaling consumers; 128 partitions can handle 128 consumer processes. 0.8 clusters could not reduce the number of partitions without deleting data. highest number of consumers today is 20
  • 79. what is the consequence 0.8 cluster was configured with 128 partitions as default, for all topics. many partitions and many topics create many datapoints that must be coordinated. zookeeper must coordinate all this. rebalance must balance all clients on all partitions. zookeeper and kafka went down (may 2015) (500 topics * 128 partitions)
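A quick calculation of what the two defaults mean for the number of partitions the cluster has to coordinate. It assumes the topic count stays at the 500 mentioned on the slide and ignores the few topics that were later given more than the default; `total_partitions` is a hypothetical helper.

```python
def total_partitions(topics, partitions_per_topic):
    """Partitions to coordinate when every topic uses the same default."""
    return topics * partitions_per_topic


with_old_default = total_partitions(500, 128)  # 128 partitions per topic
with_new_default = total_partitions(500, 5)    # 5 partitions per topic
```

That is 64,000 partitions under the old default against 2,500 under the new one, before counting replicas.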
  • 80. what is the correct solution small number of partitions as default increase number of partitions for selected topics understand your use case reduce length of transactions on consumer side
  • 81. what has finn.no done 5 partitions as default 2 topics have more than 5 partitions topics with lots of traffic
  • 83. Should you do this at home?
  • 85. “They say ignorance is bliss.... they're wrong ” ― Franz Kafka ?? ?
  • 86. “It's only because of their stupidity that they're able to be so sure of themselves.” ― Franz Kafka, The Trial Audun Fauchald Strand @audunstrand Henning Spjelkavik @spjelkavik http://www.finn.no/apply-here