101 mistakes FINN.no has made with Kafka (Baksida meetup)

101* mistakes FINN.no
has made with Kafka
Audun Fauchald Strand
@audunstrand
bio: gof, mq, ejb,
mda, wli, eda, soa,
esb, ddd
Henning Spjelkavik
@spjelkavik
bio: ejb, aq, eda,
soa, gis, ddd, iof,
aws, sql

agenda
introduction to kafka
kafka @ finn.no
101* mistakes
questions
“From a certain point onward
there is no longer any turning
back. That is the point that
must be reached.”
― Franz Kafka, The Trial

FINN.no
2nd largest website in Norway
60 million pageviews per day
80 microservices
130 developers
900 deploys per week
6 minutes from commit to deploy
(median)

why use kafka
#notAnESB
what is a log
terminology
components
giant leap
“A First Sign of the Beginning of
Understanding is the Wish to Die.”
― Franz Kafka
https://commons.wikimedia.org/wiki/File:Kafka.jpg

Why use Kafka?
“Apache Kafka is publish-subscribe messaging
rethought as a distributed commit log.”
● Fast
● Scalable
● Durable
● Distributed by design
Sweet spot: High volume, low latency
Quora:
“Use Kafka if you have a fire hose of events (100k+/sec)
you need delivered in partitioned order 'at least once' with
a mix of online and batch consumers, you want to be able
to re-read messages”
“Use Rabbit if you have messages (20k+/sec) that need to
be routed in complex ways to consumers, you want per-
message delivery guarantees, you don't care about ordered
delivery”

#NotAnESB
“Based on conversations with the project
sponsors I began to suspect that at least the
introduction of the ESB was a case of RDD, ie.
Resume-Driven Development, development in
which key choices are made with only one
question in mind: how good does it look on my
CV?
Talking to the developers I learned that the ESB
had introduced “nothing but pain.”
Was this really another case of architect’s dream,
developer’s nightmare?”
1. Are you integrating 3 or more applications/services? If
you only need to communicate between 2 applications,
using point-to-point integration is going to be easier.
2. Do you need to use more than one type of
communication protocol? If you are just using
HTTP/Web Services or just JMS, you’re not going to get
any of the benefits of cross protocol messaging and
transformation that Mule provides.
3. Do you need message routing capabilities such as
forking and aggregating message flows, or content-
based routing? Many applications do not need these
capabilities
Mule ESB

What is a log?
A log is perhaps the simplest possible storage abstraction.
It is an append-only, totally-ordered sequence of records ordered by time.
Records are appended to the end of the log, and reads proceed left-to-right.
Each entry is assigned a unique sequential log entry number.
The ordering of records defines a notion of "time" since entries to the left are
defined to be older than entries to the right.
This is a data log, not an application log (i.e. not log4j)
The two problems a log solves—ordering changes and distributing data—are even
more important in distributed data systems.
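A minimal sketch of the log abstraction described above, assuming an in-memory list stands in for storage. The class name `Log` and its methods are hypothetical and only illustrate append-only writes, sequential entry numbers, and left-to-right reads.

```python
class Log:
    """Append-only sequence of records; each record gets the next entry number."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # New records always go at the end; the entry number is the position.
        self._records.append(record)
        return len(self._records) - 1

    def read(self, from_offset=0):
        # Reads proceed left-to-right (oldest first) from the given entry number.
        return list(enumerate(self._records))[from_offset:]


log = Log()
first = log.append("ad created")
second = log.append("ad updated")
tail = log.read(from_offset=1)
```

A reader that remembers the last entry number it has seen can resume from there, which is what makes re-reading possible.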

Changelog 101: Tables and Events are Dual
Duality: a log of changes and a table.
Accounting
log: credit and debit (events per key)
table: all current balances (i.e. state per key)
In a sense the log is the more fundamental data structure: in addition to creating the
original table you can also transform it to create all kinds of derived tables.
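The accounting example can be sketched as a replay of the changelog into a table. The account names and amounts below are made up for illustration; credits are positive and debits negative.

```python
from collections import defaultdict


def build_table(changelog):
    """Replay (key, amount) events in order to get the current state per key."""
    balances = defaultdict(int)
    for account, amount in changelog:
        balances[account] += amount
    return dict(balances)


def count_events(changelog):
    """A different derived table built from the same log: events per key."""
    counts = defaultdict(int)
    for account, _ in changelog:
        counts[account] += 1
    return dict(counts)


changelog = [("alice", 100), ("bob", 50), ("alice", -30), ("bob", 20)]
table = build_table(changelog)
events_per_key = count_events(changelog)
```

Both tables come from the same log, which is the sense in which the log is the more fundamental structure.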

terminology
producers write to brokers
consumers read from brokers
everything is distributed
data is stored in topics
topics are split into partitions, which are replicated
[diagram: producers and consumers around a kafka cluster]

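components
[diagram: a producer writes to topic ad.new and a consumer (with a group.id) reads from ad.new; partitions P0, P1, P2 with replicas R1, R2, R3 are spread over broker1, broker2 and broker3; zookeeper sits alongside the brokers]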

[diagram: a log of numbered messages 1 to 7, old messages on the left and newer messages on the right; producer a1 writes to the log, consumer.group.1 and consumer.group.2 read from it]

Giant leap?
In fact, persistent replicated messaging is such a giant leap in messaging architecture that it may be worthwhile to point out a few side effects:
a. Per-message acknowledgments have disappeared
b. ordered delivery
c. The problem of mismatched consumer speed has disappeared. A slow consumer can peacefully co-exist with a fast consumer now
d. Need for difficult messaging semantics like delayed delivery, re-delivery etc. has disappeared. Now it is all up to the consumer to read whatever message whenever - onus has shifted from broker to consumer
e. The holy grail of message delivery guarantee: at-least-once is the new reality - both Kafka and Azure Event Hub provides this guarantee. You still have to make your consumers and downstream systems idempotent so that recovering from a failure and processing the same message twice does not upset it too much, but hey - that has always been the case
http://blogs.msdn.com/b/opensourcemsft/archive/2015/08/08/choose-between-azure-event-hub-and-kafka-_2d00_-what-you-need-to-know.aspx
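A minimal sketch of an idempotent consumer under at-least-once delivery, assuming every message carries a unique id. The class `IdempotentCounter` and the message ids are hypothetical; a real service would keep the set of seen ids in durable storage.

```python
class IdempotentCounter:
    """Applies each message at most once, even if it is delivered several times."""

    def __init__(self):
        self.seen = set()
        self.total = 0

    def handle(self, message_id, amount):
        # A redelivered message is recognised by its id and skipped.
        if message_id in self.seen:
            return False
        self.seen.add(message_id)
        self.total += amount
        return True


counter = IdempotentCounter()
# m1 and m2 are delivered twice, as can happen after a consumer restart.
deliveries = [("m1", 10), ("m2", 5), ("m2", 5), ("m1", 10), ("m3", 1)]
applied = [counter.handle(mid, amount) for mid, amount in deliveries]
```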

kafka @ finn.no
timeline
architecture
tools
use cases

timeline
2012 Decided to use RabbitMQ as message queue. Kafka was installed for a test
2013 Feb - Kafka PoC (“strømmen”)
Ad matching (“lagret søk”)
Ad indexed
2014
Product lifecycle - product paid, etc
2015 Feb -> May - 0.8.2. Dedicated cluster

architecture
[diagram: five brokers, broker01 to broker05, each running zk and kafka, spread across two data centers, dcx and dcy]
messages as thrift map - single IDL
common java library
+ schema
+ produce messages
+ consume messages
node.js
ruby
python

tooling
Grafana dashboard visualizing jmx stats
kafka-manager
kafka-cat
kafka offset monitor
alerts with sensu

uses
counting
+ pageviews
+ clicks
communication between services
+ domain events (eg. new ad)
+ notifications
+ build pipeline
data replication
+ log compaction
tooling
+ slack notifications
+ (zipkin)
cache invalidation
+ on update/delete (updated profile)

101* mistakes
“God gives the
nuts, but he
does not crack
them.”
― Franz Kafka

* 10
1. pre 1.0
2. inside/outside
3. schemas
4. kafka is a database
5. client side rebalance
6. mixed load/one configuration
7. autocreate of topics
8. 8 zk nodes
9. unclean leader election
10. 128 partitions

why is it a mistake
what is the consequence
what is the correct solution
what has finn.no done

mistake?
started using kafka pre 1.0 release

why is it a mistake
0.7 -> 0.8: not backwards compatible
0.7 client does not work well with 0.8 cluster
0.8 -> 0.9: not backwards compatible
0.8 consumer does not work with 0.9 cluster
0.9 -> 1.0: ???

kafka is a critical component for communication between applications
coordination of 10-15 teams with 30 services
migration process of 6-8 months
from decision to old cluster turned off

evaluate the maturity of
critical architecture
components before
everyone starts using it

0.7 -> 0.8
1) create additional 0.8 cluster
2) all clients consume from both clusters (0.7 and 0.8)
3) critical services (payment) migrate consumers and producers during nighttime, with downtime
4) the rest of the services migrate their producers to 0.8 (the last mile takes a long time)
5) stop consuming from 0.7
6) turn off 0.7 cluster

7) read blogpost stating that 0.9 is not backwards compatible with 0.8

mistake:
not considering the coupling
you get when everyone can
see all your data
https://flic.kr/p/6MjhUR

why is it a mistake
everything that is published on Kafka is visible to any client that can access the cluster

direct reads across services/domains are quite normal in legacy and/or enterprise systems
this coupling makes it hard to make changes
Kafka has no security per topic - you must add that yourself

Data on the inside versus data on the outside
At least decide on a convention for what is private data and what is public data

mistake:
not properly defined schema

why is it a mistake
schemas change differently from the code producing and consuming messages
data needs versioning
defining schema in a java library makes it more difficult to access data from non-jvm
languages
code repository with java is not the easiest way to figure out the data on a topic

development speed outside jvm has been slow
change of data needs coordinated deployment
difficult to create tooling that needs to know data format, like data lake

confluent.io platform has a separate schema registry
rest interface
apache avro
multiple compatibility settings and evolution strategies
connect

still using java library
confluent platform 2.0 is planned for the next step, not kafka 0.9

mistake:
Kafka is like a database -
treat it like one
https://flic.kr/p/2xQ9VT

why is it a mistake
We used our normal Ops scripts for kafka - if a config changes, restart automatically
If shutdown does not work within 5 seconds, kill -9
A database needs to finish what it is doing, before shutting down
A distributed database even more so

At the very least, recovery is needed at startup
Data loss
No convergence - you won’t come back up

Do not play with stored data - understand how and when to apply changes

stop using the kill -9 script

mistake:
not properly understood
client side rebalance

why is it a mistake
kafka has a clear algorithm for handling an increase or decrease in clients, to keep everything balanced.
all consumers are reconnected
This algorithm creates a lot of noise in logs when you deploy all the time
common java-library had 4 consumer-threads as default per application

developers did not understand what happened during a deploy
“kafka is unstable”
most service-instances did not receive messages
each deploy of a service (typically 4 instances) triggered 4 rebalances.
if a rebalance takes too long, the (at least our) consumer dies.
“kafka is down”

1. For each topic T that C_i subscribes to
2. let P_T be all partitions producing topic T
3. let C_G be all consumers in the same group as C_i that consume topic T
4. sort P_T (so partitions on the same broker are clustered together)
5. sort C_G
6. let i be the index position of C_i in C_G and let N = size(P_T)/size(C_G)
7. assign partitions from i*N to (i+1)*N - 1 to consumer C_i
8. remove current entries owned by C_i from the partition owner registry
9. add newly assigned partitions to the partition owner registry (we may need to re-try this until the original partition owner releases its ownership)
all consumers in a group rebalance when a consumer arrives or departs from the group
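The assignment steps above can be sketched as follows. This is a simplified illustration, not the Kafka client code: the function names are hypothetical, and handing leftover partitions to the first consumers is an assumption made here so that no partition is left unowned when the counts do not divide evenly.

```python
def assign_partitions(partitions, consumers):
    """Sorted consumers take contiguous slices of the sorted partition list."""
    parts = sorted(partitions)
    group = sorted(consumers)
    n, extra = divmod(len(parts), len(group))
    assignment = {}
    start = 0
    for i, consumer in enumerate(group):
        # Assumption: the first `extra` consumers take one partition more.
        size = n + (1 if i < extra else 0)
        assignment[consumer] = parts[start:start + size]
        start += size
    return assignment


def owner(assignment, partition):
    """Return the consumer that owns the given partition."""
    return next(c for c, owned in assignment.items() if partition in owned)


before = assign_partitions(range(5), ["c1", "c2"])
after = assign_partitions(range(5), ["c1", "c2", "c3"])
moved = [p for p in range(5) if owner(before, p) != owner(after, p)]
```

Adding one consumer changes the owner of partitions that belonged to consumers that never went away, which is why every member of the group takes part in each rebalance.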

some consumers use 1 thread per instance
planning to rewrite consumer library
read kafka documentation

mistake:
running mixed load with a
single configuration
https://flic.kr/p/qbarDR

why is it a mistake
Historically - One Big Database with Expensive License => One, Single Server
Database world - OLTP and OLAP
Changes with Open Source software and Cloud
Tried to simplify the developer's day with a single config
Kafka supports both very high throughput and high reliability

Trade off between throughput and degree of reliability
With a single configuration - the last commit wins (remember the 128 partitions?)
Either high throughput, and risk of loss - or potentially too slow

understand your use cases and their needs!

Defaults that are quite reliable
Exposing configuration variables in the client
Ask the questions:
● at least once delivery
● ordering - if you partition, what must have strict ordering
● is 99% delivery enough?
● which level of throughput is needed

Configuration
Configuration for production
● Partitions
● Replicas (default.replication.factor)
● Minimum ISR (min.insync.replicas)
● Wait for acknowledge when producing messages (request.required.acks, block.on.buffer.full)
● Retries
● Leader election
Configuration for consumer
● Number of threads
● When to commit (auto.commit.enable vs consumer.commitOffsets)

Gwen Shapira recommends...
● acks = all
● block.on.buffer.full = true
● retries = MAX_INT
● max.in.flight.requests.per.connection = 1
● Producer.close()
● replication-factor >= 3
● min.insync.replicas = 2
● unclean.leader.election = false
● auto.offset.commit = false
● commit after processing
● monitor!
● commit after processing
● monitor!
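A sketch of the recommendations above written out as settings. The property names are the ones used, to the best of my knowledge, by the 0.9-era Java clients and brokers (`block.on.buffer.full` was later replaced, and the older consumer spelled the auto-commit switch `auto.commit.enable`), so check them against the documentation for your version. The grouping into three dictionaries and the helper `to_properties` are hypothetical.

```python
MAX_INT = 2**31 - 1

# Producer side: wait for all in-sync replicas, block rather than drop, retry.
PRODUCER = {
    "acks": "all",
    "block.on.buffer.full": "true",
    "retries": str(MAX_INT),
    "max.in.flight.requests.per.connection": "1",
}

# Broker/topic side: enough replicas, and no unclean leader election.
BROKER = {
    "default.replication.factor": "3",
    "min.insync.replicas": "2",
    "unclean.leader.election.enable": "false",
}

# Consumer side: commit offsets yourself, after processing.
CONSUMER = {
    "enable.auto.commit": "false",
}


def to_properties(settings):
    """Render a settings dict as lines for a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))


consumer_properties = to_properties(CONSUMER)
```

With a replication factor of 3 and `min.insync.replicas` of 2, one replica can be down while producers using `acks=all` are still accepted.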

mistake:
topics are autocreated

why is it a mistake
topics are created every time someone tries to consume from or produce to a topic name

we are not able to control the number of topics
too many topics give too many partitions. ZooKeeper gets slow when handling this.
no place to put topic config
topic names from production:
Event.USER.blabla, testing42, testing2,
Event.GO_CLICK.asdf4133, Event.GO_CLICK.asdf7392, Event.GO_CLICK.asdf7532,

small number of partitions as default
increase number of partitions for selected topics

5 partitions as default
2 topics have more than 5 partitions
topics with lots of traffic

mistake:
deploying a proof of concept
hack in production, i.e.
why we had 8 zk nodes
https://flic.kr/p/6eoSgT

why is it a mistake
Kafka was set up by Ops for a test - not for hardened production use
By coincidence we had 8 nodes for kafka, the same 8 nodes for zookeeper
Zookeeper is dependent on a majority quorum, low latency between nodes

Zookeeper recommends 3 nodes for normal usage, 5 for high, and any more is
questionable
More nodes lead to longer time for finding consensus, more communication
If we get a split between data centers, there will be 4 in each
You should not run Zk between data centers, due to latency and outage
possibilities
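The quorum arithmetic behind these points, as a small worked example. The function names are hypothetical; the rule itself is that a strict majority of the ensemble must be reachable.

```python
def quorum(nodes):
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """How many nodes can be lost while a majority remains."""
    return nodes - quorum(nodes)


def side_has_quorum(total_nodes, nodes_on_this_side):
    """After a network split, can this side still form a majority?"""
    return nodes_on_this_side >= quorum(total_nodes)


eight_node_quorum = quorum(8)
split_4_4 = (side_has_quorum(8, 4), side_has_quorum(8, 4))
```

With 8 nodes the quorum is 5, so a 4/4 split between data centers leaves neither side able to proceed. 8 nodes also tolerate only 3 failures, the same as 7 nodes, which is why an even ensemble size buys nothing.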

Have an odd number of Zookeeper nodes - preferably 3, at most 5
Don’t cross data centers
Check the documentation before deploying serious production load
Don’t run a sensitive service (Zookeeper) on a server with 50 services, 300% overcommitted on RAM

Not treating Kafka as a database. kill -9

mistake:
unclean.leader.election = true
for reliable messaging
http://media-2.web.britannica.com/eb-media/40/126140-050-523CFDE4.jpg

why is it a mistake
in certain conditions unclean.leader.election=true can lose messages
replication.factor = 3
min.insync.replicas = 1
replica1: 100, 101
replica2: 100, 101
replica3: 100

why is it a mistake
min.insync.replicas = 1
replica3 dies
replica1 (leader): 100, 101
replica2: 100, 101
replica3: 100

why is it a mistake
replica2 dies
replica1: 100, 101, 102, 103, 104
replica2: 100, 101
replica3: 100

why is it a mistake
replica1 dies
replica1: 100, 101, 102, 103, 104
replica2: 100, 101
replica3: 100

why is it a mistake
which replica comes online first?
replica1: 100, 101, 102, 103, 104
replica2: 100, 101
replica3: 100

messages might be lost forever
without errors in the client
https://upload.wikimedia.org/wikipedia/commons/d/d4/George-W-Bush.jpeg
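The scenario from the preceding slides can be replayed in a toy model. This is a sketch of the idea only, not of Kafka's actual election logic; the function `elect_leader` and its parameters are hypothetical.

```python
def elect_leader(replicas, in_sync, online, unclean):
    """Pick a leader among online replicas; return (name, log) or None."""
    candidates = [name for name in online if name in in_sync]
    if not candidates and unclean:
        # Unclean election: accept any online replica, even one that is behind.
        candidates = list(online)
    if not candidates:
        return None  # stay unavailable until an in-sync replica returns
    name = candidates[0]
    return name, replicas[name]


# State when replica1, the last in-sync replica, dies.
replicas = {
    "replica1": [100, 101, 102, 103, 104],
    "replica2": [100, 101],
    "replica3": [100],
}
in_sync = {"replica1"}

# replica3 is the first to come back online.
unclean_result = elect_leader(replicas, in_sync, ["replica3"], unclean=True)
clean_result = elect_leader(replicas, in_sync, ["replica3"], unclean=False)
lost = [m for m in replicas["replica1"] if m not in unclean_result[1]]
```

With unclean election allowed, replica3 becomes leader and messages 101 to 104 are gone even though producers saw them acknowledged. With it disabled, the partition stays unavailable until replica1 returns.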

unclean.leader.election=false
(unless you are worried about what
happens when replica1 (leader) is dead for
a long time)

min.insync.replicas = 1 (2 for selected topics)
unclean.leader.election=true

mistake:
default configuration of 128 partitions
for each topic

why is it a mistake
partitions are Kafka's way of scaling consumers; 128 partitions can serve up to 128 consumer processes
0.8 clusters could not reduce the number of partitions without deleting data
highest number of consumers today is 20

0.8 cluster was configured with 128 partitions as default, for all topics.
many partitions and many topics create many datapoints that must be coordinated
zookeeper must coordinate all this
rebalance must balance all clients on all partitions
zookeeper and kafka went down (May 2015)
(500 topics * 128 partitions)
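The arithmetic behind that number, compared with the 5-partition default mentioned elsewhere in this talk. The function name is hypothetical, and replicas are not counted because the replication factor of that cluster is not stated here.

```python
def total_partitions(topics, partitions_per_topic):
    """Number of partitions the cluster has to track, before replication."""
    return topics * partitions_per_topic


with_128_default = total_partitions(500, 128)
with_5_default = total_partitions(500, 5)
reduction_factor = with_128_default / with_5_default
```

That is 64,000 partitions against 2,500, a factor of 25.6, while the largest consumer group mentioned has 20 consumers.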

small number of partitions as default
increase number of partitions for selected topics
understand your use case
reduce length of transactions on consumer side

5 partitions as default
2 topics have more than 5 partitions
topics with lots of traffic

“They say ignorance is
bliss.... they're wrong ”
― Franz Kafka
questions?

“It's only because
of their stupidity
that they're able
to be so sure of
themselves.”
― Franz Kafka,
The Trial
Audun Fauchald Strand
@audunstrand
Henning Spjelkavik
@spjelkavik
http://www.finn.no/apply-here
