Kafka Lessons That We Learned

The company experienced several issues after upgrading to Kafka 0.8: data imbalance across brokers caused by the new partition assignment and data replication features, excessive disk usage from bugs and compression settings, and increased data volume that required scaling the cluster. Lessons learned include configuring data replication properly, monitoring disk usage and topics, fixing the bugs that produced duplicate data, and planning for future scaling by separating data types across clusters.

Uploaded by

kebarcla

Kafka lessons that we learned the hard way
Data Balancing
The Kafka 0.7 cluster has been stable and well-balanced from the beginning. Kafka 0.8 introduced some changes.

Partition assignment.
Data replication feature.
Data Balancing
Partition assignment

Cannot use F5 for load balancing.
Load among brokers out of balance.
Monitor disk usage with Bosun (oops).
Occasional maintenance with kafka-reassign-partitions.sh.
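
A reassignment plan for kafka-reassign-partitions.sh can be generated rather than written by hand. The sketch below is not the procedure used on this cluster; it assumes a simple round-robin placement, and the partition count, broker ids, and replication factor are illustrative values.

```python
import json


def build_reassignment(topic, num_partitions, broker_ids, replication_factor):
    """Spread the replicas of each partition round-robin across the brokers."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor exceeds broker count")
    partitions = []
    for p in range(num_partitions):
        # Partition p starts at broker index p, so leaders rotate evenly.
        replicas = [
            broker_ids[(p + r) % len(broker_ids)]
            for r in range(replication_factor)
        ]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


# Illustrative values only: 6 partitions over brokers 1..3, two replicas each.
plan = build_reassignment("eventdata", 6, [1, 2, 3], 2)
print(json.dumps(plan, indent=2))
```

The resulting JSON can be saved to a file and handed to the tool through its --reassignment-json-file option.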
Data Balancing
Data replication feature

Switched from RAID-10 to JBOD as recommended on the Kafka web site.
Drives were severely out of balance.
A bad drive brings down the whole broker.
Switching back to RAID-10.
Monitor disk usage with Bosun (oops).
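
The disk check itself is simple to express. The sketch below is a plain Python stand-in, not a Bosun rule; the mount points, the 85% fullness threshold, and the 30-point spread threshold are all assumptions chosen for illustration.

```python
import shutil


def used_fraction(path):
    """Fraction of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def disk_alerts(fractions, max_used=0.85, max_spread=0.30):
    """Return alert strings for full drives and for unbalanced drives."""
    alerts = []
    for path, frac in sorted(fractions.items()):
        if frac > max_used:
            alerts.append(f"{path} is {frac:.0%} full")
    if fractions:
        # Gap between the fullest and the emptiest drive on the broker.
        spread = max(fractions.values()) - min(fractions.values())
        if spread > max_spread:
            alerts.append(f"drives out of balance: spread {spread:.0%}")
    return alerts


# Hypothetical readings; on a broker these would come from used_fraction().
sample = {"/data/kafka1": 0.95, "/data/kafka2": 0.40, "/data/kafka3": 0.45}
for line in disk_alerts(sample):
    print(line)
```

Checking the spread as well as the absolute level matters on JBOD, where one drive can fill up while the others still have room.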
Data Balancing
Our own bugs

Log forwarder topic explosion.
Cap on number of forensic topics per stack.
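
One way such a cap could work is sketched below. The class name, the limit of 3, and the topic naming are hypothetical; the slides do not say how the forwarder enforces the cap.

```python
from collections import defaultdict


class TopicCap:
    """Refuse new topics for a stack once it has reached its limit."""

    def __init__(self, max_topics_per_stack):
        self.max_topics = max_topics_per_stack
        self.topics = defaultdict(set)

    def allow(self, stack, topic):
        known = self.topics[stack]
        if topic in known:
            return True  # existing topics always stay usable
        if len(known) >= self.max_topics:
            return False  # cap reached, reject the new topic
        known.add(topic)
        return True


cap = TopicCap(max_topics_per_stack=3)  # hypothetical limit
for name in ["a", "b", "c", "d"]:
    print(name, cap.allow("stack1", f"forensic.stack1.{name}"))
```

A bug that invents a fresh topic name per message is then stopped at the limit instead of creating topics without bound.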
Increased Data
Why is there so much more data in the Kafka 0.8 cluster?
How do we scale going forward?
Increased Data
Why is there so much more data in the Kafka 0.8 cluster?

Log forwarder EOF bug.
Fixed.
Preventative measures going forward:
Monitor topics with Spark.
quota.producer.default property in Kafka 0.9.
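
The topic monitoring idea can be illustrated without Spark. The sketch below is plain Python, not the Spark job mentioned above; it assumes per-topic byte counts for one interval are already available, and the factor of 10 over the median is an arbitrary illustrative threshold.

```python
from statistics import median


def noisy_topics(bytes_by_topic, factor=10.0):
    """Return topics whose volume exceeds `factor` times the median topic."""
    if not bytes_by_topic:
        return []
    baseline = median(bytes_by_topic.values())
    return sorted(
        topic
        for topic, size in bytes_by_topic.items()
        if baseline > 0 and size > factor * baseline
    )


# Hypothetical byte counts for one monitoring interval.
sample = {"stack-a": 100, "stack-b": 120, "stack-c": 90, "stack-d": 50000}
print(noisy_topics(sample))
```

Comparing against the median rather than the mean keeps a single runaway producer from raising the baseline it is measured against.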
Increased Data
Why is there so much more data in the Kafka 0.8 cluster?

Snappy compression. Brokers are I/O bound.
Switched back to gzip for forensic data.
Continue to use Snappy for binary Avro data.
Increased Data
Why is there so much more data in the Kafka 0.8 cluster?

Duplicate data. Forwarder sends logs to eventdata and stack-specific topics.
Handle multiple topics with Camus or Gobblin.
Increased Data
How do we scale going forward?

Add nodes to Kafka cluster.
Repurpose 0.7 servers.
Separate Kafka clusters for business and forensic data.
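
With two clusters, producers need a rule for choosing one. A minimal sketch follows, assuming forensic topics can be recognized by a name prefix; the prefix and the broker addresses are hypothetical and do not come from the slides.

```python
# Hypothetical bootstrap addresses for the two clusters.
CLUSTERS = {
    "business": "kafka-business:9092",
    "forensic": "kafka-forensic:9092",
}

FORENSIC_PREFIX = "forensic."  # assumed naming convention


def cluster_for(topic):
    """Pick the cluster a topic belongs to, based on its name prefix."""
    kind = "forensic" if topic.startswith(FORENSIC_PREFIX) else "business"
    return CLUSTERS[kind]


print(cluster_for("forensic.stack1.auth"))
print(cluster_for("eventdata"))
```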
