Apache Kafka's performance and scalability depend on both hardware and software factors. In this presentation, we explore two recent experiences from running a managed Kafka service.
The first example recounts our experience running Kafka on AWS Graviton2 (ARM) instances. Extensive benchmarking did not initially show the expected performance benefits. We developed several hypotheses to explain the shortfall, but could not determine the cause experimentally. We then profiled the Kafka application and, after identifying and confirming a likely cause, found a workaround that delivered the hoped-for improvement in price/performance.
The second example explores how well Kafka scales as the number of partitions increases. We revisit our previous benchmarking experiments with the newest version of Kafka (3.x), which offers the option of replacing ZooKeeper with the new KRaft protocol. We test the claim that Kafka with KRaft can 'scale to millions of partitions' and provide experimental feedback on how close KRaft is to being production-ready.
Presentation for the ApacheCon NA Performance Engineering Track, October 6, 2022, Sheraton Hotel, New Orleans.