Developing fast, scalable Big Data applications has become significantly easier over the last decade thanks to horizontally scalable open-source databases and streaming technologies such as Apache Cassandra and Apache Kafka. Cloud-native trends have also accelerated the uptake and ease of use of these technologies, which are now available as managed services on multiple cloud platforms.
But maybe it has become too easy to embark on building complex distributed applications with multiple massively scalable open-source technologies, because there are still many performance and scalability issues to be aware of. In this talk, I will give a high-level overview of some of the performance and scalability challenges I've overcome over the last six years while building realistic demonstration applications with Apache Cassandra and Apache Kafka (and more), supplemented with performance insights from our operation of thousands of production clusters. Keynote talk for the Performance Engineering track at Community Over Code Halifax 2023: https://communityovercode.org/schedule-list/#PE001