"It's important that even under load, Apache Kafka ensures user topics are fully replicated in synch. Replication is essential to endure resilience to data loss, so both users and operators care about it. If a topic partition falls out of the ISR (In-Synch-replicas) set, a user experiences unavailability (when producing with the default acknowledgment setting). Users may use non-default acks mode to work around it, but the effect on a Kafka cluster is to make the under-replication worse. Even simple Under replication with no Under Min Isr is to be avoided as a cluster update may cause the dreaded Under Min ISR. There are a number of settings that can be used, from quotas to number of replication threads to more low-level settings. This session wants to show how we successfully measured and evolved our Kafkas configuration, with the goal of giving the best possible user experience (and resilience to their data). Hofstadter's Law applied! ""It always takes longer than you expect, even when you take into account Hofstadter's Law."""