This is a talk given at ApacheCon 2015
If data is the lifeblood of high technology, Apache Kafka is the circulatory system in use at LinkedIn. It is used for moving every type of data around between systems, and it touches virtually every server, every day. This can only be accomplished with multiple Kafka clusters, installed at several sites, and they must all work together to assure no message loss, and almost no message duplication. In this presentation, we will discuss the architectural choices behind how the clusters are deployed, and the tools and processes that have been developed to manage them. Todd Palino will also discuss some of the challenges of running Kafka at this scale, and how they are being addressed both operationally and in the Kafka development community.
Note - there are a significant amount of slide notes on each slide that goes into detail. Please make sure to check out the downloaded file to get the full content!