At LinkedIn, the Kafka infrastructure is run as a service: the Streaming team develops and deploys Kafka, but is not the producer or consumer of the data that flows through it. With multiple datacenters, and numerous applications sharing these clusters, we have developed an architecture with multiple pipelines and multiple tiers. Most days, this works out well, but it has led to many interesting problems. Over the years we have worked to develop a number of solutions, most of them open source, to make it possible for us to reliably handle over a trillion messages a day.