More Datacenters, More Problems

SITE RELIABILITY ENGINEERING ©2015 LinkedIn Corporation. All Rights Reserved.

Todd Palino

Who Am I?
 Kafka, Samza, and Zookeeper SRE at LinkedIn
 Site Reliability Engineering
– Administrators
– Architects
– Developers
 Keep the site running, always

Data @ LinkedIn is Hiring!
 Streams Infrastructure
– Kafka pub/sub ecosystem
– Stream Processing Platform built on Apache Samza
– Next Generation change capture technology (incubating)
 Join us in working on cutting edge stream processing infrastructures
– Please contact kparamasivam@linkedin.com
– Software developers and Site Reliability Engineers at all levels
 LinkedIn Data Infrastructure Meetup
– Where: LinkedIn @ 2061 Stierlin Court, Mountain View
– When: May 11th at 6:30 PM
– Registration: http://www.meetup.com/LinkedIn-Data-Infrastructure-Meetup

Kafka At LinkedIn
 1100+ Kafka brokers
 Over 32,000 topics
 350,000+ Partitions
 875 Billion messages per day
 185 Terabytes In
 675 Terabytes Out
 Peak Load (whole site)
– 10.5 Million messages/sec
– 18.5 Gigabits/sec Inbound
– 70.5 Gigabits/sec Outbound
 1800+ Kafka brokers
 Over 95,000 topics
 1,340,000+ Partitions
 1.3 Trillion messages per day
 330 Terabytes In
 1.2 Petabytes Out
 Peak Load (single cluster)
– 2 Million messages/sec
– 4.7 Gigabits/sec Inbound
– 15 Gigabits/sec Outbound

What Will We Talk About?
 Tiered Cluster Architecture
 Multiple Datacenters
 Performance Concerns
 Conclusion

Tiered Cluster Architecture

One Kafka Cluster

Single Cluster – Remote Clients

Single Cluster – Spanning Datacenters

Multiple Clusters – Local and Remote Clients

Multiple Clusters – Message Aggregation

Why Not Direct?
 Network Concerns
– Bandwidth
– Network partitioning
– Latency
 Security Concerns
– Firewalls and ACLs
– Encrypting data in transit
 Resource Concerns
– A misbehaving application can swamp production resources

What Do We Lose?
 You may lose message ordering
– Mirror maker breaks apart message batches and redistributes them
 You may lose key to partition affinity
– Mirror maker will partition based on the key
– Differing partition counts in source and target will result in differing distribution
– Mirror maker does not (without work) honor custom partitioning
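The loss of key to partition affinity can be illustrated with a minimal sketch. It assumes the partition is chosen as hash(key) modulo the partition count; `zlib.crc32` is only a stand-in for a stable hash and is not the hash Kafka's partitioner uses. The key names and the partition counts (8 and 12) are hypothetical.

```python
import zlib

def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition as hash(key) mod partition_count.

    zlib.crc32 is a stand-in for a stable hash, not Kafka's own hash.
    """
    return zlib.crc32(key.encode("utf-8")) % partition_count

def moved_keys(keys, source_count, target_count):
    """Return the keys whose partition number differs between two clusters."""
    return [k for k in keys
            if partition_for(k, source_count) != partition_for(k, target_count)]

# Hypothetical keys and partition counts.
keys = [f"member-{i}" for i in range(1000)]
same_layout = moved_keys(keys, 8, 8)        # equal counts: nothing moves
different_layout = moved_keys(keys, 8, 12)  # differing counts: keys move
print(len(same_layout), len(different_layout))
```

With equal partition counts every key lands on the same partition number in both clusters. With differing counts many keys land elsewhere, so a consumer that relies on "key X is always in partition N" cannot carry that assumption from the source cluster to the target.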

Multiple Datacenters

Why Multiple Datacenters?
 Disaster Recovery
 Geolocalization
 Legal / Political

Planning For the Worst
 Keep your datacenters identical
– Snowflake services are hard to fail over
 Services that need to run only once in the infrastructure need to coordinate
– Zookeeper across sites can be used for this (at least 3 sites)
– Moving from one Kafka cluster to another is hard
– KIP-33: Time-based log indexing
 What about AWS (or Google Cloud)?
– Disaster recovery can be accomplished via availability zones
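One reason moving between clusters is hard is that each cluster assigns its own offsets, so an offset committed in one cluster means nothing in another. A time-based index gives a common reference point. The sketch below is a simplified illustration, not the broker's index format: `offset_for_time` and the two index lists are hypothetical, and the timestamps and offsets are made up.

```python
import bisect

def offset_for_time(time_index, timestamp_ms):
    """Return the first offset whose timestamp is >= timestamp_ms, or None.

    time_index is a list of (timestamp_ms, offset) pairs sorted by timestamp,
    a simplified stand-in for a broker-side time index.
    """
    timestamps = [ts for ts, _ in time_index]
    i = bisect.bisect_left(timestamps, timestamp_ms)
    return time_index[i][1] if i < len(time_index) else None

# Hypothetical indexes for the same topic partition in two clusters.
# The timestamps line up, but each cluster assigned its own offsets.
cluster_a = [(1000, 500), (2000, 650), (3000, 800)]
cluster_b = [(1000, 42), (2000, 190), (3000, 345)]

last_processed_ms = 2000  # timestamp of the last message handled in cluster A
resume_a = offset_for_time(cluster_a, last_processed_ms)
resume_b = offset_for_time(cluster_b, last_processed_ms)
print(resume_a, resume_b)
```

A consumer failing over from cluster A to cluster B would look up its position by timestamp rather than reuse the offset from A. Some reprocessing around the boundary should still be expected.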

Planning For the Best
 Not much different than designing for disasters
 Consider having a limited number of aggregate clusters
– Every copy of a message costs you money
– Have 2 downstream (or super) sites that process aggregate messages
– Push results back out to all sites
– Good place for stream processing, like Samza
 What about AWS (or Google Cloud)?
– Geolocalization can be accomplished via regions
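The cost of extra copies can be put into rough numbers. This is a back-of-the-envelope sketch under stated assumptions: each message is stored once in its local cluster and once in every aggregate cluster, and every cluster keeps the same number of replicas. The function name, the four sites, and the replication factor of 2 are hypothetical.

```python
def stored_copies(sites: int, aggregate_sites: int, replication_factor: int) -> int:
    """Copies of one message stored across all clusters.

    Assumes one local copy plus one copy per aggregate cluster, with each
    cluster keeping replication_factor replicas.
    """
    if not 0 <= aggregate_sites <= sites:
        raise ValueError("aggregate_sites must be between 0 and sites")
    return (1 + aggregate_sites) * replication_factor

# Hypothetical deployment: 4 sites, replication factor 2.
everywhere = stored_copies(sites=4, aggregate_sites=4, replication_factor=2)
limited = stored_copies(sites=4, aggregate_sites=2, replication_factor=2)
print(everywhere, limited)
```

Under these assumptions, an aggregate cluster in every site stores 10 copies of each message, while two aggregate sites store 6.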

Segregating Data
 All the data, available everywhere
 The right data, only where we need it
 Topics are (usually) the smallest unit we want to deal with
– Mirroring a topic between clusters is easy
– Filtering that mirror is much harder
 Have enough topics so consumers do not throw away messages

Retaining Data
 Retain data only as long as you have to
 Move data away from the front lines as quickly as possible

Performance Concerns

Buy The Book!
Early Access available now. Covers all aspects of Kafka, from setup to client development to ongoing administration and troubleshooting. Also discusses stream processing and other use cases.

Kafka Cluster Sizing
 How big for your local cluster?
– How much disk space do you have?
– How much network bandwidth do you have?
– CPU, memory, disk I/O
 How big for your aggregate cluster?
– In general, multiply the number of brokers in a local cluster by the number of local clusters
– May have additional concerns with lots of consumers
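The sizing questions above can be turned into a rough calculation. This is a sketch, not a sizing formula from the talk: it sizes only on disk and network, and the `headroom` factor and all of the input numbers are hypothetical. CPU, memory, and disk I/O would need their own checks.

```python
import math

def brokers_needed(retained_tb, disk_tb_per_broker,
                   peak_out_gbps, nic_gbps_per_broker, headroom=0.5):
    """Broker count that satisfies both the disk and the network limit.

    retained_tb should already include replication. headroom is the
    fraction of each broker's capacity we are willing to use.
    """
    by_disk = math.ceil(retained_tb / (disk_tb_per_broker * headroom))
    by_network = math.ceil(peak_out_gbps / (nic_gbps_per_broker * headroom))
    return max(by_disk, by_network, 1)

# Hypothetical local cluster: 40 TB retained, 10 TB disks, 10 Gbps NICs.
local_brokers = brokers_needed(retained_tb=40, disk_tb_per_broker=10,
                               peak_out_gbps=15, nic_gbps_per_broker=10)
local_clusters = 3
aggregate_brokers = local_brokers * local_clusters  # rule of thumb from the slide
print(local_brokers, aggregate_brokers)
```

In this example disk is the binding constraint (8 brokers against 3 for network), and the aggregate cluster comes out at 24 brokers before allowing for extra consumer load.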

Topic Configuration
 Partition Counts for Local
– Many theories on how to do this correctly, but the answer is “it depends”
– How many consumers do you have?
– Do you have specific partition requirements?
– Keeping partition sizes manageable
 Partition Counts for Aggregate
– Multiply the number of partitions in a local cluster by the number of local clusters
– Periodically review partition counts in all clusters
 Message Retention
– If the aggregate cluster is where you really need the messages, retain them in the local cluster only long enough to cover mirror maker problems
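The partition count guidance can be sketched as a small calculation. The local rule used here (at least one partition per consumer, and enough partitions to cap partition size) is one possible reading of "it depends", not a rule from the talk. The function names and all numbers are hypothetical.

```python
import math

def local_partitions(consumers: int, topic_gb: float, max_partition_gb: float) -> int:
    """At least one partition per consumer, and enough to cap partition size."""
    by_size = math.ceil(topic_gb / max_partition_gb)
    return max(consumers, by_size, 1)

def aggregate_partitions(local_count: int, local_clusters: int) -> int:
    """The slide's rule: local partition count times the number of local clusters."""
    return local_count * local_clusters

# Hypothetical topic: 6 consumers, 200 GB retained, 25 GB per partition at most.
local_count = local_partitions(consumers=6, topic_gb=200, max_partition_gb=25)
aggregate_count = aggregate_partitions(local_count, local_clusters=3)
print(local_count, aggregate_count)
```

Here partition size, not consumer count, drives the local figure (8), and the aggregate topic gets 24 partitions. Reviewing these numbers periodically matters because the inputs drift as topics grow.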

Where Do I Put Mirror Makers?
 Best practice was to keep the mirror maker local to the target cluster
– TLS (can) make this a new game
 In the datacenter with the produce cluster
– Fewer issues from networking
– Significant performance hit when using TLS on consume
 In the datacenter with the consume cluster
– Highest performance with TLS
– Potential to consume messages and drop on produce

Mirror Maker Sizing
 Number of servers and streams
– Size the number of servers based on the peak bytes per second
– Co-locate mirror makers
– Run more mirror makers in an instance than you need
– Use multiple consumer and producer streams
 Other tunables to look at
– Partition assignment strategy
– In flight requests per connection
– Linger time
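A minimal sketch of the sizing rule and the tunables listed above. The setting names are Kafka client configuration keys, but the values are placeholders to tune, not recommendations, and the right assignment strategy value depends on the client version in use. The throughput figures and helper names are hypothetical.

```python
import math

def mirror_maker_servers(peak_mb_per_sec: float, mb_per_sec_per_server: float) -> int:
    """Servers needed to carry the peak byte rate, never fewer than one."""
    return max(1, math.ceil(peak_mb_per_sec / mb_per_sec_per_server))

# Kafka client setting names; the values are placeholders, not advice.
consumer_props = {
    "partition.assignment.strategy":
        "org.apache.kafka.clients.consumer.RoundRobinAssignor",
}
producer_props = {
    "max.in.flight.requests.per.connection": "1",
    "linger.ms": "100",
}

def to_properties(props: dict) -> str:
    """Render a dict as the key=value lines of a .properties file."""
    return "\n".join(f"{k}={v}" for k, v in sorted(props.items()))

# Hypothetical load: 600 MB/s at peak, 80 MB/s measured per server.
servers = mirror_maker_servers(peak_mb_per_sec=600, mb_per_sec_per_server=80)
print(servers)
print(to_properties(producer_props))
```

Sizing on the peak rate rather than the average is what leaves room to catch up after an outage.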

Segregation of Topics
 Not all topics are created equal
 High Priority Topics
– Topics that change search results
– Topics used for hourly or daily reporting
 Low Latency Topics
 Run a separate mirror maker for these topics
– One bloated topic won’t affect reporting
– Restarting the mirror maker takes less time
– Less time to catch up when you fall behind
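Running a separate mirror maker for a group of topics comes down to giving each instance its own topic filter. Mirror maker accepts a whitelist regular expression, and the sketch below builds one pattern per group. The topic names and the `whitelist` helper are hypothetical, and the exact escaping the Java regex engine needs is an assumption to verify.

```python
import re

def whitelist(topics):
    """Build a regex that matches exactly the given topic names."""
    return "|".join(re.escape(t) for t in sorted(topics))

# Hypothetical topic names.
all_topics = {"search-updates", "hourly-report", "page-views", "debug-log"}
high_priority = {"search-updates", "hourly-report"}

priority_whitelist = whitelist(high_priority)
bulk_whitelist = whitelist(all_topics - high_priority)

for topic in sorted(all_topics):
    group = "priority" if re.fullmatch(priority_whitelist, topic) else "bulk"
    print(topic, group)
```

Each pattern would be passed to its own mirror maker instance, so a backlog on a bulk topic does not delay the high priority ones.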

Conclusion

Broker Improvements
 Namespaces
– Namespace topics by datacenter
– Eliminate local clusters and just have aggregate
– Significant hardware savings
 JBOD Fixes
– Intelligent partition assignment
– Admin tools to move partitions between mount points
– Broker should not fail completely with a single disk failure

Mirror Maker Improvements
 Identity Mirror Maker
– Messages in source partition 0 get produced directly to partition 0
– No decompression of message batches
– Keeps key to partition affinity, supporting custom partitioning
– Requires mirror maker to maintain downstream partition counts
 Multi-Consumer Mirror Maker
– A single mirror maker that consumes from more than one cluster
– Reduces the number of copies of mirror maker that need to be running
– Forces a produce-local architecture, however
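The identity mapping described above can be sketched in a few lines. This illustrates the idea only and is not the mirror maker's implementation; both function names are hypothetical.

```python
def identity_partition(source_partition: int, target_partition_count: int) -> int:
    """Keep the partition number unchanged when copying a message."""
    if source_partition >= target_partition_count:
        raise ValueError("target topic has fewer partitions than the source")
    return source_partition

def partitions_to_add(source_count: int, target_count: int) -> int:
    """How many partitions the target topic needs before identity mirroring works."""
    return max(0, source_count - target_count)

print(identity_partition(0, 8))   # partition 0 stays partition 0
print(partitions_to_add(12, 8))   # target must grow by 4 partitions first
```

Because the partition number is copied rather than recomputed from the key, whatever partitioning the original producer used is preserved, which is why the target partition count has to be kept in step with the source.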

Administrative Improvements
 Multiple cluster management
– Topic management across clusters
– Visualization of mirror maker paths
 Better client monitoring
– Burrow for consumer monitoring
– No open source solution for producer monitoring (audit)
 End-to-end availability monitoring

Getting Involved With Kafka
 http://kafka.apache.org
 Join the mailing lists
– users@kafka.apache.org
– dev@kafka.apache.org
 irc.freenode.net - #apache-kafka
 Meetups
– Apache Kafka - http://www.meetup.com/http-kafka-apache-org
– Bay Area Samza - http://www.meetup.com/Bay-Area-Samza-Meetup/
 Contribute code

