This talk is about the Wix ecosystem for event-driven architecture on top of Kafka.
I will share the best practices, SDKs and tools we have created in order to scale our distributed system to more than 1000 microservices.
SFBigAnalytics_20190724: Monitor Kafka like a Pro – Chester Chen
Kafka operators need to provide guarantees to the business that Kafka is working properly and delivering data in real time, and they need to identify and triage problems so they can solve them before end users notice them. This elevates the importance of Kafka monitoring from a nice-to-have to an operational necessity. In this talk, Kafka operations experts Xavier Léauté and Gwen Shapira share their best practices for monitoring Kafka and the streams of events flowing through it. How to detect duplicates, catch buggy clients, and triage performance issues – in short, how to keep the business’s central nervous system healthy and humming along, all like a Kafka pro.
Speakers: Gwen Shapira, Xavier Léauté (Confluent)
Gwen is a software engineer at Confluent working on core Apache Kafka. She has 15 years of experience working with code and customers to build scalable data architectures. She currently specializes in building real-time, reliable data processing pipelines using Apache Kafka. Gwen is an author of “Kafka: The Definitive Guide” and “Hadoop Application Architectures”, and a frequent presenter at industry conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects.
Xavier Léauté was one of the first engineers to join the Confluent team. He is responsible for analytics infrastructure, including real-time analytics in Kafka Streams. He was previously a quantitative researcher at BlackRock. Prior to that, he held various research and analytics roles at Barclays Global Investors and MSCI.
Running Kafka On Kubernetes With Strimzi For Real-Time Streaming Applications – Lightbend
In this talk by Sean Glover, Principal Engineer at Lightbend, we will review how the Strimzi Kafka Operator, a supported technology in Lightbend Platform, makes many operational tasks in Kafka easy, such as the initial deployment and updates of a Kafka and ZooKeeper cluster.
See the blog post containing the YouTube video here: https://www.lightbend.com/blog/running-kafka-on-kubernetes-with-strimzi-for-real-time-streaming-applications
8 Lessons Learned from Using Kafka in 1000 Scala microservices - Scale by the... – Natan Silnitsky
Kafka is the bedrock of Wix's distributed microservices system. For the last 5 years we have learned a lot about how to successfully scale our event-driven architecture to roughly 1500 microservices.
We’ve managed to achieve greater decoupling and independence for our various services and dev teams, which have very different use-cases, while keeping a single uniform infrastructure in place.
In these slides you will learn about 8 key decisions and steps you can take in order to safely scale up your Kafka-based system. These include:
* How to increase dev velocity of event-driven style code.
* How to optimize working with Kafka in a polyglot setting.
* How to support a growing amount of traffic and developers.
Mario Molina, Datio, Software Engineer
Kafka Streams is an open source JVM library for building event streaming applications on top of Apache Kafka. Its goal is to let programmers create efficient, real-time streaming applications and perform analysis and operations on the incoming data.
In this presentation we’ll cover the main features of Kafka Streams and do a live demo!
This demo will be partially on Confluent Cloud. If you haven’t already signed up, you can try Confluent Cloud for free and get $200 every month for your first three months ($600 free usage in total). Get more information and claim it here: https://cnfl.io/cloud-meetup-free
https://www.meetup.com/Mexico-Kafka/events/271972045/
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp – José Román Martín Gil
Apache Kafka is the data streaming broker most widely used by companies. It can easily manage millions of messages and is the basis of many architectures built around events, microservices, orchestration, ... and now cloud environments. OpenShift is the most widely adopted Platform as a Service (PaaS). It is based on Kubernetes and helps companies easily deploy any kind of workload in a cloud environment. Thanks to many of its features, it is the base for many architectures built on stateless applications and for new Cloud Native Applications. Strimzi is an open source community that implements a set of Kubernetes Operators to help you manage and deploy Apache Kafka brokers in OpenShift environments.
These slides will introduce you to Strimzi, a new component on OpenShift for managing your Apache Kafka clusters.
Slides used at OpenShift Meetup Spain:
- https://www.meetup.com/es-ES/openshift_spain/events/261284764/
Making Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, Confluent – HostedbyConfluent
A talk discussing the rise of Apache Kafka and data in motion, plus the impact of cloud-native data systems. This talk will cover how Kafka needs to evolve to keep up with the future of cloud, what this means for distributed systems engineers, and what work is being done to truly make Kafka cloud native.
Battle-tested event-driven patterns for your microservices architecture - Sca... – Natan Silnitsky
During the past couple of years I’ve implemented, or witnessed implementations of, several key patterns of event-driven messaging design on top of Kafka that have facilitated creating a robust distributed microservices system at Wix, one that can easily handle increasing traffic and storage needs across many different use-cases.
In this talk I will share these patterns with you, including:
* Consume and Project (data decoupling)
* End-to-end Events (Kafka+websockets)
* In-memory KV stores (consume and query with zero latency; see the sketch after this list)
* Event transactions (exactly-once delivery)
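As an illustration of the in-memory KV pattern, here is a minimal sketch, assuming a hypothetical `site-events` topic and plain kafka-clients rather than Wix's actual infrastructure: a consumer continuously projects a topic into a local map, so reads never leave the process.

```java
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class InMemoryProjection {
    // Local projection: queries hit this map with zero network latency.
    static final Map<String, String> store = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "projection-" + UUID.randomUUID()); // each instance builds its own copy
        props.put("auto.offset.reset", "earliest");               // replay the topic from the beginning
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("site-events")); // hypothetical topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                // Project each event into the local store; last write per key wins.
                records.forEach(r -> store.put(r.key(), r.value()));
            }
        }
    }
}
```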
Flexible Authentication Strategies with SASL/OAUTHBEARER (Michael Kaminski, T... – confluent
In order to maximize Kafka accessibility within an organization, Kafka operators must choose an authentication option that balances security with ease of use. Kafka has historically been limited to a small number of authentication options that are difficult to integrate with a Single Sign-On (SSO) strategy, such as mutual TLS, basic auth, and Kerberos. The arrival of SASL/OAUTHBEARER in Kafka 2.0.0 affords system operators a flexible framework for integrating Kafka with their existing authentication infrastructure. Ron Dagostino (State Street Corporation) and Mike Kaminski (The New York Times) team up to discuss SASL/OAUTHBEARER and its real-world applications. Ron, who contributed the feature to core Kafka, explains the origins and intricacies of its development, along with additional related security changes, including client re-authentication (merged and scheduled for release in v2.2.0) and the plans for support of SASL/OAUTHBEARER in librdkafka-based clients. Mike Kaminski, a developer on The Publishing Pipeline team at The New York Times, talks about how his team leverages SASL/OAUTHBEARER to break down silos between teams by making it easy for product owners to get connected to the Publishing Pipeline’s Kafka cluster.
Polyglot, fault-tolerant event-driven programming with kafka, kubernetes and ... – Natan Silnitsky
At Wix, we have created a universal event-driven programming infrastructure on top of the Kafka message broker.
This infra makes sure messages are eventually successfully consumed and produced, no matter what failures are encountered.
In this talk, you will learn about the features we introduced in order to make sure our distributed system can safely handle an ever-growing message throughput in a fault-tolerant manner.
You will be introduced to such techniques as retry topics, local persistent queues, and cooperative fibers that help make your flows more resilient and performant.
You will also learn how to make this infra work for all programming languages and tech stacks, with optimal resource management, using the power of Kubernetes and gRPC:
when to use a client library, and when to deploy an external pod (DaemonSet) or even a sidecar.
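To make the retry-topic technique mentioned above concrete, here is a minimal sketch, assuming hypothetical topic names (`orders`, `orders-retry`) and plain kafka-clients rather than Wix's actual infrastructure: a record whose handler fails is parked on a retry topic instead of blocking its partition.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class RetryTopicConsumer {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "orders-handler");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("orders")); // a full setup would also consume "orders-retry" after a delay
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    try {
                        handle(record);
                    } catch (Exception e) {
                        // Don't block the partition: park the failed record on the retry topic.
                        producer.send(new ProducerRecord<>("orders-retry", record.key(), record.value()));
                    }
                }
            }
        }
    }

    static void handle(ConsumerRecord<String, String> record) { /* business logic goes here */ }
}
```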
Kat Grigg, Confluent, Senior Customer Success Architect + Jen Snipes, Confluent, Senior Customer Success Architect
This presentation will cover tips and best practices for Apache Kafka. In this talk, we will be covering the basic internals of Kafka and how these components integrate together including brokers, topics, partitions, consumers and producers, replication, and Zookeeper. We will be talking about the major categories of operations you need to be setting up and monitoring including configuration, deployment, maintenance, monitoring and then debugging.
https://www.meetup.com/KafkaBayArea/events/270915296/
New Features in Confluent Platform 6.0 / Apache Kafka 2.6 – Kai Wähner
New features in Confluent Platform 6.0 / Apache Kafka 2.6, including REST Proxy and API, Tiered Storage for AWS S3 and GCP GCS, Cluster Linking (on-premise, edge, hybrid, multi-cloud), Self-Balancing Clusters, and ksqlDB.
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL... – confluent
This document provides an overview of using location data to showcase keys, windows, and joins in Kafka Streams DSL and KSQL. It describes several algorithms for finding the nearest airport or aircraft to a given location within a 5-minute window using Kafka Streams. The algorithms involve bucketing location data, calculating distances, aggregating counts, and suppressing updates to optimize processing. The document cautions that windows are not magic and getting the window logic wrong can lead to incorrect results.
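The windowing-and-suppression approach described above maps naturally onto the Kafka Streams DSL. Here is a hedged sketch, not the talk's actual code; the `positions` topic, the bucket-keyed distance encoding, and the output topic are assumptions, and String/Long default serdes are assumed to be configured:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.TimeWindows;
import java.time.Duration;

public class NearestPerBucketTopology {
    public static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, Long>stream("positions")   // key: location bucket, value: distance to the target
            .groupByKey()
            // 5-minute tumbling windows, matching the talk's "within a 5-minute window"
            .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
            // keep the minimum distance seen per bucket per window
            .reduce((a, b) -> Math.min(a, b))
            // emit only one final result per window instead of every intermediate update
            .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
            .toStream((windowedKey, value) -> windowedKey.key()) // drop the window from the output key
            .to("nearest-per-bucket");
        return builder;
    }
}
```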
Event Driven Architectures with Apache Kafka on Heroku – Heroku
Apache Kafka is the backbone for building architectures that deal with billions of events a day. Chris Castle, Developer Advocate, will show you where it might fit in your roadmap.
- What Apache Kafka is and how to use it on Heroku
- How Kafka enables you to model your data as immutable streams of events, introducing greater parallelism into your applications
- How you can use it to solve scale problems across your stack such as managing high throughput inbound events and building data pipelines
Learn more at https://www.heroku.com/kafka
Reveal.js version of slides: http://slides.com/christophercastle/deck#/
ksqlDB: A Stream-Relational Database System – confluent
Speaker: Matthias J. Sax, Software Engineer, Confluent
ksqlDB is a distributed event streaming database system that allows users to express SQL queries over relational tables and event streams. The project was released by Confluent in 2017 and is hosted on GitHub and developed with an open-source spirit. ksqlDB is built on top of Apache Kafka®, a distributed event streaming platform. In this talk, we discuss ksqlDB’s architecture, which is influenced by Apache Kafka and its stream processing library, Kafka Streams. We explain how ksqlDB executes continuous queries while achieving fault tolerance and high availability. Furthermore, we explore ksqlDB’s streaming SQL dialect and the different types of supported queries.
Matthias J. Sax is a software engineer at Confluent working on ksqlDB. He mainly contributes to Kafka Streams, Apache Kafka's stream processing library, which serves as ksqlDB's execution engine. Furthermore, he helps evolve ksqlDB's "streaming SQL" language. In the past, Matthias also contributed to Apache Flink and Apache Storm and he is an Apache committer and PMC member. Matthias holds a Ph.D. from Humboldt University of Berlin, where he studied distributed data stream processing systems.
https://db.cs.cmu.edu/events/quarantine-db-talk-2020-confluent-ksqldb-a-stream-relational-database-system/
Overcoming the Perils of Kafka Secret Sprawl (Tejal Adsul, Confluent) Kafka S... – confluent
The document discusses the perils of secret sprawl and clear text secrets in Apache Kafka configurations. It proposes using a key management system (KMS) to securely store secrets and introducing a configuration provider that allows replacing secrets in configuration files with references to the KMS. The Kafka Improvement Proposals (KIPs) 297 and 421 aim to address this by externalizing secrets to providers and automatically resolving secrets during configuration parsing. Key recommendations include selecting a KMS, moving secrets to it, adding the configuration provider, and replacing secrets with indirection tuples pointing to the KMS.
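For illustration, this is roughly what the KIP-297 indirection looks like using the `FileConfigProvider` that ships with Apache Kafka. A minimal sketch; the file path and key names are assumptions, and a KMS-backed provider would be registered the same way:

```java
import java.util.Properties;

public class ExternalizedSecretsConfig {
    public static Properties build() {
        Properties props = new Properties();
        // Register a config provider (KIP-297); FileConfigProvider is bundled with Apache Kafka.
        props.put("config.providers", "file");
        props.put("config.providers.file.class",
                "org.apache.kafka.common.config.provider.FileConfigProvider");
        // Indirection tuple instead of a clear-text secret: ${provider:path:key}
        props.put("ssl.keystore.password",
                "${file:/etc/kafka/secrets.properties:keystore.password}");
        return props;
    }
}
```

Note that placeholder resolution happens only in components that support config providers, such as brokers (KIP-421) and Kafka Connect workers (KIP-297).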
KSQL is a stream processing SQL engine, which allows stream processing on top of Apache Kafka. KSQL is based on Kafka Streams and provides capabilities for consuming messages from Kafka, analysing these messages in near real time with a SQL-like language, and producing results back to a Kafka topic. That way, not a single line of Java code has to be written, and you can reuse your SQL know-how. This lowers the bar for starting with stream processing significantly.
KSQL offers powerful stream processing capabilities, such as joins, aggregations, time windows and support for event time. In this talk I will present how KSQL integrates with the Kafka ecosystem and demonstrate how easy it is to implement a solution using KSQL for the most part. This will be done in a live demo on a fictitious IoT sample.
Watch this talk here: https://www.confluent.io/online-talks/from-zero-to-hero-with-kafka-connect-on-demand
Integrating Apache Kafka® with other systems in a reliable and scalable way is often a key part of a streaming platform. Fortunately, Apache Kafka includes the Connect API that enables streaming integration both in and out of Kafka. Like any technology, understanding its architecture and deployment patterns is key to successful use, as is knowing where to go looking when things aren't working.
This talk will discuss the key design concepts within Apache Kafka Connect and the pros and cons of standalone vs distributed deployment modes. We'll do a live demo of building pipelines with Apache Kafka Connect for streaming data in from databases, and out to targets including Elasticsearch. With some gremlins along the way, we'll go hands-on in methodically diagnosing and resolving common issues encountered with Apache Kafka Connect. The talk will finish off by discussing more advanced topics including Single Message Transforms, and deployment of Apache Kafka Connect in containers.
Securing Kafka At Zendesk (Joy Nag, Zendesk) Kafka Summit 2020 – confluent
Kafka is one of the most important foundation services at Zendesk. It became even more crucial with the introduction of Global Event Bus, which my team built to propagate events between Kafka clusters hosted in different parts of the world and between different products. As part of its rollout, we had to add mTLS support to all of our Kafka clusters (we have quite a few of them) to make the propagation of events between clusters secure. It was quite a journey, but we eventually built a solution that is working well for us.
Things I will be sharing as part of the talk:
1. Establishing the use case/problem we were trying to solve (why we needed mTLS)
2. Building a Certificate Authority with open source tools (with self-signed Root CA)
3. Building helper components to generate certificates automatically and regenerate them before they expire (this allows a shorter TTL (time to live), which is good security practice) for both Kafka clients and brokers
4. Hot reloading regenerated certificates on Kafka brokers without downtime
5. What we built to rotate the self-signed root CA across the board, also without downtime
6. Monitoring and alerts on TTL of certificates
7. Performance impact of using TLS (along with why TLS affects Kafka’s performance)
8. What we are doing to drive adoption of mTLS for existing Kafka clients using the PLAINTEXT protocol by making onboarding easier
9. How this will become a base for other features we want, e.g. ACLs and rate limiting (by using the principal from the TLS certificate as the identity of clients)
How did we move the mountain? - Migrating 1 trillion+ messages per day across... – HostedbyConfluent
Have you ever migrated Kafka clusters from one data center to another while staying completely transparent to client applications?
At PayPal, as part of a massive datacenter migration initiative, the Kafka team successfully moved all PayPal Kafka traffic across data centers. This initiative involved migrating 20+ Kafka clusters (1000+ broker and ZooKeeper nodes), as well as 60+ MirrorMaker groups, which seamlessly handle Kafka traffic volumes as high as 1 trillion messages per day. Throughout the course of this migration, applications required no modification and encountered 0% service outage, 0% message loss, and no duplicated messages. The whole migration process was fully transparent to Kafka applications.
In this session, you will learn the strategies, techniques and tools the PayPal Kafka team utilized to manage the migration process. You will also learn the lessons and pitfalls they experienced during this exercise, as well as the secret sauce that made the migration successful.
Stream Processing Live Traffic Data with Kafka Streams – Tom Van den Bulck
In this workshop we will set up a streaming framework which will process real-time data from traffic sensors installed within the Belgian road system.
Starting with the intake of the data, you will learn best practices and the recommended approach to split the information into events in a way that won't come back to haunt you.
With some basic stream operations (count, filter, ... ) you will get to know the data and experience how easy it is to get things done with Spring Boot & Spring Cloud Stream.
But since simple data processing is not enough to fulfill all your streaming needs, we will also let you experience the power of windows.
After this workshop, tumbling, sliding and session windows hold no more mysteries and you will be a true streaming wizard.
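As a taste of how little code this takes with the Spring Cloud Stream functional model, here is a minimal sketch; the `SensorEvent` shape, the binding name, and the topic are assumptions, not the workshop's actual code:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import java.util.function.Consumer;

@SpringBootApplication
public class TrafficApp {

    // Hypothetical shape of one road-sensor reading.
    record SensorEvent(String sensorId, int vehicleCount, double avgSpeedKmh) {}

    public static void main(String[] args) {
        SpringApplication.run(TrafficApp.class, args);
    }

    // Functional binding: Spring Cloud Stream wires this bean to a Kafka topic via configuration,
    // e.g. spring.cloud.stream.bindings.sensors-in-0.destination=traffic-sensors
    @Bean
    public Consumer<SensorEvent> sensors() {
        return event -> {
            // A basic stream operation: filter for congestion.
            if (event.avgSpeedKmh() < 20) {
                System.out.println("Congestion at sensor " + event.sensorId());
            }
        };
    }
}
```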
Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M... – confluent
Robin is a Developer Advocate at Confluent, the company founded by the creators of Apache Kafka, as well as an Oracle Groundbreaker Ambassador. His career has always involved data, from the old worlds of COBOL and DB2, through the worlds of Oracle and Hadoop, and into the current world with Kafka. His particular interests are analytics, systems architecture, performance testing and optimization. He blogs at http://cnfl.io/rmoff and http://rmoff.net/ and can be found tweeting grumpy geek thoughts as @rmoff. Outside of work he enjoys drinking good beer and eating fried breakfasts, although generally not at the same time.
So You’ve Inherited Kafka? Now What? (Alon Gavra, AppsFlyer) Kafka Summit Lon... – confluent
This document summarizes a presentation about managing Kafka clusters at scale. It discusses how AppsFlyer migrated from a monolithic Kafka deployment to multiple clusters for different teams. It then outlines challenges faced like traffic surges and mixed Kafka protocol versions. Solutions discussed include improving infrastructure, adding visibility tools, creating automation and APIs for management, and implementing sleep-driven design principles to reduce developer fatigue. The presentation concludes by discussing future goals like auto-scaling clusters.
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30... – HostedbyConfluent
If you have already worked on various Kafka Streams applications, then you have probably found yourself rewriting the same pieces of code again and again.
Whether it's to manage processing failures or bad records, to use interactive queries, to organize your code, or to deploy or monitor your Kafka Streams app, building in-house libraries to standardize common patterns across your projects seems unavoidable.
And if you're new to Kafka Streams, you might be interested to know which patterns to use for your next streaming project.
In this talk, I propose to introduce you to Azkarra, an open-source, lightweight Java framework designed to provide most of this functionality off the shelf by leveraging best-of-breed ideas and proven practices from the Apache Kafka community.
Kafka is evolving to remove its dependency on Zookeeper. The Kafka Improvement Proposal 500 (KIP-500) aims to manage Kafka's metadata log with a self-managed Raft consensus algorithm and controller quorum rather than relying on Zookeeper. This will improve scalability, robustness, and make deployment easier. It will take multiple releases to fully implement KIP-500, beginning with removing Zookeeper from clients and ending with a release where Zookeeper is no longer required.
10 Lessons Learned from using Kafka with 1000 microservices - java global summit – Natan Silnitsky
Kafka is the bedrock of Wix's distributed microservices system. For the last 5 years we have learned a lot about how to successfully scale our event-driven architecture to roughly 1400 microservices.
We’ve managed to achieve greater decoupling and independence for our various services and dev teams, which have very different use-cases, while keeping a single uniform infrastructure in place.
In these slides you will learn about 10 key decisions and steps you can take in order to safely scale up your Kafka-based system. These include:
* How to increase dev velocity of event-driven style code.
* How to optimize working with Kafka in a polyglot setting.
* How to support a growing amount of traffic and developers.
* How to tackle a multi-DC environment.
Kafka Pluggable Authorization for Enterprise Security (Anna Kepler, Viasat) K... – confluent
At Viasat, Kafka is the backbone of a multi-tenant streaming platform that transports data for 1000 streams and is used by more than 60 teams in a production environment. Role-based access control to sensitive data is an essential requirement for our customers, who must comply with a variety of regulations including GDPR. Kafka ships with a pluggable Authorizer that can control access to resources like the cluster, topics or consumer groups. However, maintaining ACLs in a large multi-tenant deployment can be support-intensive. At Viasat, we developed a custom Kafka Authorizer and Role Manager application that integrates our Kafka cluster with Viasat’s internal LDAP services. The presentation will cover how we designed and built the Kafka LDAP Authorizer, which allows us to control resources within the cluster as well as services built around Kafka. We apply our permissions model to our data forwarders, ETL jobs, and stream processing. We will also share how we achieved a stress-free migration to secure infrastructure without interruption to the production data flow. Our secure deployment model accomplishes multiple goals:
– Integration into an LDAP central authentication system.
– Use of the same authorization service to control permissions to data in Kafka as well as services built around Kafka.
– Delegation of permissions control to the security officers on the teams using the service.
– Detailed audit and breach notifications based on the metrics produced by the custom authorizer.
We plan to open source our custom Kafka Authorizer.
10 Lessons Learned from using Kafka in 1000 microservices - ScalaUA – Natan Silnitsky
Kafka is the bedrock of Wix’s distributed Mega Microservices system.
Over the years we have learned a lot about how to successfully scale our event-driven architecture to roughly 1400 mostly Scala microservices.
In this talk, you will learn about 10 key decisions and steps you can take in order to safely scale-up your Kafka-based system.
These include:
* How to increase dev velocity of event-driven style code.
* How to optimize working with Kafka in a polyglot setting.
* How to migrate from request-reply to event-driven.
* How to tackle a multi-DC environment.
8 Lessons Learned from Using Kafka in 1500 microservices - confluent streamin... – Natan Silnitsky
Kafka is the bedrock of Wix's distributed microservices system. For the last 5 years we have learned a lot about how to successfully scale our event-driven architecture to roughly 1500 microservices.
We’ve managed to achieve greater decoupling and independence for our various services and dev teams, which have very different use-cases, while keeping a single uniform infrastructure in place.
In these slides you will learn about 8 key decisions and steps you can take in order to safely scale up your Kafka-based system. These include:
* How to increase dev velocity of event-driven style code.
* How to optimize working with Kafka in a polyglot setting.
* How to support a growing amount of traffic and developers.
Polyglot, Fault Tolerant Event-Driven Programming with Kafka, Kubernetes and ... – Natan Silnitsky
At Wix, we have created a universal event-driven programming infrastructure on top of the Kafka message broker.
This infra makes sure messages are eventually successfully consumed and produced, no matter what failures are encountered.
In this talk, you will learn about the features we introduced in order to make sure our distributed system can safely handle an ever-growing message throughput in a fault-tolerant manner.
You will be introduced to such techniques as retry topics, local persistent queues, and cooperative fibers that help make your flows more resilient and performant.
You will also learn how to make this infra work for all programming languages and tech stacks, with optimal resource management, using the power of Kubernetes and gRPC:
when to use a client library, and when to deploy an external pod (DaemonSet, StatefulSet) or even a sidecar.
This document provides information about key concepts in Apache Kafka including producers, consumers, brokers, topics, partitions, replications, and Zookeeper. It also describes how to start the Kafka services, create and list topics, produce and consume messages, and configure producers and consumers. Finally, it briefly discusses Kafka Streams, Schema Registry, Kafka Connect, and KSQL.
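To ground the topic-management basics mentioned above, here is a minimal sketch using Kafka's AdminClient; the broker address and topic name are assumptions:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.List;
import java.util.Properties;
import java.util.Set;

public class TopicAdmin {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Create a topic with 3 partitions and replication factor 1 (single-broker dev setup).
            admin.createTopics(List.of(new NewTopic("demo-topic", 3, (short) 1))).all().get();

            // List topic names, the programmatic equivalent of `kafka-topics.sh --list`.
            Set<String> names = admin.listTopics().names().get();
            names.forEach(System.out::println);
        }
    }
}
```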
This document provides an overview of Greyhound, a Scala/Java high-level SDK for Apache Kafka. It discusses key Kafka concepts like producers, brokers, topics and partitions. It then summarizes key Greyhound features like simple consumer APIs, composable record handlers, parallel and batch consumption, and retries. Diagrams are included to illustrate how Greyhound wraps the Kafka consumer and handles retries by publishing to retry topics. The presenter aims to abstract Kafka for developers and simplify APIs while adding useful features.
Spark Streaming has supported Kafka since its inception, but a lot has changed since then, on both the Spark and Kafka sides, to make this integration more fault-tolerant and reliable. Apache Kafka 0.10 (actually since 0.9) introduced the new Consumer API, built on top of a new group coordination protocol provided by Kafka itself.
So a new Spark Streaming integration came to the playground, with a design similar to the 0.8 Direct DStream approach. However, there are notable differences in usage, and many exciting new features. In this talk, we will cover the main differences between this new integration and the previous one (for Kafka 0.8), and why Direct DStreams have replaced Receivers for good. We will also see how to achieve different semantics (at-least-once, at-most-once, exactly-once) with code examples.
Finally, we will briefly introduce the usage of this integration in Billy Mobile to ingest and process the continuous stream of events from our AdNetwork.
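For flavor, this is roughly what the Kafka 0.10 direct-stream integration looks like when offsets are committed back to Kafka after processing, giving at-least-once semantics. A sketch with assumed broker address and topic name, not the talk's exact code:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.CanCommitOffsets;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.HasOffsetRanges;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;
import org.apache.spark.streaming.kafka010.OffsetRange;
import java.util.List;
import java.util.Map;

public class DirectStreamJob {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("direct-stream-demo");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, Object> kafkaParams = Map.of(
                "bootstrap.servers", "localhost:9092",
                "key.deserializer", StringDeserializer.class,
                "value.deserializer", StringDeserializer.class,
                "group.id", "spark-demo",
                "enable.auto.commit", false); // commit manually, after processing

        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                jssc, LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(List.of("events"), kafkaParams));

        stream.foreachRDD(rdd -> {
            OffsetRange[] offsets = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
            rdd.foreach(record -> System.out.println(record.value())); // process first...
            // ...then commit offsets back to Kafka: at-least-once delivery.
            ((CanCommitOffsets) stream.inputDStream()).commitAsync(offsets);
        });

        jssc.start();
        jssc.awaitTermination();
    }
}
```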
Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka – Guido Schmutz
After a quick overview and introduction of Apache Kafka, this session cover two components which extend the core of Apache Kafka: Kafka Connect and Kafka Streams/KSQL.
Kafka Connect's role is to access data from the outside world and make it available inside Kafka by publishing it into a Kafka topic. On the other hand, Kafka Connect is also responsible for transporting information from inside Kafka to the outside world, which could be a database or a file system. There are many existing connectors for different source and target systems available out of the box, provided either by the community, by Confluent or by other vendors. You simply configure these connectors and off you go.
Kafka Streams is a lightweight component which extends Kafka with stream processing functionality. With it, Kafka can not only reliably and scalably transport events and messages through the Kafka broker, but also analyse and process these events in real time. Interestingly, Kafka Streams does not provide its own cluster infrastructure, and it is also not meant to run on a Kafka cluster. The idea is to run Kafka Streams wherever it makes sense: inside a “normal” Java application, inside a web container, or on a more modern containerized (cloud) infrastructure such as Mesos, Kubernetes or Docker. Kafka Streams has a lot of interesting features, such as reliable state handling, queryable state and much more. KSQL is a streaming engine for Apache Kafka, providing a simple and completely interactive SQL interface for processing data in Kafka.
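The "run it anywhere" point is easy to see in code: a complete Kafka Streams application is an ordinary Java program with a main method, no cluster required beyond Kafka itself. A minimal sketch with assumed topic names:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import java.util.Properties;

public class UppercaseApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read from one topic, transform, write to another.
        builder.<String, String>stream("input-topic")
               .mapValues(value -> value.toUpperCase())
               .to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```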
[Big Data Spain] Apache Spark Streaming + Kafka 0.10: an Integration Story – Joan Viladrosa Riera
This document provides an overview of Apache Kafka and Apache Spark Streaming and their integration. It discusses what Kafka and Spark Streaming are, how they work, their benefits, and semantics when used together. It also provides examples of code for using the new Kafka integration in Spark 2.0+, including getting metadata, storing offsets in Kafka, and achieving at-most-once, at-least-once, and exactly-once processing semantics. Finally, it shares some insights into how Billy Mobile uses Spark Streaming with Kafka to process large volumes of data.
The document discusses Kubernetes networking. It describes how Kubernetes networking allows pods to have routable IPs and communicate without NAT, unlike Docker networking which uses NAT. It covers how services provide stable virtual IPs to access pods, and how kube-proxy implements services by configuring iptables on nodes. It also discusses the DNS integration using SkyDNS and Ingress for layer 7 routing of HTTP traffic. Finally, it briefly mentions network plugins and how Kubernetes is designed to be open and customizable.
Apache Kafka - Scalable Message-Processing and more! – Guido Schmutz
Presentation @ Oracle Code Berlin.
Independent of the source of data, the integration of event streams into an enterprise architecture gets more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafka comes into play: a distributed, highly scalable messaging broker, built for exchanging huge amounts of messages between a source and a target. This session will start with an introduction to Apache Kafka and present the role of Apache Kafka in a modern data / information architecture and the advantages it brings to the table.
Apache Kafka is a distributed streaming platform. It provides a high-throughput distributed messaging system with publish-subscribe capabilities. The document discusses Kafka producers and consumers, Kafka clients in different programming languages, and important configuration settings for Kafka brokers and topics. It also demonstrates sending messages to Kafka topics from a Java producer and consuming messages from the console consumer.
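As a companion to that demonstration, a minimal Java producer looks roughly like this; the broker address and topic name are assumptions:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // acks=all waits for the full in-sync replica set, trading latency for durability.
        props.put("acks", "all");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                // The key determines the partition; the same key always lands on the same partition.
                producer.send(new ProducerRecord<>("demo-topic", "key-" + i, "message-" + i));
            }
        } // close() flushes any buffered records
    }
}
```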
Can Kafka Handle a Lyft Ride? (Andrey Falko & Can Cecen, Lyft) Kafka Summit 2020 – HostedbyConfluent
What does a Kafka administrator need to do if they have a user who demands that message delivery be guaranteed, fast, and low-cost? In this talk we walk through the architecture we created to deliver for such users. Learn about the alternatives we considered and the pros and cons of what we came up with.
In this talk, we’ll be forced to dive into broker restart and failure scenarios and things we need to do to prevent leader elections from slowing down incoming requests. We’ll need to take care of the consumers as well to ensure that they don’t process the same request twice. We also plan to describe our architecture by showing a demo of simulated requests being produced into Kafka clusters and consumers processing them in lieu of us aggressively causing failures on the Kafka clusters.
We hope the audience walks away with a deeper understanding of what it takes to build robust Kafka clients and how to tune them to accomplish stringent delivery guarantees.
Apache Kafka - Scalable Message Processing and more! – Guido Schmutz
Apache Kafka is a distributed streaming platform for handling real-time data feeds and deriving value from them. It provides a unified, scalable infrastructure for ingesting, processing, and delivering real-time data feeds. Kafka supports high throughput, fault tolerance, and exactly-once delivery semantics.
Apache Kafka sits at the core of the modern scalable event-driven architecture. It’s no longer used only as logging infrastructure, but as a core component in thousands of companies around the world. It has the unique capability to provide a low-latency, fault-tolerant pipeline at scale, which is very important for today’s world of big data. During this talk we’ll see what makes Apache Kafka perfect for the job. We’ll explore how to optimize it for throughput or for durability. And we’ll also go over the messaging semantics it provides. Last but not least, we’ll see how Apache Kafka can help us solve, in an elegant way, some everyday problems that we face when we build large-scale systems.
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe... – HostedbyConfluent
Confluent Cloud runs a modified version of Apache Kafka - redesigned to be cloud-native and deliver a serverless user experience. In this talk, we will discuss key improvements we've made to Kafka and how they contribute to Confluent Cloud availability, elasticity, and multi-tenancy. You'll learn about innovations that you can use on-prem, and everything you need to make the most of Confluent Cloud.
Building Stream Processing Applications with Apache Kafka's Exactly-Once Proc... – Matthias J. Sax
This talk was given at the "Big Data Applications" Meetup group (https://www.meetup.com/BigDataApps/).
Abstract:
Kafka 0.11 added a new feature called "exactly-once guarantees". In this talk, we will explain what "exactly-once" means in the context of Kafka and data stream processing and how it affects application development. The talk will go into some details about exactly-once, namely the new idempotent producer and transactions, and how both can be exploited to simplify application code: for example, you don't need complex deduplication code in your input path, as you can rely on Kafka to deduplicate messages when data is produced by an upstream application. Transactions can be used to write multiple messages into different topics and/or partitions and commit all writes in an atomic manner (or abort all writes so none will be read by a downstream consumer in read-committed mode). Thus, transactions allow for applications with strong consistency guarantees, as in the financial sector (e.g., either send both a withdrawal and a deposit message to transfer money, or neither of them). Finally, we talk about Kafka's Streams API, which makes exactly-once stream processing as simple as it can get.
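To make the transactions part concrete, here is a minimal sketch of Kafka's transactional producer API (available since 0.11); the topic names and the withdrawal/deposit pairing mirror the money-transfer example above and are assumptions:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class TransferProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("enable.idempotence", "true");              // idempotent producer: broker deduplicates retries
        props.put("transactional.id", "transfer-producer-1"); // stable id enables transactions across restarts
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            try {
                producer.beginTransaction();
                // Both writes commit atomically, or neither is visible to read-committed consumers.
                producer.send(new ProducerRecord<>("withdrawals", "account-A", "-100"));
                producer.send(new ProducerRecord<>("deposits", "account-B", "+100"));
                producer.commitTransaction();
            } catch (Exception e) {
                producer.abortTransaction(); // downstream read-committed consumers will see neither write
                throw e;
            }
        }
    }
}
```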
Greyhound - Powerful Pure Functional Kafka LibraryNatan Silnitsky
Wix has finally released its open-source Kafka client SDK wrapper, Greyhound,
completely rewritten using the Scala functional library ZIO.
Greyhound harnesses ZIO's sophisticated async and concurrency features, together with its easy composability, to provide a superior experience to Kafka's own client SDKs.
It offers rich functionality including:
- Trivial setup of message-processing parallelisation,
- Various fault-tolerant retry policies (for consumers AND producers),
- Easy pluggability of metrics publishing and context propagation, and much more.
This talk will also show how Greyhound is used by Wix developers in more than 1000 event-driven microservices.
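The real API lives at github.com/wix/greyhound; as a loose, hypothetical sketch of the kind of setup the feature list above promises (all names here are invented for illustration and are NOT Greyhound's actual API):

    import java.util.concurrent.Executors
    import scala.concurrent.{ExecutionContext, Future}

    // HYPOTHETICAL sketch - invented names, not Greyhound's real API.
    final case class RetryPolicy(maxRetries: Int, backoffMs: Long)

    final class ParallelConsumer(parallelism: Int, retry: RetryPolicy,
                                 handler: String => Unit) {
      private implicit val ec: ExecutionContext =
        ExecutionContext.fromExecutor(Executors.newFixedThreadPool(parallelism))

      // The wrapper owns polling, dispatch and retries; user code is just `handler`.
      def dispatch(record: String): Future[Unit] = Future(handler(record))
    }

    // One declarative line replaces pages of raw-client setup boilerplate:
    val consumer = new ParallelConsumer(
      parallelism = 8,
      retry = RetryPolicy(maxRetries = 3, backoffMs = 1000),
      handler = record => println(s"handling $record"))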
Async Excellence Unlocking Scalability with Kafka - Devoxx GreeceNatan Silnitsky
How do you scale 4,000 microservices while tackling latency, bottlenecks, and fault tolerance? At Wix, Kafka powers our event-driven architecture with practical patterns that enhance scalability and developer velocity.
This talk explores four key patterns for asynchronous programming:
1. Integration Events: Reduce latency by pre-fetching instead of synchronous calls.
2. Task Queue: Streamline workflows by offloading non-critical tasks.
3. Task Scheduler: Enable precise, scalable scheduling for delayed or recurring tasks.
4. Iterator: Handle long-running jobs in chunks for resilience and scalability.
Learn how to balance benefits and trade-offs, with actionable insights to optimize your own microservices architecture using these proven patterns.
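As a hedged illustration of pattern 1 (Integration Events), not taken from Wix's code: instead of calling another service synchronously on the request path, a consumer keeps a local read model warm from that service's event stream. All names below are hypothetical:

    import java.util.concurrent.ConcurrentHashMap

    // Hypothetical read model kept warm from another service's event stream.
    object PriceCache {
      private val prices = new ConcurrentHashMap[String, BigDecimal]()

      // Invoked by a Kafka consumer for each "product-price-changed" event.
      def onPriceChanged(productId: String, newPrice: BigDecimal): Unit =
        prices.put(productId, newPrice)

      // The request path reads locally - no synchronous call to the pricing service.
      def priceOf(productId: String): Option[BigDecimal] =
        Option(prices.get(productId))
    }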
Wix Single-Runtime - Conquering the multi-service challengeNatan Silnitsky
In pursuit of a seamless development experience, Wix has harnessed Nile, a powerful suite of tools that accelerates the creation of platformized microservices. While highly effective, the growth of thousands of microservices has introduced a few challenges: increasing infrastructure expenses, more complex upgrades, and increased inter-service communication.
Enter Wix Single-Runtime—a Server Guild Infra initiative designed to combine the Nile experience, the agility of serverless architectures and the cost-effectiveness of monolithic deployments.
Our solution leverages Kubernetes daemonsets for highly efficient host deployment, working in harmony with numerous lightweight guest services. We will explore the transformation of our current infrastructure to support multi-tenancy, with a keen focus on SDL (Nile's data layer) and its substantial impact on resource usage.
Attendees will walk away with actionable insights on reducing production costs, simplifying infrastructure management, and fostering innovations in a polyglot environment. Join us as we navigate the triumphs and trials of building a robust, cost-effective development ecosystem.
WeAreDevs - Supercharge Your Developer Journey with Tiny Atomic HabitsNatan Silnitsky
Discover the transformative power of atomic habits in your journey as a developer. Join us in this captivating talk to unlock the secrets of becoming a remarkable developer through small, achievable changes.
In this session, we will delve into the Four Laws of Behavior Change, empowering you to adopt new habits that will propel your coding skills to new heights. Learn effective strategies to enter the coding flow state while minimizing distractions, and master the art of acquiring new tech skills with ease.
Don't miss this opportunity to gain practical insights and actionable takeaways that will revolutionize your development process. Embrace the power of tiny atomic habits and unlock your true potential as a great developer.
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of that journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
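One common technique behind "ensuring atomicity in database updates and event production" is the transactional outbox. A minimal sketch under assumed table names (orders, outbox), not necessarily Wix's implementation:

    import java.sql.Connection

    // Transactional outbox sketch: the state change and the event are written
    // in ONE local DB transaction; a background relay later publishes `outbox`
    // rows to Kafka, so there is never an event without a row or vice versa.
    def createOrder(conn: Connection, orderId: String, payloadJson: String): Unit = {
      conn.setAutoCommit(false)
      try {
        val order = conn.prepareStatement("INSERT INTO orders(id, payload) VALUES (?, ?)")
        order.setString(1, orderId); order.setString(2, payloadJson)
        order.executeUpdate()

        val event = conn.prepareStatement(
          "INSERT INTO outbox(topic, event_key, event_value) VALUES (?, ?, ?)")
        event.setString(1, "order-created"); event.setString(2, orderId); event.setString(3, payloadJson)
        event.executeUpdate()

        conn.commit() // both rows commit together - or neither does
      } catch {
        case e: Exception => conn.rollback(); throw e
      }
    }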
Effective Strategies for Wix's Scaling challenges - GeeConNatan Silnitsky
This session unveils the multifaceted horizontal scaling strategies that power Wix's robust infrastructure. From Kafka consumer scaling and dynamic traffic routing for site segments, to DynamoDB sharding and custom routing for MySQL clusters with ProxySQL, we dissect the mechanisms that ensure scalability and performance at Wix.
Attendees will learn about the art of sharding and routing key selection across different systems, and how to apply these strategies to their own infrastructure. We'll share insights into choosing the right scaling strategy for various scenarios, balancing between managed services and custom solutions.
Key Takeaways:
- Grasp various sharding techniques and routing strategies used at Wix.
- Understand key considerations for sharding key and routing rule selection.
- Learn when and why to choose specific horizontal scaling strategies.
- Gain practical knowledge for applying these strategies to achieve scalability and high availability.
Join us to gain a blueprint for scaling your systems horizontally, drawing from Wix's proven practices.
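To make "sharding key and routing rule selection" concrete, here is a minimal hash-routing sketch; the hash function and shard count are illustrative, not Wix's actual rules:

    import scala.util.hashing.MurmurHash3

    // Stable key -> shard routing. A fast, deterministic, non-cryptographic
    // hash keeps every service routing the same key to the same shard.
    def shardFor(routingKey: String, shardCount: Int): Int =
      Math.floorMod(MurmurHash3.stringHash(routingKey), shardCount)

    // e.g. shard a site's data by siteId so all of its rows co-locate:
    val shard = shardFor("site-12345", shardCount = 16)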
Discover how Wix transitioned from complex event sourcing and CQRS to streamlined CRUD services, optimizing their vast platform for better scalability, performance, and resiliency.
Wix's platform, designed to accommodate diverse business needs, boasts:
* 3.5 Billion daily HTTP transactions
* 70 Billion Kafka messages per day
* Roughly 4000 microservices in production
This session will highlight the simplification of Wix's architecture through domain events, resilient Kafka messaging, and advanced techniques like materialization and caching. By standardizing APIs and employing tools like protobuf and gRPC, Wix has enhanced the developer experience, both internally and externally, and fostered an open, integrative platform.
Attendees will gain insights into Wix's strategies for microservice coordination, ensuring system resilience and data consistency, as well as query performance optimization through innovative 2-level caching solutions.
Workflow Engines & Event Streaming Brokers - Can they work together? [Current...Natan Silnitsky
The document discusses how workflow engines and event streaming brokers can work together. It describes some gaps in using only event streaming at scale for complex business flows and long-running tasks. It then provides an overview of how a workflow engine like Temporal could be integrated to address these gaps, including examples of implementing workflows and activities. Specific opportunities discussed for using Temporal at Wix include long-running processes, cron jobs, and internal microservice tasks.
DevSum - Lessons Learned from 2000 microservicesNatan Silnitsky
Wix has a huge scale of event-driven traffic: more than 70 billion Kafka business events per day.
Over the past few years Wix has made a gradual transition to an event-driven architecture for its 2000 microservices.
We have made mistakes along the way, but we have improved and learned a lot about how to keep our production maintainable, performant and resilient.
In this talk you will hear about the lessons we learned, including:
1. The importance of atomic operations for databases and events
2. Avoiding data consistency issues caused by out-of-order and duplicate processing (see the sketch after this list)
3. Having essential event-debugging and quick-fix tools in production
and a few more.
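As a hedged illustration of lesson 2, not Wix's actual code: track the last applied sequence number per entity and skip anything stale or already seen:

    import java.util.concurrent.ConcurrentHashMap

    // Per-entity high-water mark; in production this would live in a durable
    // store. Assumes per-key ordering (e.g. one Kafka partition per key).
    val lastApplied = new ConcurrentHashMap[String, Long]()

    // True if the event is new; false for duplicates or out-of-order stragglers.
    def shouldProcess(entityId: String, sequence: Long): Boolean = {
      val prev = lastApplied.getOrDefault(entityId, -1L)
      if (sequence <= prev) false
      else { lastApplied.put(entityId, sequence); true }
    }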
GeeCon - Lessons Learned from 2000 microservicesNatan Silnitsky
Wix has a huge scale of event-driven traffic: more than 70 billion Kafka business events per day.
Over the past few years Wix has made a gradual transition to an event-driven architecture for its 2000 microservices.
We have made mistakes along the way, but we have improved and learned a lot about how to keep our production maintainable, performant and resilient.
In this talk you will hear about the lessons we learned, including:
1. The importance of atomic operations for databases and events
2. Avoiding data consistency issues caused by out-of-order and duplicate processing
3. Having essential event-debugging and quick-fix tools in production
and a few more.
Migrating to Multi Cluster Managed Kafka - ApacheKafkaILNatan Silnitsky
As Wix's Kafka usage grew to 2.5B messages per day, >20K topics and >100K leader partitions serving 2000 microservices,
we decided to migrate from a self-operated single cluster per data center to a managed cloud service (like Amazon MSK or Confluent Cloud) with a multi-cluster setup.
The classic approach would be to perform this transition only once all incoming traffic is removed from the data center.
But draining an entire data center for an undetermined period of time, until all 2000 services complete the switch, was too risky for us.
This talk is about how we gradually migrated all of our Kafka consumers and producers with zero downtime while they continued to handle regular traffic. You will learn practical steps you can take to greatly reduce the risks and speed up the migration timeline.
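The abstract does not spell out the mechanism, but one simplified way to picture a zero-downtime consumer cutover is to consume from both the old and the new cluster during the migration window. A hedged sketch with placeholder addresses (the talk's actual approach may differ):

    import java.time.Duration
    import java.util.Properties
    import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
    import scala.jdk.CollectionConverters._

    def consumerFor(bootstrap: String): KafkaConsumer[String, String] = {
      val p = new Properties()
      p.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap)
      p.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-processor")
      p.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer")
      p.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer")
      val c = new KafkaConsumer[String, String](p)
      c.subscribe(List("orders").asJava)
      c
    }

    // During the cutover window, drain whatever still arrives on the old
    // cluster while already serving traffic from the new one.
    val oldCluster = consumerFor("old-dc-kafka:9092")  // placeholder address
    val newCluster = consumerFor("managed-kafka:9092") // placeholder address
    while (true) {
      Seq(oldCluster, newCluster).foreach { c =>
        c.poll(Duration.ofMillis(100)).asScala.foreach(r => println(r.value()))
      }
    }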
BuildStuff - Lessons Learned from 2000 Event Driven MicroservicesNatan Silnitsky
Wix has a huge scale of event-driven traffic: more than 70 billion Kafka business events per day.
Over the past few years Wix has made a gradual transition to an event-driven architecture for its 2000 microservices.
We have made mistakes along the way, but we have improved and learned a lot about how to keep our production maintainable, performant and resilient.
In this talk you will hear about the lessons we learned, including:
1. The importance of atomic operations for databases and events
2. Avoiding data consistency issues caused by out-of-order and duplicate processing
3. Having essential event-debugging and quick-fix tools in production
and a few more.
Lessons Learned from 2000 Event Driven Microservices - ReversimNatan Silnitsky
Wix has a huge scale of event-driven traffic: more than 70 billion Kafka business events per day.
Over the past few years Wix has made a gradual transition to an event-driven architecture for its 2000 microservices.
We have made mistakes along the way, but we have improved and learned a lot about how to keep our production maintainable, performant and resilient.
In this talk you will hear about the lessons we learned, including:
1. The importance of atomic operations for databases and events
2. Avoiding data consistency issues caused by out-of-order and duplicate processing
3. Having essential event-debugging and quick-fix tools in production
and a few more.
Devoxx Ukraine - Kafka based Global Data MeshNatan Silnitsky
As your organization rapidly grows in scale, so does the number of challenges.
Growing scale comes in multiple dimensions: traffic, geographic presence, product portfolio, technology stacks, number of developers, and more.
Coming up with an architecture that can handle all of the data flows in a universal, simple way is key.
This talk is about Wix's Kafka-based global data architecture and platform:
how we made it very easy for Wix's 2000 microservices to publish and subscribe to data, no matter where in the world they are deployed or what technology stack they use,
all while offering various SDKs (some of them open-source), tools, and features for adapting to growing scale and ensuring high resilience.
Devoxx UK - Migrating to Multi Cluster Managed KafkaNatan Silnitsky
As Wix's Kafka usage grew to 2.5B messages per day, >20K topics and >100K leader partitions serving 2000 microservices,
we decided to migrate from a self-operated single cluster per data center to a managed cloud service (like Amazon MSK or Confluent Cloud) with a multi-cluster setup.
The classic approach would be to perform this transition only once all incoming traffic is removed from the data center.
But draining an entire data center for an undetermined period of time, until all 2000 services complete the switch, was too risky for us.
This talk is about how we gradually migrated all of our Kafka consumers and producers with zero downtime while they continued to handle regular traffic. You will learn practical steps you can take to greatly reduce the risks and speed up the migration timeline.
Dev Days Europe - Kafka based Global Data Mesh at WixNatan Silnitsky
As your organization rapidly grows in scale, so does the number of challenges.
Growing scale comes in multiple dimensions: traffic, geographic presence, product portfolio, technology stacks, number of developers, and more.
Coming up with an architecture that can handle all of the data flows in a universal, simple way is key.
This talk is about Wix's Kafka-based global data architecture and platform:
how we made it very easy for Wix's 2000 microservices to publish and subscribe to data, no matter where in the world they are deployed or what technology stack they use,
all while offering various SDKs (some of them open-source), tools, and features for adapting to growing scale and ensuring high resilience.
Kafka Summit London - Kafka based Global Data Mesh at WixNatan Silnitsky
As your organization rapidly grows in scale, so does the number of challenges.
Growing scale comes in multiple dimensions: traffic, geographic presence, product portfolio, technology stacks, number of developers, and more.
Coming up with an architecture that can handle all of the data flows in a universal, simple way is key.
This talk is about Wix's Kafka-based global data architecture and platform:
how we made it very easy for Wix's 2000 microservices to publish and subscribe to data, no matter where in the world they are deployed or what technology stack they use,
all while offering various SDKs (some of them open-source), tools, and features for adapting to growing scale and ensuring high resilience.
Migrating to Multi Cluster Managed Kafka - Conf42 - CloudNative Natan Silnitsky
As Wix's Kafka usage grew to 2.5B messages per day, >20K topics and >100K leader partitions serving 2000 microservices,
we decided to migrate from a self-operated single cluster per data center to a managed cloud service (like Amazon MSK or Confluent Cloud) with a multi-cluster setup.
The classic approach would be to perform this transition only once all incoming traffic is removed from the data center.
But draining an entire data center for an undetermined period of time, until all 2000 services complete the switch, was too risky for us.
This talk is about how we gradually migrated all of our Kafka consumers and producers with zero downtime while they continued to handle regular traffic. You will learn practical steps you can take to greatly reduce the risks and speed up the migration timeline.
5 Takeaways from Migrating a Library to Scala 3 - Scala LoveNatan Silnitsky
Scala 3 is going to make Scala easier to write, and especially to read: more powerful features like enums, and fewer misleading keywords like implicit.
But first we need to migrate our old Scala 2.12 / 2.13 codebase to Scala 3.
This talk tells the story of how I tried to migrate the Greyhound open-source library to Scala 3, with partial success.
You will hear about what works, what doesn't, and a few pitfalls to avoid.
Migration takeaways include:
1. Use migration tools; don't do it manually
2. Which popular third-party libraries can and can't be used by Scala 3 code
and many more.
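To ground the "enums and implicit" point, a small illustrative comparison (Scala 3 syntax; the names are made up):

    // Scala 3: a first-class enum replaces the sealed-trait/case-object pattern.
    enum RetryDecision:
      case Retry, Skip, Fail

    // Scala 3: `given`/`using` replace the overloaded `implicit` keyword.
    given Ordering[RetryDecision] = Ordering.by(_.ordinal)

    def worst(decisions: List[RetryDecision])(using Ordering[RetryDecision]): RetryDecision =
      decisions.max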
Migrating to Multi Cluster Managed Kafka - DevopStars 2022Natan Silnitsky
As Wix's Kafka usage grew to 2.5B messages per day, >20K topics and >100K leader partitions serving 2000 microservices,
we decided to migrate from a self-operated single cluster per data center to a managed cloud service (like Amazon MSK or Confluent Cloud) with a multi-cluster setup.
The classic approach would be to perform this transition only once all incoming traffic is removed from the data center.
But draining an entire data center for an undetermined period of time, until all 2000 services complete the switch, was too risky for us.
This talk is about how we gradually migrated all of our Kafka consumers and producers with zero downtime while they continued to handle regular traffic. You will learn practical steps you can take to greatly reduce the risks and speed up the migration timeline.
How to build 1000 microservices with Kafka and thrive
1. [email protected] twitter @NSilnitsky linkedin/natansilnitsky github.com/natansil
How to build 1000 microservices
with Kafka and thrive
Natan Silnitsky
Backend Infra Developer, Wix.com
7. [Timeline diagram] Till 2014: synchronous RPC-JSON + retries (*special cases: ActiveMQ) — many production operations issues. 2015: Kafka SDK — a lot of boilerplate and missing features.
8. [Same timeline] Greyhound — make it simple and abstract.
9. [Timeline diagram, continued] 2018: Greyhound — async request-reply, *polyglot.
29. From the KafkaConsumer Javadoc: The Kafka consumer is NOT thread-safe. All network I/O happens in the thread of the application making the call. It is the responsibility of the user to ensure that multi-threaded access is properly synchronized. Un-synchronized access will result in ConcurrentModificationException.
https://kafka.apache.org/20/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
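Given that warning, a common (simplified, hedged) pattern is to keep all consumer calls on one thread and fan records out to worker lanes pinned by partition, so per-partition order survives. Topic names and sizes below are placeholders, and offset management is elided:

    import java.time.Duration
    import java.util.Properties
    import java.util.concurrent.Executors
    import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord, KafkaConsumer}
    import scala.jdk.CollectionConverters._

    def handle(rec: ConsumerRecord[String, String]): Unit =
      println(s"${rec.partition()}/${rec.offset()}: ${rec.value()}")

    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // placeholder
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "parallel-demo")
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer")
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(List("events").asJava)

    // One single-threaded executor per "lane"; a partition always maps to the
    // same lane, so per-partition order is preserved while lanes run in parallel.
    val lanes = Vector.fill(8)(Executors.newSingleThreadExecutor())

    while (true) {
      val records = consumer.poll(Duration.ofMillis(100)) // ONLY this thread touches `consumer`
      records.asScala.foreach { rec =>
        lanes(rec.partition() % lanes.size).submit(new Runnable {
          def run(): Unit = handle(rec)
        })
      }
    }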
31. Greyhound wraps Kafka. So instead of a lot of consumers:
- Setup boilerplate
- Consumer API boilerplate
+ Parallel consumption!