Discover how to avoid common pitfalls when shifting to an event-driven architecture (EDA) to improve system resilience and scalability. We cover the Kafka Schema Registry, in-broker transformations, event sourcing, and more.
Caching for Microservices Architectures: Session IVMware Tanzu
This document discusses how caching can help address performance, scalability, and autonomy challenges for microservices architectures. It introduces Pivotal Cloud Cache (PCC) as a caching solution for microservices on Pivotal Cloud Foundry. PCC provides an in-memory cache that can scale horizontally and increase performance. It also allows for data autonomy between microservices and teams while providing high availability. PCC offers an easy and cost-effective way to cache data and adopt microservices on Pivotal Cloud Foundry.
Cosmos DB Real-time Advanced Analytics WorkshopDatabricks
The workshop implements an innovative fraud detection solution as a PoC for a bank that provides payment processing services for commerce to merchant customers all across the globe, helping them save costs by applying machine learning and advanced analytics to detect fraudulent transactions. Since their customers are around the world, the right solution should minimize the latency customers experience by distributing as much of the solution as possible, as closely as possible, to the regions in which the service is used. The workshop designs a data pipeline solution that leverages Cosmos DB for both the scalable ingest of streaming data and the globally distributed serving of both pre-scored data and machine learning models. Cosmos DB’s major advantage when operating at global scale is its high concurrency with low latency and predictable results.
This combination is unique to Cosmos DB and ideal for the bank needs. The solution leverages the Cosmos DB change data feed in concert with the Azure Databricks Delta and Spark capabilities to enable a modern data warehouse solution that can be used to create risk reduction solutions for scoring transactions for fraud in an offline, batch approach and in a near real-time, request/response approach. https://github.com/Microsoft/MCW-Cosmos-DB-Real-Time-Advanced-Analytics Takeaway: How to leverage Azure Cosmos DB + Azure Databricks along with Spark ML for building innovative advanced analytics pipelines.
Inside Freshworks' Migration from Cassandra to ScyllaDB by Premkumar PatturajScyllaDB
Freshworks migrated from Cassandra to ScyllaDB to handle growing audit log data efficiently. Cassandra required frequent scaling, complex repairs, and had non-linear scaling. ScyllaDB reduced costs with fewer machines and improved operations. Using Zero Downtime Migration (ZDM), they bulk-migrated data, performed dual writes, and validated consistency.
Databus - LinkedIn's Change Data Capture PipelineSunil Nagaraj
Introduction to Databus - Linkedin's Change Data Capture Pipeline
https://github.com/linkedin/databus
as presented at Eventbrite - May 07 2013
Data & Analytics Forum: Moving Telcos to Real TimeSingleStore
MemSQL is a real-time database that allows users to simultaneously ingest, serve, and analyze streaming data and transactions. It is an in-memory distributed relational database that supports SQL, key-value, documents, and geospatial queries. MemSQL provides real-time analytics capabilities through Streamliner, which allows one-click deployment of Apache Spark for real-time data pipelines and analytics without batch processing. It is available in free community and paid enterprise editions with support and additional features.
This document discusses database as a service and cloud computing. It introduces concepts like software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). It also covers topics like virtualization, multi-tenancy, service level agreements, storage models, distributed storage, replication, and security in the context of database as a service. The document will be covering these topics in more depth throughout the seminar.
Estimating the Total Costs of Your Cloud Analytics PlatformDATAVERSITY
Organizations today need a broad set of enterprise data cloud services with key data functionality to modernize applications and utilize machine learning. They need a platform designed to address multi-faceted needs by offering multi-function Data Management and analytics to solve the enterprise’s most pressing data and analytic challenges in a streamlined fashion. They need a worry-free experience with the architecture and its components.
Choosing the Right Database: Exploring MySQL Alternatives for Modern Applicat...Mydbops
Choosing the Right Database: Exploring MySQL Alternatives for Modern Applications by Bhanu Jamwal, Head of Solution Engineering, PingCAP at the Mydbops Opensource Database Meetup 14.
This presentation discusses the challenges in choosing the right database for modern applications, focusing on MySQL alternatives. It highlights the growth of new applications, the need to improve infrastructure, and the rise of cloud-native architecture.
The presentation explores alternatives to MySQL, such as MySQL forks, database clustering, and distributed SQL. It introduces TiDB as a distributed SQL database for modern applications, highlighting its features and top use cases.
Case studies of companies benefiting from TiDB are included. The presentation also outlines TiDB's product roadmap, detailing upcoming features and enhancements.
This presentation discusses several high availability best practices from Oracle's Maximum Availability Architecture (MAA) team for minimizing planned and unplanned downtime. It provides examples of how features like Oracle Data Guard, database restore points, transportable tablespaces, and Real Application Clusters can be used to reduce downtime for maintenance activities like database upgrades and platform migrations. Specific tips are provided around tuning Data Guard configurations, using flashback technology to create database clones for testing, and leveraging SQL Apply to minimize downtime during upgrades. Real-world examples from Oracle customers like The Hartford are also presented.
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...DATAVERSITY
Thirty years is a long time for a technology foundation to be as active as relational databases. Are their replacements here?
In this webinar, we look at this foundational technology for modern Data Management and show how it evolved to meet the workloads of today, as well as when other platforms make sense for enterprise data.
Microsoft SQL Server - Reduce Your Cost and Improve your Agility PresentationMicrosoft Private Cloud
This document discusses server consolidation using SQL Server 2008 R2. It begins by describing the trend toward consolidation to reduce costs by combining underutilized servers onto fewer servers. Key enablers of consolidation include advances in software, hardware, virtualization and improved bandwidth. SQL Server 2008 R2 provides benefits for consolidation such as low TCO, security, manageability and support for virtualization. The document reviews options for consolidating servers using SQL Server 2008 R2, including multiple databases, multiple instances and virtualization. It also discusses management, high availability, security and reducing storage requirements when consolidating with SQL Server 2008 R2.
Marketing Automation at Scale: How Marketo Solved Key Data Management Challen...Continuent
Marketo uses Continuent Tungsten to solve key data management challenges at scale. Tungsten provides high availability, online maintenance, and parallel replication to allow Marketo to process over 600 million MySQL transactions per day across more than 7TB of data without downtime. Tungsten's innovative caching and sharding techniques help replicas keep up with Marketo's high transaction volumes and uneven tenant sizes. The solution has enabled fast failover, rolling maintenance, and scaling to thousands of customers.
Designing a Feedback Loop for Event-Driven Data Sharing With Teresa Wang | Cu...HostedbyConfluent
This document summarizes a presentation about designing an automated data transport system using Kafka Connect and KSQL. The system handles complex data dependencies across tables while respecting business rules. It establishes a feedback loop to notify the source system about successful deliveries and errors. The solution uses Kafka components like streams and tables, source and sink connectors, and microservices. Data is extracted from multiple source tables and written to topics. Sink connectors load the data to the destination. KSQL is used for aggregations to declare when an event is complete or needs alerting. The design leverages existing tools with minimal custom coding.
Best Practices in the Cloud for Data Management (US)Denodo
Watch here: https://bit.ly/2Npt82U
If you have data, you are engaged in data management—be sure to do it effectively.
As organizations are assessing how COVID-19 has impacted their operations, new possibilities and uncharted routes are becoming the norm for many businesses. While exploring and implementing different deployment and operational models, the question of data management naturally surfaces while considering how these changes impact your data. Is this the right time to focus on data management? The reality is that if you have data, you are engaged in data management and so the real question is, are you doing it well?
Join Brice Giesbrecht from Caserta and Mitesh Shah from Denodo to explore data management challenges and solutions facing data driven organizations.
The document discusses the benefits and challenges of running big data workloads on cloud native platforms. Some key points discussed include:
- Big data workloads are migrating to the cloud to take advantage of scalability, flexibility and cost effectiveness compared to on-premises solutions.
- Enterprise cloud platforms need to provide centralized management and monitoring of multiple clusters, secure data access, and replication capabilities.
- Running big data on cloud introduces challenges around storage, networking, compute resources, and security that systems need to address, such as consistency issues with object storage, network throughput reductions, and hardware variations across cloud vendors.
- The open source community is helping users address these challenges to build cloud native data architectures
Strategies For Migrating From SQL to NoSQL — The Apache Kafka WayScyllaDB
This document discusses strategies for migrating from SQL to NoSQL databases using Apache Kafka. It outlines the challenges of modernizing legacy databases, how Confluent can help with the migration process, and proposes a three-phase plan. The plan involves initially migrating data sources using connectors, then optimizing the data with stream processing in ksqlDB, and finally modernizing by sending the data to cloud databases. The document provides an overview of Confluent's technologies and services that can help accelerate and simplify the database migration.
Which Change Data Capture Strategy is Right for You?Precisely
Change Data Capture or CDC is the practice of moving the changes made in an important transactional system to other systems, so that data is kept current and consistent across the enterprise. CDC keeps reporting and analytic systems working on the latest, most accurate data.
Many different CDC strategies exist. Each strategy has advantages and disadvantages. Some put an undue burden on the source database. They can cause queries or applications to become slow or even fail. Some bog down network bandwidth, or have big delays between change and replication.
Each business process has different requirements, as well. For some business needs, a replication delay of more than a second is too long. For others, a delay of less than 24 hours is excellent.
Which CDC strategy will match your business needs? How do you choose?
View this webcast on-demand to learn:
• Advantages and disadvantages of different CDC methods
• The replication latency your project requires
• How to keep data current in Big Data technologies like Hadoop
DM Radio Webinar: Adopting a Streaming-Enabled ArchitectureDATAVERSITY
Architecture matters. That's why today's innovators are taking a hard look at streaming data, an increasingly attractive option that can transform business in several ways: replacing aging data ingestion techniques like ETL; solving long-standing data quality challenges; improving business processes ranging from sales and marketing to logistics and procurement; or any number of activities related to accelerating data warehousing, business intelligence and analytics.
Register for this DM Radio Deep Dive Webinar to learn how streaming data can rejuvenate or supplant traditional data management practices. Host Eric Kavanagh will explain how streaming-first architectures can relieve data engineers from time-consuming, error-prone processes, ideally bidding farewell to those unpleasant batch windows. He'll be joined by Kevin Petrie of Attunity, who will explain why (with real-world story successes) streaming data solutions can keep the business fueled with trusted data in a timely, efficient manner for improved business outcomes.
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...DATAVERSITY
Thirty years is a long time for a technology foundation to be as active as relational databases. Are their replacements here? In this webinar, we say no.
Databases have not sat around while Hadoop emerged. The Hadoop era generated a ton of interest and confusion, but is it still relevant as organizations are deploying cloud storage like a kid in a candy store? We’ll discuss what platforms to use for what data. This is a critical decision that can dictate two to five times additional work effort if it’s a bad fit.
Drop the herd mentality. In reality, there is no “one size fits all” right now. We need to make our platform decisions amidst this backdrop.
This webinar will distinguish these analytic deployment options and help you platform 2020 and beyond for success.
This talk provides an architecture overview of data-centric microservices illustrated with an example application. The following Microservices concepts are illustrated - domain driven design, event-driven services, Saga transactions, Application tracing and Health monitoring with different microservices using a variety of data types supported in the database - business data, documents, spatial, graph, and events. A running example of a mobile food delivery application (called GrubDash) is used, with a hands-on-lab that is available for attendees to work through on the Oracle Cloud after these sessions. The rest of the talks will build upon this Microservices architecture framework.
Data Warehouse or Data Lake, Which Do I Choose?DATAVERSITY
Today’s data-driven companies have a choice to make – where do we store our data? As the move to the cloud continues to be a driving factor, the choice becomes either the data warehouse (Snowflake et al) or the data lake (AWS S3 et al). There are pros and cons to each approach. While the data warehouse will give you strong data management with analytics, it doesn’t do well with semi-structured and unstructured data, has tightly coupled storage and compute, and carries expensive vendor lock-in. On the other hand, data lakes allow you to store all kinds of data and are extremely affordable, but they’re only meant for storage and by themselves provide no direct value to an organization.
Enter the Open Data Lakehouse, the next evolution of the data stack that gives you the openness and flexibility of the data lake with the key aspects of the data warehouse like management and transaction support.
In this webinar, you’ll hear from Ali LeClerc who will discuss the data landscape and why many companies are moving to an open data lakehouse. Ali will share more perspective on how you should think about what fits best based on your use case and workloads, and how some real world customers are using Presto, a SQL query engine, to bring analytics to the data lakehouse.
Five ways database modernization simplifies your data lifeSingleStore
This document provides an overview of how database modernization with MemSQL can simplify a company's data life. It discusses five common customer scenarios where database limitations are impacting data-driven initiatives: 1) Slow event to insight delays, 2) High concurrency causing "wait in line" analytics, 3) Costly performance requiring specialized hardware, 4) Slow queries limiting big data analytics, and 5) Deployment inflexibility restricting multi-cloud usage. For each scenario, it provides an example customer situation and solution using MemSQL, highlighting benefits like real-time insights, scalable user access, cost efficiency, accelerated big data analytics, and deployment flexibility. The document also introduces MemSQL capabilities for fast data ingestion, instant
Creating a Modern Data Architecture for Digital TransformationMongoDB
By managing Data in Motion, Data at Rest, and Data in Use differently, modern Information Management Solutions are enabling a whole range of architecture and design patterns that allow enterprises to fully harness the value in data flowing through their systems. In this session we explored some of the patterns (e.g. operational data lakes, CQRS, microservices and containerisation) that enable CIOs, CDOs and senior architects to tame the data challenge, and start to use data as a cross-enterprise asset.
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Dataconomy Media
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder of DataTorrent presented "Streaming Analytics with Apache Apex" as part of the Big Data, Berlin v 8.0 meetup organised on the 14th of July 2016 at the WeWork headquarters.
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveScyllaDB
Want to learn practical tips for designing systems that can scale efficiently without compromising speed?
Join us for a workshop where we’ll address these challenges head-on and explore how to architect low-latency systems using Rust. During this free interactive workshop oriented for developers, engineers, and architects, we’ll cover how Rust’s unique language features and the Tokio async runtime enable high-performance application development.
As you explore key principles of designing low-latency systems with Rust, you will learn how to:
- Create and compile a real-world app with Rust
- Connect the application to ScyllaDB (NoSQL data store)
- Negotiate tradeoffs related to data modeling and querying
- Manage and monitor the database for consistently low latencies
Powering a Billion Dreams: Scaling Meesho’s E-commerce Revolution with Scylla...ScyllaDB
With over a billion Indians set to shop online, Meesho is redefining e-commerce by making it accessible, affordable, and inclusive at an unprecedented scale. But scaling for Bharat isn’t just about growth—it’s about building a tech backbone that can handle massive traffic surges, dynamic pricing, real-time recommendations, and seamless user experiences. In this session, we’ll take you behind the scenes of Meesho’s journey in democratizing e-commerce while operating at Monster Scale. Discover how ScyllaDB plays a crucial role in handling millions of transactions, optimizing catalog ranking, and ensuring ultra-low-latency operations. We’ll deep dive into our real-world use cases, performance optimizations, and the key architectural decisions that have helped us scale effortlessly.
Navigating common mistakes and critical success factors
Is your team considering or starting a database migration? Learn from the frontline experience gained guiding hundreds of high-stakes migration projects – from startups to Google and Twitter. Join us as Miles Ward and Tim Koopmans have a candid chat about what tends to go wrong and how to steer things right.
We will explore:
- What really pushes teams to the database migration tipping point
- How to scope and manage the complexity of a migration
- Proven migration strategies and antipatterns
- Where complications commonly arise and ways to prevent them
Expect plenty of war stories, along with pragmatic ways to make your own migration as “blissfully boring” as possible.
Achieving Extreme Scale with ScyllaDB: Tips & TradeoffsScyllaDB
Explore critical strategies – and antipatterns – for achieving low latency at extreme scale
If you’re getting started with ScyllaDB, you’re probably intrigued by its potential to achieve predictable low latency at extreme scale. But how do you ensure that you’re maximizing that potential for your team’s specific workloads and technical requirements?
This webinar offers practical advice for navigating the various decision points you’ll face as you evaluate ScyllaDB for your project and move into production. We’ll cover the most critical considerations, tradeoffs, and recommendations related to:
- Infrastructure selection
- ScyllaDB configuration
- Client-side setup
- Data modeling
Join us for an inside look at the lessons learned across thousands of real-world distributed database projects.
Securely Serving Millions of Boot Artifacts a Day by João Pedro Lima & Matt ...ScyllaDB
Cloudflare’s boot infrastructure dynamically generates and signs boot artifacts for nodes worldwide, ensuring secure, scalable, and customizable deployments. This talk dives into its architecture, scaling decisions, and how it enables seamless testing while maintaining a strong chain of trust.
How Agoda Scaled 50x Throughput with ScyllaDB by Worakarn IsarathamScyllaDB
Learn about Agoda's performance tuning strategies for ScyllaDB. Worakarn shares how they optimized disk performance, fine-tuned compaction strategies, and adjusted SSTable settings to match their workload for peak efficiency.
How Yieldmo Cut Database Costs and Cloud Dependencies Fast by Todd ColemanScyllaDB
Yieldmo processes hundreds of billions of ad requests daily with subsecond latency. Initially using DynamoDB for its simplicity and stability, they faced rising costs, suboptimal latencies, and cloud provider lock-in. This session explores their journey to ScyllaDB’s DynamoDB-compatible API.
There’s a common adage that it takes 10 years to develop a file system. As ScyllaDB reaches that 10 year milestone in 2025, it’s the perfect time to reflect on the last decade of ScyllaDB development – both hits and misses. It’s especially appropriate given that our project just reached a critical mass with certain scalability and elasticity goals that we dreamed up years ago. This talk will cover how we arrived at ScyllaDB X Cloud achieving our initial vision, and share where we’re heading next.
Reduce Your Cloud Spend with ScyllaDB by Tzach LivyatanScyllaDB
This talk will explore why ScyllaDB Cloud is a cost-effective alternative to DynamoDB, highlighting efficient design implementations like shared compute, local NVMe storage, and storage compression. It will also discuss new X Cloud features, better plans and pricing, and a direct cost comparison between ScyllaDB and DynamoDB
Migrating 50TB Data From a Home-Grown Database to ScyllaDB, Fast by Terence LiuScyllaDB
Terence share how Clearview AI's infra needs evolved and why they chose ScyllaDB after first-principles research. From fast ingestion to production queries, the talk explores their journey with Rust, embedded DB readers, and the ScyllaDB Rust driver—plus config tips for bulk ingestion and achieving data parity.
Vector Search with ScyllaDB by Szymon WasikScyllaDB
Vector search is an essential element of contemporary machine learning pipelines and AI tools. This talk will share preliminary results on the forthcoming vector storage and search features in ScyllaDB. By leveraging Scylla's scalability and USearch library's performance, we have designed a system with exceptional query latency and throughput. The talk will cover vector search use cases, our roadmap, and a comparison of our initial implementation with other vector databases.
Workload Prioritization: How to Balance Multiple Workloads in a Cluster by Fe...ScyllaDB
Workload Prioritization is a ScyllaDB exclusive feature for controlling how different workloads compete for system resources. It's used to prioritize urgent application requests that require immediate response times versus others that can tolerate slighter delays (e.g., large scans). Join this session for a demo of how applying workload prioritization reduces infrastructure costs while ensuring predictable performance at scale.
Two Leading Approaches to Data Virtualization, and Which Scales Better? by Da...ScyllaDB
Should you move code to data or data to code? Conventional wisdom favors the former, but cloud trends push the latter. This session by the creator of PACELC explores the shift, its risks, and the ongoing debate in data virtualization between push- and pull-based processing.
Scaling a Beast: Lessons from 400x Growth in a High-Stakes Financial System b...ScyllaDB
Scaling from 66M to 25B+ records in a core financial system is tough—every number must be right, and data must be fresh. In this session, Dmytro shares real-world strategies to balance accuracy with real-time performance and avoid scaling pitfalls. It's purely practical, no-BS insights for engineers.
Object Storage in ScyllaDB by Ran Regev, ScyllaDBScyllaDB
In this talk we take a look at how Object Storage is used by Scylla. We focus on current usage, namely for backup, and we look at the shift in implementation from an external tool to native Scylla. We take a close look at the complexity of backup and restore, mostly in the face of topology changes and token assignments. We also take a glimpse into the future and see how Scylla is going to use Object Storage as its native storage. We explore a few possible applications of it and understand the tradeoffs.
Lessons Learned from Building a Serverless Notifications System by Srushith R...ScyllaDB
Reaching your audience isn’t just about email. Learn how we built a scalable, cost-efficient notifications system using AWS serverless—handling SMS, WhatsApp, and more. From architecture to throttling challenges, this talk dives into key decisions for high-scale messaging.
A Dist Sys Programmer's Journey into AI by Piotr SarnaScyllaDB
This talk explores the culture shock of transitioning from distributed databases to AI. While AI operates at massive scale, distributed storage and compute remain essential. Discover key differences, unexpected parallels, and how database expertise applies in the AI world.
High Availability: Lessons Learned by Paul PreuveneersScyllaDB
How does ScyllaDB keep your data safe, and your mission critical applications running smoothly, even in the face of disaster? In this talk we’ll discuss what we have learned about High Availability, how it is implemented within ScyllaDB and what that means for your business. You’ll learn about ScyllaDB cloud architecture design, consistency, replication and even load balancing and much more.
How Natura Uses ScyllaDB and ScyllaDB Connector to Create a Real-time Data Pi...ScyllaDB
Natura, a top global cosmetics brand with 3M+ beauty consultants in Latin America, processes massive data for orders, campaigns, and analytics. In this talk, Rodrigo Luchini & Marcus Monteiro share how Natura leverages ScyllaDB’s CDC Source Connector for real-time sales insights.
Persistence Pipelines in a Processing Graph: Mutable Big Data at Salesforce b...ScyllaDB
This is a case study on managing mutable big data: Exploring the evolution of the persistence layer in a processing graph, tackling design challenges, and refining key operational principles along the way.
How Can I use the AI Hype in my Business Context?Daniel Lehner
𝙄𝙨 𝘼𝙄 𝙟𝙪𝙨𝙩 𝙝𝙮𝙥𝙚? 𝙊𝙧 𝙞𝙨 𝙞𝙩 𝙩𝙝𝙚 𝙜𝙖𝙢𝙚 𝙘𝙝𝙖𝙣𝙜𝙚𝙧 𝙮𝙤𝙪𝙧 𝙗𝙪𝙨𝙞𝙣𝙚𝙨𝙨 𝙣𝙚𝙚𝙙𝙨?
Everyone’s talking about AI but is anyone really using it to create real value?
Most companies want to leverage AI. Few know 𝗵𝗼𝘄.
✅ What exactly should you ask to find real AI opportunities?
✅ Which AI techniques actually fit your business?
✅ Is your data even ready for AI?
If you’re not sure, you’re not alone. This is a condensed version of the slides I presented at a Linkedin webinar for Tecnovy on 28.04.2025.
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...Alan Dix
Talk at the final event of Data Fusion Dynamics: A Collaborative UK-Saudi Initiative in Cybersecurity and Artificial Intelligence funded by the British Council UK-Saudi Challenge Fund 2024, Cardiff Metropolitan University, 29th April 2025
https://alandix.com/academic/talks/CMet2025-AI-Changes-Everything/
Is AI just another technology, or does it fundamentally change the way we live and think?
Every technology has a direct impact with micro-ethical consequences, some good, some bad. However more profound are the ways in which some technologies reshape the very fabric of society with macro-ethical impacts. The invention of the stirrup revolutionised mounted combat, but as a side effect gave rise to the feudal system, which still shapes politics today. The internal combustion engine offers personal freedom and creates pollution, but has also transformed the nature of urban planning and international trade. When we look at AI the micro-ethical issues, such as bias, are most obvious, but the macro-ethical challenges may be greater.
At a micro-ethical level AI has the potential to deepen social, ethnic and gender bias, issues I have warned about since the early 1990s! It is also being used increasingly on the battlefield. However, it also offers amazing opportunities in health and education, as the recent Nobel prizes for the developers of AlphaFold illustrate. More radically, the need to encode ethics acts as a mirror to surface essential ethical problems and conflicts.
At the macro-ethical level, by the early 2000s digital technology had already begun to undermine sovereignty (e.g. gambling), market economics (through network effects and emergent monopolies), and the very meaning of money. Modern AI is the child of big data, big computation and ultimately big business, intensifying the inherent tendency of digital technology to concentrate power. AI is already unravelling the fundamentals of the social, political and economic world around us, but this is a world that needs radical reimagining to overcome the global environmental and human challenges that confront us. Our challenge is whether to let the threads fall as they may, or to use them to weave a better future.
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfSoftware Company
Explore the benefits and features of advanced logistics management software for businesses in Riyadh. This guide delves into the latest technologies, from real-time tracking and route optimization to warehouse management and inventory control, helping businesses streamline their logistics operations and reduce costs. Learn how implementing the right software solution can enhance efficiency, improve customer satisfaction, and provide a competitive edge in the growing logistics sector of Riyadh.
Semantic Cultivators : The Critical Future Role to Enable AIartmondano
By 2026, AI agents will consume 10x more enterprise data than humans, but with none of the contextual understanding that prevents catastrophic misinterpretations.
"Client Partnership — the Path to Exponential Growth for Companies Sized 50-5...Fwdays
Why the "more leads, more sales" approach is not a silver bullet for a company.
Common symptoms of an ineffective Client Partnership (CP).
Key reasons why CP fails.
Step-by-step roadmap for building this function (processes, roles, metrics).
Business outcomes of CP implementation based on examples of companies sized 50-500.
Buckeye Dreamin 2024: Assessing and Resolving Technical DebtLynda Kane
Slide Deck from Buckeye Dreamin' 2024 presentation Assessing and Resolving Technical Debt. Focused on identifying technical debt in Salesforce and working towards resolving it.
Hands On: Create a Lightning Aura Component with force:RecordDataLynda Kane
Slide Deck from the 3/26/2020 virtual meeting of the Cleveland Developer Group presentation on creating a Lightning Aura Component using force:RecordData.
Procurement Insights Cost To Value Guide.pptxJon Hansen
Procurement Insights, with its integrated historic procurement industry archives, serves as a powerful complement, not a competitor, to other procurement industry firms. It fills critical gaps in depth, agility, and contextual insight that most traditional analyst and association models overlook.
Learn more about this value- driven proprietary service offering here.
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc
Most consumers believe they’re making informed decisions about their personal data—adjusting privacy settings, blocking trackers, and opting out where they can. However, our new research reveals that while awareness is high, taking meaningful action is still lacking. On the corporate side, many organizations report strong policies for managing third-party data and consumer consent yet fall short when it comes to consistency, accountability and transparency.
This session will explore the research findings from TrustArc’s Privacy Pulse Survey, examining consumer attitudes toward personal data collection and practical suggestions for corporate practices around purchasing third-party data.
Attendees will learn:
- Consumer awareness around data brokers and what consumers are doing to limit data collection
- How businesses assess third-party vendors and their consent management operations
- Where business preparedness needs improvement
- What these trends mean for the future of privacy governance and public trust
This discussion is essential for privacy, risk, and compliance professionals who want to ground their strategies in current data and prepare for what’s next in the privacy landscape.
AI and Data Privacy in 2025: Global TrendsInData Labs
In this infographic, we explore how businesses can implement effective governance frameworks to address AI data privacy. Understanding it is crucial for developing effective strategies that ensure compliance, safeguard customer trust, and leverage AI responsibly. Equip yourself with insights that can drive informed decision-making and position your organization for success in the future of data privacy.
This infographic contains:
-AI and data privacy: Key findings
-Statistics on AI data privacy in the today’s world
-Tips on how to overcome data privacy challenges
-Benefits of AI data security investments.
Keep up-to-date on how AI is reshaping privacy standards and what this entails for both individuals and organizations.
Big Data Analytics Quick Research Guide by Arthur MorganArthur Morgan
This is a Quick Research Guide (QRG).
QRGs include the following:
- A brief, high-level overview of the QRG topic.
- A milestone timeline for the QRG topic.
- Links to various free online resource materials to provide a deeper dive into the QRG topic.
- Conclusion and a recommendation for at least two books available in the SJPL system on the QRG topic.
QRGs planned for the series:
- Artificial Intelligence QRG
- Quantum Computing QRG
- Big Data Analytics QRG
- Spacecraft Guidance, Navigation & Control QRG (coming 2026)
- UK Home Computing & The Birth of ARM QRG (coming 2027)
Any questions or comments?
- Please contact Arthur Morgan at [email protected].
100% human made.
2. About me
Christina Lin
Developer Advocate, Redpanda
Background: SOA, WebSphere, DB2, Sybase, Oracle, MQ, J2EE, EJB, DevOps, microservices, EIP, K8s, agile integration, data mesh, ActiveMQ
Live data stack
- Resilience: handle failures and scale gracefully
- Elasticity: infrastructure that scales dynamically
- Decentralization: data ownership that empowers individual teams
- Performance: low latency and high throughput
- Autonomy: self-service access with team-defined quality
- Nimbleness: efficient data movement
- Distributed: distributed data processing for cloud-native environments
- Agility: respond quickly to changes in data
7. Schema Registry
Diagram: producer-to-consumer flow through the Schema Registry.
- A schema describes the data structure as {name: type} pairs and is encoded with Avro, Protobuf, or JSON.
- The producer serializes each record against a schema version downloaded from the Schema Registry.
- The record on the topic carries the key (binary), the schema ID, and the value (binary).
- The consumer looks up the schema by ID in the registry and deserializes the value.
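A minimal sketch of this flow using the Python confluent-kafka client: the producer serializes a record against a registered Avro schema and the consumer resolves the schema by ID to deserialize it. The broker and registry addresses, the topic name, and the Customer schema are assumptions for illustration.

```python
# Minimal sketch: produce and consume an Avro-encoded record through a schema registry.
# Broker/registry addresses, topic name, and schema are assumptions for illustration.
from confluent_kafka import Producer, Consumer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer, AvroDeserializer
from confluent_kafka.serialization import SerializationContext, MessageField

SCHEMA = """
{
  "type": "record",
  "name": "Customer",
  "fields": [
    {"name": "id",   "type": "string"},
    {"name": "name", "type": "string"}
  ]
}
"""

registry = SchemaRegistryClient({"url": "http://localhost:8081"})
serializer = AvroSerializer(registry, SCHEMA)      # registers/fetches the schema, caches its ID
deserializer = AvroDeserializer(registry, SCHEMA)

producer = Producer({"bootstrap.servers": "localhost:9092"})
value = serializer({"id": "c-42", "name": "Ada"},
                   SerializationContext("customer", MessageField.VALUE))
producer.produce("customer", key=b"c-42", value=value)
producer.flush()

consumer = Consumer({"bootstrap.servers": "localhost:9092",
                     "group.id": "demo", "auto.offset.reset": "earliest"})
consumer.subscribe(["customer"])
msg = consumer.poll(10.0)
if msg is not None and msg.error() is None:
    record = deserializer(msg.value(), SerializationContext("customer", MessageField.VALUE))
    print(record)   # {'id': 'c-42', 'name': 'Ada'}
consumer.close()
```

The serialized value that lands on the topic follows the registry wire format described above: a magic byte, the 4-byte schema ID, then the encoded body.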
8. Schema Registry
Diagram: server-side validation.
- With server-side schema validation, the broker checks that the schema ID carried in each record (key binary, schema ID, value binary) refers to a valid registered schema before accepting the producer's write.
- The registry stores each subject as a series of versions (version 1, version 2, version 3, ...).
- Compatibility between versions is enforced according to the configured mode: backward, forward, full, or none.
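The compatibility mode can be set per subject through the registry's standard REST API; a small sketch with the Python requests library, where the registry URL and subject name are assumptions:

```python
# Sketch: set the compatibility mode for one subject via the Schema Registry REST API.
# Registry URL and subject name are assumptions for illustration.
import requests

REGISTRY = "http://localhost:8081"
subject = "customer-value"

# PUT /config/{subject} sets the compatibility level for that subject.
resp = requests.put(f"{REGISTRY}/config/{subject}",
                    json={"compatibility": "BACKWARD"})
resp.raise_for_status()

# GET /config/{subject} reads it back.
print(requests.get(f"{REGISTRY}/config/{subject}").json())
```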
9. Schema Registry in Redpanda
Diagram: the Schema Registry runs inside Redpanda itself. Each broker exposes it through a RESTful endpoint, and the schemas are persisted in the internal _schemas topic.
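Because the registry is just a RESTful endpoint on the broker, schemas can be registered and inspected with plain HTTP; a sketch, again with an assumed local address, subject name, and schema:

```python
# Sketch: register a schema and list subjects through the registry's REST endpoint.
# URL, subject, and schema contents are assumptions for illustration.
import json
import requests

REGISTRY = "http://localhost:8081"   # assumed local registry address

schema = {
    "type": "record", "name": "Customer",
    "fields": [{"name": "id", "type": "string"},
               {"name": "name", "type": "string"}],
}

# POST /subjects/{subject}/versions registers a new schema version and returns its ID.
resp = requests.post(f"{REGISTRY}/subjects/customer-value/versions",
                     json={"schema": json.dumps(schema)})
print(resp.json())                                   # e.g. {"id": 1}

print(requests.get(f"{REGISTRY}/subjects").json())   # all registered subjects
```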
10. Schema Registry
• Assign a default value to fields that you might remove in the future
• Do not rename an existing field; add an alias instead
• When using schema evolution, always provide a default value
• Never delete a required field
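As an illustration of these guidelines, a hypothetical evolution of the Customer schema from the earlier sketch: the new field carries a default so the change stays backward compatible, and a rename would be handled with an alias rather than by dropping the old field.

```python
# Sketch: a backward-compatible evolution of the assumed Customer schema.
# Field names and default values are illustrative assumptions.
import json

customer_v1 = {
    "type": "record", "name": "Customer",
    "fields": [
        {"name": "id",   "type": "string"},
        {"name": "name", "type": "string"},
    ],
}

customer_v2 = {
    "type": "record", "name": "Customer",
    "fields": [
        {"name": "id",   "type": "string"},
        {"name": "name", "type": "string"},
        # New field carries a default, so a v2 reader can still decode records
        # written with v1. A rename would keep the old name reachable via
        # "aliases" instead of deleting the field.
        {"name": "tier", "type": "string", "default": "standard"},
    ],
}

print(json.dumps(customer_v2, indent=2))
```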
When not to use the Schema Registry
• You’re certain the schema won’t change in the future
• Hardware resources are limited and low latency is critical, so the extra lookup and validation may hurt performance (e.g., for IoT)
• You want to serialize the data with an unsupported serialization scheme
12. In-broker validation – how it works
Diagram: records arrive in the source topic (customer, partition 1), are loaded into the broker's cache, validated against the schema, transformed, and written back to disk with DMA into a validated topic (customer validated, partition 1), which can then be replicated across clusters.
Example repo: https://github.com/redpanda-data/redpanda-labs/tree/main/data-transforms/to_avro
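Redpanda runs this step inside the broker as a WebAssembly data transform (the linked repo is an example of that); purely as a conceptual illustration of the same validate-and-reroute logic, here is a client-side Python sketch. The topic names, registry address, and the invalid-records topic are assumptions.

```python
# Conceptual sketch only: the validate-and-reroute logic expressed as a client-side
# consume/produce loop. In Redpanda the equivalent logic runs inside the broker as a
# WebAssembly data transform; addresses and topic names here are assumptions.
from confluent_kafka import Consumer, Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroDeserializer
from confluent_kafka.serialization import SerializationContext, MessageField

registry = SchemaRegistryClient({"url": "http://localhost:8081"})
deserialize = AvroDeserializer(registry)   # resolves the schema from the ID in each record

consumer = Consumer({"bootstrap.servers": "localhost:9092",
                     "group.id": "validator", "auto.offset.reset": "earliest"})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe(["customer"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    try:
        # Validate: the value must decode against its registered schema.
        deserialize(msg.value(), SerializationContext(msg.topic(), MessageField.VALUE))
        producer.produce("customer_validated", key=msg.key(), value=msg.value())
    except Exception:
        # Reroute records that fail validation instead of dropping them.
        producer.produce("customer_invalid", key=msg.key(), value=msg.value())
    producer.poll(0)
```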
13. In-broker validation & transformation
When to use in-broker transformation:
• First-pass processing and quick filtering
• Simple rerouting decided from the ingested data itself
• Masking and schema validation
• Stateless, functional processing
When not to use in-broker transformation:
• When it requires external data dependencies
• Windowing or complex processing over multiple input streams
• When it needs to keep processing state
20. High throughput
• There is no one-size-fits-all configuration; many factors affect performance.
• More partitions allow more parallel processing and therefore higher throughput, but they come at a cost (avoid over-partitioning or under-partitioning).
• Experiment with acks settings and enable write caching.
• Explore how the producer batches messages: increasing batch.size and linger.ms can raise throughput by letting the producer pack more messages into one batch (see the sketch below).
• Explore consumer fetch frequency and message size.
• Start with a baseline configuration and make changes gradually, measuring the impact of each change on performance.
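A sketch of a throughput-oriented producer configuration with the Python confluent-kafka client: acks, batch.size, and linger.ms come from the list above, compression is an additional common knob, and every value here is just an assumed starting point to measure against your own workload.

```python
# Sketch: throughput-oriented producer settings. The exact values are assumptions
# to start tuning from, not recommendations; measure each change on your workload.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "acks": "1",                  # trade durability for latency; "all" is safest
    "linger.ms": 10,              # wait up to 10 ms to fill a batch
    "batch.size": 131072,         # larger batches, fewer requests (bytes)
    "compression.type": "lz4",    # smaller payloads on the wire
})

for i in range(100_000):
    producer.produce("customer", key=str(i), value=f"event-{i}")
    producer.poll(0)              # serve delivery callbacks without blocking
producer.flush()
```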
26. Summary
■ Use schemas to ensure the data shape consumers receive
■ When designing schemas, think about compatibility
■ Validation ensures consumers always get the correct format
■ In-broker transforms are great for simple, functional, stateless processing
■ Provision an appropriate number of partitions for your topics
■ Depending on your use case, always set the right acks and buffering for the producer
■ For stateful stream processing, use snapshots for fault tolerance