Stream Processing at Lyft
Agenda
• Why Flink
• Why Kafka
• Open problems
Goals of Lyft’s Streaming Platform
• Solve the hard parts of stream processing ONCE for the entire company
Streaming Platform Overview
[Diagram: Streaming Service One, Streaming Service Two, and Streaming Service Three, connected through Pub/Sub to Stream Compute. Infrastructure shown: Amazon EC2, Amazon S3, Salt (Config / Orca), Wavefront, Docker.]
Lyft Streaming Platform - Streaming Compute Criteria
The contenders: Apache Flink, Apache Spark Streaming, Apache Kafka Streams
Why Flink? API Considerations
• Functional / Fluent API
• Flexible Windowing API
• Event Time Support
• Apache Beam Support
• Stream SQL
• Powerful Direct API
• Late Data Handling
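To make the event-time, windowing, and late-data bullets concrete, here is a minimal sketch in plain Python. It is not the Flink API. The function name, its parameters, and the sample events are all hypothetical, and the lateness check is a simplification of what a real engine does.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_size, max_out_of_orderness, allowed_lateness):
    """Count events per event-time tumbling window (illustrative only).

    events: iterable of (event_time, payload) in ARRIVAL order.
    Returns (counts, dropped): counts maps window_start -> event count,
    dropped lists events that arrived after their window had expired.
    """
    counts = defaultdict(int)
    dropped = []
    watermark = float("-inf")
    for event_time, payload in events:
        # The watermark trails the largest event time seen so far.
        watermark = max(watermark, event_time - max_out_of_orderness)
        # Assign the event to a window by its own timestamp, not arrival time.
        window_start = (event_time // window_size) * window_size
        window_end = window_start + window_size
        if window_end + allowed_lateness <= watermark:
            # Too late: the window is past its allowed lateness.
            dropped.append((event_time, payload))
        else:
            counts[window_start] += 1
    return dict(counts), dropped

# Hypothetical events; timestamps 3 and 2 arrive out of order.
events = [(1, "a"), (4, "b"), (12, "c"), (3, "d"), (25, "e"), (2, "f")]
counts, dropped = tumbling_window_counts(
    events, window_size=10, max_out_of_orderness=2, allowed_lateness=5
)
# counts  == {0: 3, 10: 1, 20: 1}  (event 3 is late but still accepted)
# dropped == [(2, "f")]            (event 2 arrives after window [0, 10) expired)
```

The point of the sketch is that results depend on when events happened rather than when they arrived, and that the lateness bound decides which stragglers still count.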
Why Flink? Operational Considerations
Lyft Streaming Platform - Pub/Sub Criteria
Open Problems
Rescaling Kafka
Rescaling Kafka while preserving per-key ordering
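A rough sketch of why this is hard, assuming keys are assigned to partitions by hashing the key modulo the partition count. Kafka's default partitioner works this way for keyed records, though with a different hash function; `zlib.crc32` is only a stand-in here, and the function names and keys are hypothetical.

```python
import zlib

def partition_for(key, num_partitions):
    # Stand-in hash partitioner: stable hash of the key modulo partition count.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def moved_keys(keys, old_count, new_count):
    # Keys whose partition changes when the topic is resized.
    return [
        k for k in keys
        if partition_for(k, old_count) != partition_for(k, new_count)
    ]

# Hypothetical keys, e.g. one per ride.
keys = ["ride-%d" % i for i in range(1000)]
moved = moved_keys(keys, old_count=4, new_count=8)
print("%d of %d keys change partition" % (len(moved), len(keys)))
```

For every moved key, older events remain in the old partition while newer events land in a different one. Ordering is only guaranteed within a partition, so a consumer can see the newer events before it has finished reading the older ones.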
Efficient Dynamic Computation Over Streams
Efficient Dynamic Computations Over Streams
Long term storage for events: Real-time and historical reads
Zero Downtime deployments for streaming services
Summary
We’re Hiring!
Thank you!
Jamie Grier