0% found this document useful (0 votes)
52 views

Stream Processing at Lyft

Lyft has built a streaming platform using Apache Flink for stream processing and Apache Kafka for messaging. The goals were to make building real-time microservices easy and solve stream processing problems once for the company. Some open problems discussed include efficiently rescaling Kafka while preserving per-key ordering, enabling dynamic computations over streams, long-term storage for real-time and historical data access, and achieving zero downtime deployments for streaming services. Lyft is still working on these challenging problems and is hiring for engineers interested in streaming systems.

Uploaded by

Jamie Grier
Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
52 views

Stream Processing at Lyft

Lyft has built a streaming platform using Apache Flink for stream processing and Apache Kafka for messaging. The goals were to make building real-time microservices easy and solve stream processing problems once for the company. Some open problems discussed include efficiently rescaling Kafka while preserving per-key ordering, enabling dynamic computations over streams, long-term storage for real-time and historical data access, and achieving zero downtime deployments for streaming services. Lyft is still working on these challenging problems and is hiring for engineers interested in streaming systems.

Uploaded by

Jamie Grier
Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 20

Streaming

Jamie Grier | @jgrier

1
Agenda

• Goals of Lyft’s Streaming Platform

• Streaming Platform Overview

• Why Flink

• Why Kafka

• Open problems

2
Goals of Lyft’s Streaming Platform

• Make it easy to build real-time, event-driven, stateful,


microservices

• Solve the hard parts of stream processing ONCE for the entire
company

• Be a force multiplier for other teams within Lyft

3
Streaming Platform Overview
Stream Compute
Pub/Sub Streaming Pub/Sub
Service One

Streaming
Service Two

Streaming
Service Three

Stream / Schema Deployment Metrics &


Alerts Logging
Registry Tooling Dashboards

Amazon Salt
Amazon S3 Wavefront Docker
EC2 (Conifg / Orca) 4
Lyft Streaming Platform - Streaming Compute Criteria

API Considerations: Operational Considerations


● Stateful Computation and Exactly-
● Functional / Fluent API
once Processing Semantics
● Flexible Windowing API ● Robust State Management
● Event Time Support ● Data Reprocessing (backfill)
● Asynchronous Checkpoints
● Apache Beam Support
● Back-pressure
● Stream SQL
● High throughput and low-latency
● Powerful Direct API ● Deployment Architecture
● Late Data Handling

The contenders: Apache Flink, Apache Spark Streaming, Apache Kafka Streams
5
Why Flink? API Considerations
• Functional / Fluent API
• Flexible Windowing API
• Event Time Support
• Apache Beam Support
• Stream SQL
• Powerful Direct API
• Late Data Handling

6
Why Flink? Operational Considerations

• Stateful Computation and Exactly-once Processing Semantics


• Robust State Management
• Stateful Data Reprocessing (backfill)
• Asynchronous Checkpoints
• Back-pressure
• High throughput and low-latency
• Deployment Architecture

7
Lyft Streaming Platform - Pub/Sub Criteria

Semantics / Features Operational Considerations


● Write Latency
● Durability
● Read Latency
● Consumer Fanout
● Project Maturity
● Transactions / Idempotent Writes ● Vendor Support
● Per-Key Ordering Guarantees
● Long-Term Data Storage
● Auto-Scaling

The contenders: Apache Kafka, Amazon Kinesis, Pravega


8
Why Kafka?
Pros
• Durability & Write Latency
• Read Latency & Consumer Fanout
• Transactions & Idempotent Writes
• Operational Concerns & Vendor Support
Cons
• No ordering by key, only partition
• Long term data storage still an issue
• Auto-Scaling still an issue

9
Open Problems

• Rescaling Kafka while preserving per-key ordering

• Efficient Dynamic Computations over streams

• Long term storage for events: real-time and historical reads

• Zero Downtime deployments for streaming services

10
Rescaling Kafka

• Rescaling Kafka while preserving per-key ordering

• Kafka only provides partition ordering guarantees!

• We want per-key ordering guarantees

• Guarantees should hold across re-partitioning events

• Basic approach: Read old partitions completely before reading


new

• Achieve this using something akin to Flink’s checkpoint 11


Rescaling Kafka while preserving per-key ordering

12
Rescaling Kafka while preserving per-key ordering

13
Efficient Dynamic Computation Over Streams

• Enable many users to dynamically submit small streaming


computations

• Share bandwidth amongst multiple computations

• Share computed sub-results amongst multiple computations

• Correctly handle bootstrapping of computations which


depend on historical data

• Basic approach: Map any computation into a fixed/general


14
Efficient Dynamic Computations over streams

15
Efficient Dynamic Computations over streams

16
Long term storage for events: Real-time and historical reads

17
Zero Downtime deployments for streaming services

18
Summary

• Lyft is building a next generation streaming platform based


on Apache Flink and Apache Kafka

• Stateful stream processing is not a “solved problem”

• There are many hard / open problems left to solve

• If these sort of problems interest you please come join us!

We’re Hiring!
19
Thank you!
Jamie Grier

20

You might also like