Streaming your Lyft Ride Prices
Flink Forward, San Francisco, April 2nd 2019
Akshay Balwally | Engineer, Pricing
Thomas Weise | @thweise | Engineer, Streaming Platform
go.lyft.com/streaming-prices-ffsf-2019
Agenda
● Introduction to dynamic pricing
● Legacy pricing infrastructure
● Streaming based infrastructure
● Beam & multiple languages
● Beam Flink runner
● Lessons learned
Dynamic Pricing
Supply/Demand curve
ETA
Pricing
Notifications
Detect Delays
Coupons
User Delight
Fraud
Behaviour Fingerprinting
Monetary Impact
Imperative to act fast
Top Destinations
Core Experience
Introduction to
Dynamic Pricing
Dynamic Pricing: What and Why?
● Dynamic pricing: the price changes every minute in each
location bucket
● Why?
○ At face value, dynamic pricing is strange
○ But Lyft’s marketplace changes quickly
The Marketplace affects Prices
● An Imbalanced Market is Inefficient
○ Too many available drivers: bad
○ Too few available drivers: bad
○ Solution: Price lever controls passenger
request rate, which maintains healthy
supply levels
● Result: increase price if demand >> supply
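As a rough illustration of the price lever (this is not Lyft's pricing model; the function name, the cap, and the simple ratio rule are all assumptions made for this sketch), a multiplier could rise with the demand/supply ratio and never drop below 1.0:

```python
def toy_multiplier(demand: int, supply: int, cap: float = 3.0) -> float:
    """Hypothetical rule: follow the demand/supply ratio, clamped to [1.0, cap]."""
    if supply <= 0:
        # No available drivers at all: use the maximum multiplier.
        return cap
    ratio = demand / supply
    return max(1.0, min(cap, ratio))


print(toy_multiplier(demand=50, supply=100))   # oversupplied -> 1.0
print(toy_multiplier(demand=200, supply=100))  # demand is 2x supply -> 2.0
print(toy_multiplier(demand=900, supply=100))  # clamped at the cap -> 3.0
```

The clamp at 1.0 reflects that the lever here only raises prices when demand outstrips supply; a real system would be learned from data rather than hard-coded.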
What is PrimeTime?
● Belief: There exists some set of
optimal price multipliers per
location/time bucket
● PrimeTime: the Lyft product that sets a
multiplier for each gh6 (6-character geohash)
every minute
● Example: In ‘9q8yyv’, at 5:01pm
PST, PrimeTime = 2.0
● Scale: Roughly 3 million
geohashes priced every minute
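To make the "gh6" location bucket concrete, here is a minimal sketch of the standard geohash encoding, which turns a latitude/longitude pair into a short base-32 string. The function name and the sample coordinates are assumptions for illustration; the slides do not show how Lyft computes its buckets.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


# A point in San Francisco (illustrative coordinates) mapped to a gh6 bucket.
gh6 = geohash_encode(37.7749, -122.4194, precision=6)
print(gh6)
```

Each extra character narrows the cell, so a gh6 key is always a refinement of the same point's shorter geohash.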
Why is PrimeTime Hard?
● 1. Need low-latency information about supply and demand
● 2. Pricing is an unsupervised problem: the correct answer is never observed
● Solution: break the problem into multiple models that form a DAG, where
intermediate models are solving supervised problems
○ Example: f (available_supply) -> pickup_times
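The idea of chaining models in a DAG can be sketched as follows. The three model functions and their formulas are hypothetical stand-ins invented for this example (only the `available_supply -> pickup_times` edge comes from the slide); the point is the evaluation order, where each model runs once its inputs are available.

```python
from graphlib import TopologicalSorter  # Python 3.9+


# Hypothetical stand-ins for trained models; each maps its inputs to one output.
def eta_model(available_supply):
    return 600.0 / max(available_supply, 1)  # fewer drivers -> longer pickup time (s)


def conversion_model(pickup_time):
    return max(0.0, 1.0 - pickup_time / 1200.0)


def price_model(conversion, demand):
    return 1.0 + 0.01 * demand * (1.0 - conversion)


# node name -> (function, names of the upstream nodes it depends on)
MODELS = {
    "pickup_time": (eta_model, ["available_supply"]),
    "conversion": (conversion_model, ["pickup_time"]),
    "multiplier": (price_model, ["conversion", "demand"]),
}


def run_dag(raw_features):
    """Evaluate every model after its dependencies, starting from raw features."""
    values = dict(raw_features)
    graph = {name: deps for name, (_, deps) in MODELS.items()}
    for node in TopologicalSorter(graph).static_order():
        if node in MODELS:  # raw features are already present in `values`
            fn, deps = MODELS[node]
            values[node] = fn(*(values[d] for d in deps))
    return values


result = run_dag({"available_supply": 10, "demand": 40})
print(result["pickup_time"], result["conversion"], result["multiplier"])
```

Intermediate nodes such as `pickup_time` have observable ground truth, which is what makes them supervised problems even though the final multiplier is not.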
Legacy Pricing
Infrastructure
Legacy architecture: A series of cron jobs
● Ingest high volume of client app events
(Kinesis, KCL)
● Compute features (e.g. demand,
conversion rate, supply) from events
● Run ML models on features to compute
PrimeTime for all regions (per min, per gh6)
SFO, calendar_min_1: {gh6: 1.0, gh6: 2.0, ...}
NYC, calendar_min_1: {gh6: 2.0, gh6: 1.0, ...}
Problems
1. Latency
2. Code complexity (LOC)
3. Hard to add new features involving windowing/join (e.g. arbitrary demand
windows, subregional computation)
4. No dynamic / smart triggers
Can we use Flink?
Streaming Stack
Streaming
Application
(SQL, Java)
Stream / Schema
Registry
Deployment
Tooling
Metrics &
Dashboards
Alerts Logging
Amazon
EC2
Amazon S3 Wavefront
Salt
(Config / Orca)
Docker
Source Sink
Streaming and Python
● Flink and many other big data ecosystem projects are Java / JVM based
○ Team wants to adopt streaming, but doesn’t have the Java skills
○ Jython != Python
● Use cases for different language environments
○ Python primary option for Machine Learning
● Cost of many API styles and runtime environments
Python via Beam
Streaming
Application
(Python/Beam)
Source Sink
Streaming based
Pricing Infrastructure
Pipeline (conceptual outline)
kinesis events
(source)
aggregate and
window
filter events
run models to
generate
features
(culminating in
PT)
internal services
redis
ride_requested,
app_open, ...
unique_users_per_min,
unique_requests_per_5_
min, ...
conversion learner,
eta learner, ...
Lyft apps
(phones)
valid sessions,
dedupe, ...
Details of implementation
1. Filtering (with internal service calls)
2. Aggregation with Beam windowing: 1min, 5min (by event time)
3. Triggers: watermark, data-driven (stateful processing)
4. Join multiple streams: CoGroup or stateful processing
5. Machine learning models invoked within Beam transforms
6. Final gh6:pt output from pipeline stored to Redis
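Step 2 (aggregation in fixed windows by event time) can be illustrated without Beam. The sketch below is plain Python, not the Beam API, and the helper names and sample events are assumptions; in a Beam pipeline the same grouping would come from `WindowInto(FixedWindows(60))` followed by a combine. Watermarks and triggers are not modeled here.

```python
from collections import defaultdict


def fixed_window_start(event_ts: int, size_s: int) -> int:
    """Start of the fixed (tumbling) window containing event_ts (epoch seconds)."""
    return event_ts - (event_ts % size_s)


def unique_users_per_window(events, size_s=60):
    """events: iterable of (event_ts, gh6, user_id) -> {(window_start, gh6): count}."""
    users = defaultdict(set)
    for event_ts, gh6, user_id in events:
        users[(fixed_window_start(event_ts, size_s), gh6)].add(user_id)
    return {key: len(ids) for key, ids in users.items()}


events = [
    (120, "9q8yyv", "u1"),
    (130, "9q8yyv", "u1"),  # same user, same window: counted once
    (150, "9q8yyv", "u2"),
    (185, "9q8yyv", "u3"),  # falls into the next 1-minute window
    (125, "9q8yyv", "u4"),  # arrives late, but its event time is in the first window
]
print(unique_users_per_window(events))
# {(120, '9q8yyv'): 3, (180, '9q8yyv'): 1}
```

Because the window is derived from the event timestamp rather than arrival order, the late record for `u4` still lands in the first window.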
Gains
• Latency: 3 minutes -> 30s
‒ Latency now dominated by model execution
• Reuse of model code
• 10K => 4K LOC
• 300 => 120 AWS instances
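For readers who prefer relative numbers, the gains above work out as follows; the helper name is an assumption, and the inputs are taken directly from the bullets.

```python
def reduction(before: float, after: float) -> float:
    """Fractional reduction when going from `before` to `after`."""
    return (before - after) / before


latency = reduction(180, 30)     # 3 minutes -> 30 seconds
loc = reduction(10_000, 4_000)   # 10K -> 4K lines of code
instances = reduction(300, 120)  # 300 -> 120 AWS instances

print(f"latency: {latency:.0%} lower ({180 / 30:.0f}x faster)")  # latency: 83% lower (6x faster)
print(f"code: {loc:.0%} smaller")                                # code: 60% smaller
print(f"instances: {instances:.0%} fewer")                       # instances: 60% fewer
```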
Beam and multiple
languages
The Beam Vision
1. End users: who want to write pipelines in a
language that’s familiar.
2. SDK writers: who want to make Beam
concepts available in new languages.
Includes IOs: connectors to data stores.
3. Runner writers: who have a distributed
processing environment and want to
support Beam pipelines
[Diagram: Beam Model. Pipeline Construction: Beam Java, Beam Python, Other Languages. Fn Runners: Apache Flink, Apache Spark, Cloud Dataflow. Each runner feeds an Execution stage.]
https://s.apache.org/apache-beam-project-overview
Multi-Language Support
● Initially Java SDK and Java Runners
● 2016: Start of cross-language support effort
● 2017: Python SDK on Dataflow
● 2018: Go SDK (for portable runners)
● 2018: Python on Flink MVP
● Next: Cross-language pipelines, more portable runners
Python Example
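import apache_beam as beam
from apache_beam import CombinePerKey, Map, WindowInto
from apache_beam.io import ReadFromText, WriteToText
from apache_beam.transforms.trigger import (
    AccumulationMode, AfterCount, AfterProcessingTime, AfterWatermark)
from apache_beam.transforms.window import FixedWindows

# runner and pipeline_options are assumed to be defined elsewhere
p = beam.Pipeline(runner=runner, options=pipeline_options)
(p
 | ReadFromText("/path/to/text*") | Map(lambda line: ...)
 | WindowInto(FixedWindows(120),
              trigger=AfterWatermark(
                  early=AfterProcessingTime(60),
                  late=AfterCount(1)),
              accumulation_mode=AccumulationMode.ACCUMULATING)
 | CombinePerKey(sum)
 | WriteToText("/path/to/outputs")
)
result = p.run()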
( What, Where, When, How )
Portability (originally)
[Diagram: "Sum Per Key" expressed in each SDK and handed to the runners.
SDKs: Python: input | Sum.PerKey(); Java: input.apply(Sum.integersPerKey()); SQL (via Java): SELECT key, SUM(value) FROM input GROUP BY key.
Runners: Cloud Dataflow, Apache Spark, Apache Flink, Apache Apex, Gearpump, Apache Samza, Apache Nemo (incubating), IBM Streams.
Interchange formats: Java objects, Dataflow JSON API.]
https://s.apache.org/state-of-beam-sfo-2018
Portability (current)
[Diagram: "Sum Per Key" expressed in each SDK and handed to the runners.
SDKs: Python: input | Sum.PerKey(); Go: stats.Sum(s, input); SQL (via Java): SELECT key, SUM(value) FROM input GROUP BY key; Java: input.apply(Sum.integersPerKey()).
Runners: Apache Spark, Apache Flink, Apache Apex, Gearpump, Cloud Dataflow, Apache Samza, Apache Nemo (incubating), IBM Streams.
Interchange formats: Java objects, portable protos.]
https://s.apache.org/state-of-beam-sfo-2018
Beam Flink Runner
Portable Flink Runner
SDK
(Python)
Job Service
Artifact
Staging
Job Manager
Fn Services
(Beam Flink Task)
Task Manager
Executor / Fn API
Provision Control Data
Artifact
Retrieval
State Logging
gRPC
Pipeline (protobuf)
ClusterRunner
Dependencies
(optional)
python -m apache_beam.examples.wordcount \
  --input=/etc/profile \
  --output=/tmp/py-wordcount-direct \
  --runner=PortableRunner \
  --job_endpoint=localhost:8099 \
  --streaming
Staging Location
(DFS, S3, …)
SDK Worker
(UDFs)
SDK Worker
(UDFs)
SDK Worker
(Python)
Flink Job
Lyft Flink Runner Customizations
● Translator extension for streaming sources
○ Kinesis, Kafka consumers that we also use in Java Flink jobs
○ Message decoding, watermarks
● Python execution environment for SDK workers
○ Tailored to internal deployment tooling
○ Docker-free, frozen virtual envs
● https://github.com/lyft/beam/tree/release-2.11.0-lyft
Robert Bradshaw, Beam Summit London, 2018
Workers have Fn
Runner
Runner worker launches services.
Control Service, Data Service, State Service, Logging Service
Fn API
How slow is this?
● Fn API Overhead 15% ?
● Fused stages
● Bundle size
● Parallel SDK workers
● TODO: Cython
● protobuf C++ bindings
decode, …, window count
# count is a helper that is not shown on the slide
(messages
 | 'reshuffle' >> beam.Reshuffle()
 | 'decode' >> beam.Map(lambda x: (__import__('random').randint(0, 511), 1))
 | 'noop1' >> beam.Map(lambda x: x)
 | 'noop2' >> beam.Map(lambda x: x)
 | 'noop3' >> beam.Map(lambda x: x)
 | 'window' >> beam.WindowInto(window.GlobalWindows(),
                               trigger=Repeatedly(AfterProcessingTime(5 * 1000)),
                               accumulation_mode=AccumulationMode.DISCARDING)
 | 'group' >> beam.GroupByKey()
 | 'count' >> beam.Map(count)
)
Fast enough for real Python work!
● c5.4xlarge machines (16 vCPU, 32 GB)
● 16 SDK workers / machine
● 1000 ms or 1000 records / bundle
● 280,000 transforms / second / machine (~ 17,500 per worker)
● Python user code will be gating factor
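The "1000 ms or 1000 records / bundle" setting means a bundle is closed by whichever limit is reached first. The sketch below illustrates that policy in plain Python; the `Bundler` class is hypothetical and is not how the Flink runner is implemented. For simplicity the age limit is only checked when a record arrives.

```python
import time


class Bundler:
    """Buffers records and emits a bundle after max_records or max_age_s, whichever comes first."""

    def __init__(self, max_records=1000, max_age_s=1.0, clock=time.monotonic):
        self.max_records = max_records
        self.max_age_s = max_age_s
        self.clock = clock  # injectable so the time limit can be tested
        self._buffer = []
        self._started = None

    def add(self, record):
        """Add a record; return the finished bundle (a list) or None."""
        if not self._buffer:
            self._started = self.clock()  # the bundle's age starts at its first record
        self._buffer.append(record)
        full = len(self._buffer) >= self.max_records
        expired = self.clock() - self._started >= self.max_age_s
        if full or expired:
            bundle, self._buffer = self._buffer, []
            return bundle
        return None


bundler = Bundler(max_records=3, max_age_s=60.0)
bundles = [b for b in (bundler.add(i) for i in range(7)) if b is not None]
print(bundles)  # [[0, 1, 2], [3, 4, 5]]
```

Larger bundles amortize the per-bundle Fn API round trip over more records, at the cost of added latency; the per-worker figure above is simply 280,000 / 16 = 17,500.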
Beam Portability Recap
● Pipelines written in non-JVM languages on JVM runners
○ Python, Go on Flink (and others)
● Full isolation of user code
○ Native CPython execution w/o library restrictions
● Flexible SDK worker execution
○ Docker, Process, Embedded, ...
● Multiple languages in a single pipeline (WIP)
○ Use Java Beam IO with Python
○ Use TFX with Java
○ <your use case here>
Feature Support Matrix (Beam 2.11.0)
https://s.apache.org/apache-beam-portability-support-table
Lessons Learned
Lessons Learned
• The Python Beam SDK and the portable Flink runner are still evolving
• Keep the pipeline simple: Flink tasks / shuffles are not free
• Stateful processing is essential for complex logic
• Model execution latency matters
• Instrument everything for monitoring
• Think about pipeline restart and upgrade
• Mind your dependencies: rate limit API calls
• Long-running Python processes may expose memory leaks
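One common way to rate limit calls to a dependency is a token bucket. The sketch below is a generic illustration, not the mechanism Lyft uses (the slides do not say); the class and parameter names are assumptions.

```python
import time


class TokenBucket:
    """Allows `rate` calls per second on average, with bursts of up to `capacity` calls."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so behaviour can be tested without sleeping
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return whether the call may proceed."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=5, capacity=2)
allowed = bucket.try_acquire()  # True: the bucket starts full
print(allowed)
```

A transform that calls an internal service would check `try_acquire()` first and skip, buffer, or retry when it returns False. Note that each SDK worker process would hold its own bucket, so the effective global limit scales with the number of workers.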
We’re Hiring! Apply at www.lyft.com/careers
or email data-recruiting@lyft.com
Data Engineering
Engineering Manager
San Francisco
Software Engineer
San Francisco, Seattle, &
New York City
Data Infrastructure
Engineering Manager
San Francisco
Software Engineer
San Francisco & Seattle
Experimentation
Software Engineer
San Francisco
Streaming
Software Engineer
San Francisco
Observability
Software Engineer
San Francisco
Please ask questions!
This presentation:
http://go.lyft.com/streaming-prices-ffsf-2019

Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas Weise & Akshay Balwally
