Apache Beam in the Google Cloud
Lessons learned from building and operating a serverless
streaming runtime
Reuven Lax, Google (@reuvenlax)
Sergei Sokolenko, Google (@datancoffee)
Lessons we learned
Watermarks
Adaptive Scaling: Flow Control
Adaptive Scaling: Autoscaling
Separating compute from state storage
History Lesson
[Timeline, 2002-2015: Bespoke Streaming → Millwheel → Flume → Dataflow (2015)]
Lesson Learned: Watermarks
A pipeline stage S with a watermark value of t means that all future data that will be seen by S will
have a timestamp later than t. In other words, all data older than t has already been processed.
Key use case: process windows once the watermark passes the end of the window, since we
expect all data for that window to have arrived already
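As a concrete illustration, here is a minimal sketch using the Beam Python SDK (the source, window size, and combiner are illustrative, not from the talk): the AfterWatermark trigger emits each window once the watermark passes the end of that window.

    import apache_beam as beam
    from apache_beam.transforms.window import FixedWindows
    from apache_beam.transforms.trigger import AfterWatermark, AccumulationMode

    # Sum values per key in 60-second event-time windows. With
    # AfterWatermark(), a window fires once the watermark passes the
    # end of the window, i.e. once all its data is expected to have arrived.
    with beam.Pipeline() as p:
        (p
         | beam.Create([('user1', 1), ('user2', 1)])  # illustrative source
         | beam.WindowInto(FixedWindows(60),
                           trigger=AfterWatermark(),
                           accumulation_mode=AccumulationMode.DISCARDING)
         | beam.CombinePerKey(sum)
         | beam.Map(print))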
What Triggers Output?
Traditional batch: the query triggers output.
Streaming: when to trigger?
● Standing query
● Unbounded data
[Diagram: batch runs a query over data and produces output; streaming runs a standing query over unbounded data, so when is output produced?]
Use Case: Anomaly Detection Pipelines
An early Millwheel user was an anomaly detection pipeline.
It built a cubic-spline model for each key.
Once a spline was calculated, it could not be modified. No late data, trigger only when ready!
[Chart: per-key spline models for users 'Bob' and 'Sara']
First attempt: leading-edge watermark
Watermark = latest timestamp − δ
The graph showed that skew peaked at 10 minutes.
Set δ = 10 minutes to minimize data drops.
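A minimal sketch of this first attempt (plain Python; the class and parameter names are illustrative assumptions): the watermark trails the newest timestamp seen by a fixed δ, and anything older is dropped.

    class LeadingEdgeWatermark:
        """Watermark = latest timestamp seen, minus a fixed delta."""
        def __init__(self, delta_seconds):
            self.delta = delta_seconds
            self.max_seen = float('-inf')

        def observe(self, event_timestamp):
            self.max_seen = max(self.max_seen, event_timestamp)

        def watermark(self):
            return self.max_seen - self.delta

    wm = LeadingEdgeWatermark(delta_seconds=600)  # delta = 10 minutes
    wm.observe(1000.0)
    is_dropped = 350.0 < wm.watermark()  # True: this element arrives too late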
First attempt: leading-edge watermark
Too fast
Too often, a lot of data was behind this watermark.
This led to many gaps in output,
which impacted the quality of results.
Too slow
Subtracting a fixed delta puts a lower bound on latency.
We subtracted 10 minutes because the system is sometimes delayed by 10 minutes; however, most of the time the delay was under 1 minute!
Second attempt: dynamic leading-edge watermark
Still a leading-edge watermark,
but with dynamic statistical models to compute how far back the lookback should be.
Second attempt: dynamic leading-edge watermark
Still many gaps in the output data:
● Input is too noisy.
● Many delays are unpredictable (e.g. a machine restarting).
● Models take time to adapt, during which time you are dropping data.
Trailing-edge watermark
Tracking the minimum event time instead generally solved the problem.
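A minimal sketch of a trailing-edge watermark (plain Python; names and the O(n) bookkeeping are illustrative assumptions): the watermark is the minimum event time among elements still in flight, so it only advances once those elements have been processed.

    import heapq

    class TrailingEdgeWatermark:
        """Watermark = minimum event time of all unprocessed elements."""
        def __init__(self):
            self.pending = []  # min-heap of in-flight event times

        def start(self, event_timestamp):
            heapq.heappush(self.pending, event_timestamp)

        def finish(self, event_timestamp):
            self.pending.remove(event_timestamp)  # O(n); fine for a sketch
            heapq.heapify(self.pending)

        def watermark(self):
            # With no pending work, the watermark is bounded only by what
            # the sources still promise to deliver (omitted here).
            return self.pending[0] if self.pending else float('inf')

Because the watermark never passes an unprocessed element, a window fires only after all of its data has been seen, which is why this approach closed the gaps.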
Watermark: Definition
Given a node N of a computation graph G, let {I_n} be the sequence of input elements, processed in the order provided by an oracle. t : {I_n} → ℝ is a real-valued function on {I_n} called the timestamp function. A watermark function is a real-valued function W defined on prefixes of {I_n} satisfying the following:
● {W_n} = {W({I_1, …, I_n})} is eventually increasing.
● {W_n} is monotonic.
W is said to be a temporal watermark if it also satisfies W_n < t(I_m) for m ≥ n.
Load Variance
Streaming pipelines must keep up with input.
Load varies throughout the day and throughout the week, and spikes can happen at any time.
Load Variance: Hand Tuning
Every pipeline is different, and hand tuning is hard
Eventually tuning parameters go stale
Hand-tuned flags become cargo cult science
Must tune for worst case
● Tuning for the peak is wasteful
● If the pipeline ever falls behind, it must be able to catch up faster than real time.
○ An exactly-once streaming system is a batch system whenever it falls behind.
Techniques: Batching
Always process data in batches
Batch sizes are dynamic: small when caught up, large while catching up.
Lesson: be careful of putting arbitrary limits on batches.
● Don’t limit by event time ranges - event-time density changes.
● Don’t limit by windows - window policies change.
Batching limits will be especially painful while catching up a backlog.
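A minimal sketch of dynamic batch sizing (plain Python; the limits are illustrative assumptions): the batch is bounded by element count scaled with backlog, never by event-time range or window.

    from collections import deque

    def next_batch(queue: deque, backlog_bytes: int):
        """Pull one batch whose size adapts to the current backlog."""
        # Small batches when caught up, large batches while catching up;
        # the limit is by element count, never by event-time range or window.
        max_elements = min(10000, max(10, backlog_bytes // 1000))
        batch = []
        while queue and len(batch) < max_elements:
            batch.append(queue.popleft())
        return batch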
Techniques: Flow Control
A good adaptive backpressure system is critical
● Prevents workers from overloading and crashing
● Adaptive backpressure adapts to changing load.
● Reduces need to perfectly tune cluster.
Techniques: Flow Control
Soft resources: CPU.
Hard resources: Memory.
Signals:
● Queue length
● Memory usage
Eventually flow control will pause
pulling from sources.
[Diagram: three workers, each running stages A, B, C; one inter-worker stream is flow controlled]
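A minimal sketch of the decision (plain Python; thresholds and names are illustrative assumptions): a stream is flow controlled when queue length or memory usage crosses a limit, with memory treated as the hard resource.

    def should_flow_control(queue_length, memory_used_bytes,
                            memory_limit_bytes, max_queue=1000):
        """Backpressure decision from the two signals on this slide."""
        if memory_used_bytes >= 0.9 * memory_limit_bytes:
            return True                       # hard resource: memory
        return queue_length > max_queue       # soft signal: queue depth

    # If every input stream of every worker ends up flow controlled,
    # the system must stop pulling from the sources (see the next slides).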
Techniques: Flow Control
What happens if all streams are flow controlled? Deadlock!!!!!
● Every worker is holding onto memory for pending deliveries.
● Every worker is flow controlling its input streams.
[Diagram: three workers, each running stages A, B, C, with every inter-worker stream flow controlled]
Techniques: Flow Control
To avoid deadlock, workers must be able to release memory.
This might involve canceling in-flight work to be retried later.
Dataflow streaming workers can spill pending deliveries to disk to release memory; they are scanned back in later.
[Diagram: the same three workers, all streams flow controlled]
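A minimal sketch of that spill path (plain Python; the file handling is a simplified assumption): under memory pressure, pending deliveries are serialized to disk and freed, then scanned back in when memory is available.

    import pickle, tempfile

    class PendingDeliveries:
        """In-memory queue that can spill to disk under memory pressure."""
        def __init__(self):
            self.in_memory = []
            self.spill_files = []

        def spill(self):
            # Release memory by writing pending work to disk.
            f = tempfile.NamedTemporaryFile(delete=False)
            pickle.dump(self.in_memory, f)
            f.close()
            self.spill_files.append(f.name)
            self.in_memory = []

        def scan_back_in(self):
            # Later, when memory is available, read the work back.
            for name in self.spill_files:
                with open(name, 'rb') as f:
                    self.in_memory.extend(pickle.load(f))
            self.spill_files = []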
Techniques: Auto Scaling
Adaptive autoscaling allows elastic scaling with load.
Work must be dynamically load balanced to take advantage of autoscaling.
Techniques: Auto Scaling
Never assume fixed workers.
Work ownership can be moved at any time.
All keys are hash sharded, and hash ranges are distributed among workers.
Separate storage from compute.
This adds a lot of complexity to exactly-once and consistency protocols!
[Diagram: the key space is split into hash ranges [0, 3), [3, a), and [a, f); a directory maps [0, 3) → worker23 and [3, a), [a, f) → worker32; RPCs are addressed to keys, not workers]
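A minimal sketch of key-addressed routing (plain Python; the directory contents and names are illustrative assumptions): keys are hashed into a hex key space, a directory maps hash ranges to workers, and moving ownership is just a directory update.

    import bisect, hashlib

    # Directory: sorted range starts -> owning worker, mirroring the slide
    # ([0,3) -> worker23; [3,a) and [a,f) -> worker32).
    RANGE_STARTS = ['0', '3', 'a']
    OWNERS = ['worker23', 'worker32', 'worker32']

    def route(key):
        """Address an RPC by key: hash it, then look up the range owner."""
        h = hashlib.sha256(key.encode()).hexdigest()[0]  # first hex digit
        i = bisect.bisect_right(RANGE_STARTS, h) - 1
        return OWNERS[i]

    # Callers never hold worker addresses, so rebalancing a range only
    # means updating the directory; in-flight callers re-resolve.
    print(route('user-42'))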
Load Variance: Lesson
Dynamic control is key
No amount of static configuration works
Eventually the universe will outsmart your configuration
Separating compute from state storage to improve scalability
Sergei Sokolenko, Google (@datancoffee)
Streaming processing options in GCP
[Architecture diagram: components include IoT events, end-user apps, DBs, Cloud Pub/Sub, Dataflow Streaming and Dataflow Batch (orchestrated by Cloud Composer), Bigtable, BigQuery with the BigQuery Streaming API (data warehousing), and Cloud AI Platform (machine learning), with results driving action]
Motivating Example:
Spotify migrating the largest European Hadoop cluster to Dataflow
● Run 80,000+ Dataflow jobs / month
● 90% batch, 10% streaming
Use Dataflow for “everything”
● Music recommendations, Ads targeting
● AB testing, Behavioral analysis, Business metrics
Huge batch jobs:
● 26000 CPUs, 166 TB RAM
● Processing 325 billion rows in 240TB from Bigtable
Traditional Distributed Data Processing Architecture
● Jobs executed on clusters of VMs
● Job state stored on network-attached volumes
● Control plane orchestrates data plane
[Diagram: four VMs running user code, each with its own network-attached state storage, connected over the network to a control-plane VM]
Traditional Architecture works well ...
[Diagram: a pipeline of Filter, Join, Group, and Filter stages reading from fs:// and database sources and writing to fs:// and database sinks]
… except for Joins and Group By’s
Shuffling key-value pairs
● Starting with <K,V> pairs placed on different workers
● Goal: co-locate all pairs with the same Key
[Diagram: four workers, each holding an unsorted mix of <key, record> pairs]
Shuffling key-value pairs
● Starting with <K,V> pairs placed on different workers
● Goal: co-locate all pairs with the same Key
● Workers exchange <K,V> pairs
[Diagram: the same four workers exchanging <key, record> pairs over the network]
Shuffling key-value pairs
● Starting with <K,V> pairs placed on different workers
● Goal: co-locate all pairs with the same Key
● Workers exchange <K,V> pairs
● Until everything is sorted
[Diagram: after the exchange, each worker holds a sorted, co-located key range: key1-key2, key3-key4, key5-key6, key7-key8]
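A minimal sketch of that exchange (plain Python; partitioning by built-in hash is an illustrative assumption): each worker partitions its pairs by key, sends each partition to the owning worker, and the receiving side collects and sorts.

    from collections import defaultdict

    def shuffle(workers_data, num_workers):
        """workers_data: one list of (key, record) pairs per worker.
        Returns the post-shuffle layout: pairs co-located and sorted by key."""
        # 1. Partition: every worker decides the destination of each pair.
        outboxes = [defaultdict(list) for _ in range(num_workers)]
        for sender, pairs in enumerate(workers_data):
            for key, record in pairs:
                dest = hash(key) % num_workers
                outboxes[sender][dest].append((key, record))
        # 2. Exchange, then 3. Sort: each worker gathers and orders its keys.
        result = []
        for dest in range(num_workers):
            received = [p for sender in range(num_workers)
                        for p in outboxes[sender][dest]]
            result.append(sorted(received))
        return result

    print(shuffle([[('key1', 'a'), ('key5', 'b')],
                   [('key5', 'c'), ('key2', 'd')]], num_workers=2))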
Traditional Architecture Requires Manual Tuning
● When data volumes exceed dozens of TBs
[Diagram: the same architecture as before: VMs running user code with network-attached state storage, coordinated by a control plane]
Distributed in-memory Shuffle in batch Cloud Dataflow
[Diagram: pipeline user code runs on Compute; shuffling operations are offloaded over a petabit network to the Dataflow Shuffle service, a shuffle proxy backed by a distributed in-memory file system and a distributed on-disk file system, with autozone placement across zones 'a', 'b', and 'c' within a region]
Faster Processing
No tuning required.
Dataflow Shuffle is usually faster than worker-based shuffle, including shuffles that use SSD persistent disks (SSD-PD).
Better autoscaling keeps aggregate resource usage the same, but cuts processing time.
[Chart: runtime of shuffle, in minutes]
Supporting larger datasets
Dataflow Shuffle has been used to shuffle 300TB+ datasets.
[Chart: dataset size of shuffle, in TB]
Storing state: what about streaming pipelines?
● Streaming shuffle: just as in batch, streams need to be grouped and joined, via a distributed streaming shuffle.
● Window data elements: time-window aggregations need to be buffered until triggering conditions occur.
Goal: Grouping by Event Time into Time Windows
[Diagram: input elements plotted by processing time vs. event time (9:00 to 14:00 on both axes); the output groups them into event-time windows]
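A minimal sketch of the grouping (plain Python; window size and the sample elements are illustrative assumptions): each element is assigned to a fixed event-time window from its timestamp, regardless of when it arrives in processing time.

    from collections import defaultdict

    WINDOW_SIZE = 3600  # 1-hour windows, in seconds

    def window_start(event_time):
        """Start of the fixed event-time window containing this timestamp."""
        return event_time - (event_time % WINDOW_SIZE)

    # Elements arrive in processing-time order, possibly out of event-time
    # order; the grouping ignores arrival order entirely.
    windows = defaultdict(list)
    for event_time, value in [(9 * 3600 + 120, 'a'), (13 * 3600, 'b'),
                              (9 * 3600 + 500, 'c')]:
        windows[window_start(event_time)].append(value)
    # Each buffered window is emitted once its triggering condition
    # (e.g. the watermark passing the end of the window) occurs.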
Even more state to store on disks in streaming
Shuffle data elements:
● Key ranges are assigned to workers
● Data elements for these keys are stored on Persistent Disks
Time window data:
● Also assigned to workers
● When time windows close, data is processed on workers
[Diagram: four worker VMs, each owning a contiguous key range (key 0000 … key 1234, key 1235 … key ABC2, key ABC3 … key DEF5, key DEF6 … key GHI2) with its own state storage]
Dataflow Streaming Engine
Benefits:
● Better supportability
● Fewer worker resources
● More efficient autoscaling
[Diagram: workers run only user code; streaming shuffle and window state storage move into the Streaming Engine service]
Autoscaling: Even better with separate Compute and State Storage
[Diagram: with Streaming Engine, workers run only user code while streaming shuffle and window state live in the service; without Streaming Engine, each VM pairs user code with its own key-range state storage]
[Charts: autoscaling of Dataflow with Streaming Engine vs. without Streaming Engine]
AB Tasty is using Dataflow Streaming Engine
● Personalization and experimentation platform
● Wanted things to work out of the box
Significant data volumes:
● 25 million user sessions per day
● 2B events per day
Dataflow usage profile:
● Streaming Engine for worry-free autoscaling
● Batch processing with FlexRS for cost savings
Main Takeaways
Trailing-edge watermarks provided a solution for triggering aggregations.
The system must be elastic and adaptive.
Separating compute from state storage helps make stream and batch processing scalable.
Thank You!
  • 42. Main Takeaways Trailing edge watermarks provided a solution for triggering aggregations The system must be elastic and adaptive Separating compute from state storage help make stream and batch processing scalable