SlideShare a Scribd company logo
Future-proof, portable batch
and streaming pipelines
using Apache Beam
Malo Deniélou
Senior Software Engineer at Google
malo@google.com
Big Data Apps Meetup, May 2017
Apache Beam: Open Source data processing APIs
● Expresses data-parallel batch and streaming
algorithms using one unified API
● Cleanly separates data processing logic
from runtime requirements
● Supports execution on multiple distributed
processing runtime environments
Why use Apache Beam?
1. Unified: The full spectrum from batch to streaming
2. Extensible: Transform libraries, IO Connectors, and DSLs -- oh my!
3. Portable: Write once, run anywhere
4. Demo: Beam, Dataflow, Spark, Kafka, Flink, ...
5. Getting Started: Beaming into the Future
A free-to-play gaming analytics use case
● How many players made it to stage 12?
● What path did they take through the stage?
● Team points and other stats at this point in time?
● Of the players who took the same route where a certain
condition was true, how many made an in-app purchase?
● What are the characteristics of the player segment who
didn’t make the purchase vs. those who did?
● Why was this custom event so successful in driving in-app
purchases compared to others?
You need Key indicators specific to your game in order to increase adoption, engagement, etc.
The Solution
Collect real-time
game events and
player data
Combine data in
meaningful ways
Apply realtime
and historical
batch analysis
=
Impact engagement,
retention, and spend.
How this would look on Google Cloud
Lots of alternatives …
No
API!
… but I need to rewrite all my pipelines!
Here comes Beam: your portable data processing API
No technological or environmental lock-in!
Portable batch and streaming pipelines with Apache Beam (Big Data Applications Meetup May 2017)
What is being computed?
Extensible PTransforms let you
build pipelines modularly.
What is being computed?
Where in
event time?
Leaderboard
Streaming
example
Demo!
Beam, Dataflow, Spark, Kafka, Flink, ...
● Beam’s Java SDK runs on multiple
runtime environments, including:
○ Apache Apex
○ Apache Spark
○ Apache Flink
○ Google Cloud Dataflow
○ [in development] Apache Gearpump
● Cross-language infrastructure is in
progress.
○ Beam’s Python SDK currently runs
on Google Cloud Dataflow
● First stable version at the end of May!
Beam Vision: as of May 2017
Beam Model: Fn Runners
Apache
Spark
Cloud
Dataflow
Beam Model: Pipeline Construction
Apache
Flink
Java
Java
Python
Python
Apache
Apex
Apache
Gearpump
How do you build an abstraction layer?
Apache
Spark
Cloud
Dataflow
Apache
Flink
????????
????????
Beam: the intersection of runner functionality?
Beam: the union of runner functionality?
Beam: the future!
Getting Started with Apache Beam
Beaming into the Future
The Beam Model
The Dataflow Model paper from VLDB 2015
http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
Streaming 101 and 102: The World Beyond Batch
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
Beam
Apache Beam: http://beam.apache.org
Quickstarts
● Java SDK
● Python SDK
Example walkthroughs
● Word Count
● Mobile Gaming
Detailed documentation
Thanks!
Additional slides
Unified
The full spectrum from batch to streaming
Less code, better code
Processing time vs. event time
The Beam Model: asking the right questions
What results are calculated?
Where in event time are results calculated?
When in processing time are results materialized?
How do refinements of results relate?
PCollection<KV<String, Integer>> scores = input
.apply(Sum.integersPerKey());
The Beam Model: What is being computed?
The Beam Model: What is being computed?
PCollection<KV<String, Integer>> scores = input
.apply(Window.into(FixedWindows.of(Duration.standardMinutes(2)))
.apply(Sum.integersPerKey());
The Beam Model: Where in event time?
The Beam Model: Where in event time?
PCollection<KV<String, Integer>> scores = input
.apply(Window.into(FixedWindows.of(Duration.standardMinutes(2))
.triggering(AtWatermark()))
.apply(Sum.integersPerKey());
The Beam Model: When in processing time?
The Beam Model: When in processing time?
PCollection<KV<String, Integer>> scores = input
.apply(Window.into(FixedWindows.of(Duration.standardMinutes(2))
.triggering(AtWatermark()
.withEarlyFirings(
AtPeriod(Duration.standardMinutes(1)))
.withLateFirings(AtCount(1)))
.accumulatingFiredPanes())
.apply(Sum.integersPerKey());
The Beam Model: How do refinements relate?
The Beam Model: How do refinements relate?
Customizing What Where When How
3
Streaming
4
Streaming
+ Accumulation
1
Classic
Batch
2
Windowed
Batch
Extensible
Transform libraries, IO connectors, and DSLs -- Oh my!
PTransforms
● PTransforms let you build pipelines modularly.
● All transformations are equal, so it’s easy to add new
libraries.
● Runtime can use this structure to communicate with the user.
More details on the Beam Blog: Where’s My PCollection.map()?
Source API
● Source API allows users to
teach the system about new
bounded and unbounded
data formats.
More details on the Beam Blog: Dynamic Work Rebalancing for Beam
● Even the most careful
hand-tuning will fail as data,
code, and environments shift.
● Beam’s Source API is
designed to provide runtime
hooks for efficient scaling.
Processed & Committed
Processed and Uncommitted
Unprocessed
Language-specific SDKS and DSLs
Language B
SDK
Language A
SDK
Language C
SDK
The Beam Model
DSL X DSL Z
● Multiple language-specific
SDKs that implement the full
Beam model.
● Optional domain specific
languages that narrow or
transform the abstractions
to align with simple use
cases or specific user
communities.
Portable
Write once, run anywhere
Beam Vision: mix and match SDKs and runtimes
● The Beam Model: the abstractions
at the core of Apache Beam
Language B
SDK
Language A
SDK
Language C
SDK
Runner 1 Runner 3Runner 2
● Choice of API: Users write their
pipelines in a language that’s
familiar and integrated with their
other tooling
● Choice of Runtime: Users choose
the right runner for their current
needs -- on-prem / cloud, open
source / not, fully managed / not
● Scalability for Developers: Clean
APIs allow developers to contribute
modules independently
The Beam Model
Language A Language CLanguage B
The Beam Model
Example Beam Runners
Apache Spark
● Open-source
cluster-computing
framework
● Large ecosystem of
APIs and tools
● Runs on premise or in
the cloud
Apache Flink
● Open-source
distributed data
processing engine
● High-throughput and
low-latency stream
processing
● Runs on premise or in
the cloud
Google Cloud Dataflow
● Fully-managed service
for batch and stream
data processing
● Provides dynamic
auto-scaling,
monitoring tools, and
tight integration with
Google Cloud
Platform
Demo screenshots
because if I make them, I won’t need to use them
Portable batch and streaming pipelines with Apache Beam (Big Data Applications Meetup May 2017)
Portable batch and streaming pipelines with Apache Beam (Big Data Applications Meetup May 2017)
Portable batch and streaming pipelines with Apache Beam (Big Data Applications Meetup May 2017)
Portable batch and streaming pipelines with Apache Beam (Big Data Applications Meetup May 2017)
Portable batch and streaming pipelines with Apache Beam (Big Data Applications Meetup May 2017)
Portable batch and streaming pipelines with Apache Beam (Big Data Applications Meetup May 2017)
Portable batch and streaming pipelines with Apache Beam (Big Data Applications Meetup May 2017)
Portable batch and streaming pipelines with Apache Beam (Big Data Applications Meetup May 2017)
Portable batch and streaming pipelines with Apache Beam (Big Data Applications Meetup May 2017)
Portable batch and streaming pipelines with Apache Beam (Big Data Applications Meetup May 2017)
Portable batch and streaming pipelines with Apache Beam (Big Data Applications Meetup May 2017)
Portable batch and streaming pipelines with Apache Beam (Big Data Applications Meetup May 2017)
Portable batch and streaming pipelines with Apache Beam (Big Data Applications Meetup May 2017)
Portable batch and streaming pipelines with Apache Beam (Big Data Applications Meetup May 2017)

More Related Content

What's hot (20)

Alexander Kolb – Flink. Yet another Streaming Framework?
Alexander Kolb – Flink. Yet another Streaming Framework?Alexander Kolb – Flink. Yet another Streaming Framework?
Alexander Kolb – Flink. Yet another Streaming Framework?
Flink Forward
 
Flink Forward SF 2017: Dean Wampler - Streaming Deep Learning Scenarios with...
Flink Forward SF 2017: Dean Wampler -  Streaming Deep Learning Scenarios with...Flink Forward SF 2017: Dean Wampler -  Streaming Deep Learning Scenarios with...
Flink Forward SF 2017: Dean Wampler - Streaming Deep Learning Scenarios with...
Flink Forward
 
Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019
Thomas Weise
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
confluent
 
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward
 
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
Chris Fregly
 
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
confluent
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Flink Forward
 
Flink Forward San Francisco 2018: Jörg Schad and Biswajit Das - "Operating Fl...
Flink Forward San Francisco 2018: Jörg Schad and Biswajit Das - "Operating Fl...Flink Forward San Francisco 2018: Jörg Schad and Biswajit Das - "Operating Fl...
Flink Forward San Francisco 2018: Jörg Schad and Biswajit Das - "Operating Fl...
Flink Forward
 
Apache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalApache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - final
Sub Szabolcs Feczak
 
Google cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache FlinkGoogle cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache Flink
Iván Fernández Perea
 
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
Databricks
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache Flink
AKASH SIHAG
 
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Till Rohrmann
 
KFServing and Kubeflow Pipelines
KFServing and Kubeflow PipelinesKFServing and Kubeflow Pipelines
KFServing and Kubeflow Pipelines
Animesh Singh
 
Seattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp APISeattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp API
shareddatamsft
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Flink Forward
 
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
confluent
 
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Flink Forward
 
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward
 
Alexander Kolb – Flink. Yet another Streaming Framework?
Alexander Kolb – Flink. Yet another Streaming Framework?Alexander Kolb – Flink. Yet another Streaming Framework?
Alexander Kolb – Flink. Yet another Streaming Framework?
Flink Forward
 
Flink Forward SF 2017: Dean Wampler - Streaming Deep Learning Scenarios with...
Flink Forward SF 2017: Dean Wampler -  Streaming Deep Learning Scenarios with...Flink Forward SF 2017: Dean Wampler -  Streaming Deep Learning Scenarios with...
Flink Forward SF 2017: Dean Wampler - Streaming Deep Learning Scenarios with...
Flink Forward
 
Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019
Thomas Weise
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
confluent
 
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward
 
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
Chris Fregly
 
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
confluent
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Flink Forward
 
Flink Forward San Francisco 2018: Jörg Schad and Biswajit Das - "Operating Fl...
Flink Forward San Francisco 2018: Jörg Schad and Biswajit Das - "Operating Fl...Flink Forward San Francisco 2018: Jörg Schad and Biswajit Das - "Operating Fl...
Flink Forward San Francisco 2018: Jörg Schad and Biswajit Das - "Operating Fl...
Flink Forward
 
Apache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalApache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - final
Sub Szabolcs Feczak
 
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
Databricks
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache Flink
AKASH SIHAG
 
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Till Rohrmann
 
KFServing and Kubeflow Pipelines
KFServing and Kubeflow PipelinesKFServing and Kubeflow Pipelines
KFServing and Kubeflow Pipelines
Animesh Singh
 
Seattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp APISeattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp API
shareddatamsft
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Flink Forward
 
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
confluent
 
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Flink Forward
 
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward
 

Similar to Portable batch and streaming pipelines with Apache Beam (Big Data Applications Meetup May 2017) (20)

Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...
DataWorks Summit
 
Portable Streaming Pipelines with Apache Beam
Portable Streaming Pipelines with Apache BeamPortable Streaming Pipelines with Apache Beam
Portable Streaming Pipelines with Apache Beam
confluent
 
Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...
DataWorks Summit
 
Realizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache BeamRealizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache Beam
DataWorks Summit
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache Beam
DataWorks Summit/Hadoop Summit
 
Introduction to Apache Beam
Introduction to Apache BeamIntroduction to Apache Beam
Introduction to Apache Beam
Knoldus Inc.
 
Realizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache BeamRealizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache Beam
DataWorks Summit
 
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018
Wes McKinney
 
APIdays Paris 2014 - The State of Web API Languages
APIdays Paris 2014 - The State of Web API LanguagesAPIdays Paris 2014 - The State of Web API Languages
APIdays Paris 2014 - The State of Web API Languages
Restlet
 
ApacheBeam_Google_Theater_TalendConnect2017.pdf
ApacheBeam_Google_Theater_TalendConnect2017.pdfApacheBeam_Google_Theater_TalendConnect2017.pdf
ApacheBeam_Google_Theater_TalendConnect2017.pdf
RAJA RAY
 
ApacheBeam_Google_Theater_TalendConnect2017.pptx
ApacheBeam_Google_Theater_TalendConnect2017.pptxApacheBeam_Google_Theater_TalendConnect2017.pptx
ApacheBeam_Google_Theater_TalendConnect2017.pptx
RAJA RAY
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltre
Marco Parenzan
 
Flink Forward Berlin 2018: Robert Bradshaw & Maximilian Michels - "Universal ...
Flink Forward Berlin 2018: Robert Bradshaw & Maximilian Michels - "Universal ...Flink Forward Berlin 2018: Robert Bradshaw & Maximilian Michels - "Universal ...
Flink Forward Berlin 2018: Robert Bradshaw & Maximilian Michels - "Universal ...
Flink Forward
 
Hyperion EPM APIs - Added value from HFM, Workspace, FDM, Smartview, and Shar...
Hyperion EPM APIs - Added value from HFM, Workspace, FDM, Smartview, and Shar...Hyperion EPM APIs - Added value from HFM, Workspace, FDM, Smartview, and Shar...
Hyperion EPM APIs - Added value from HFM, Workspace, FDM, Smartview, and Shar...
Charles Beyer
 
Delivering Developer Tools at Scale
Delivering Developer Tools at ScaleDelivering Developer Tools at Scale
Delivering Developer Tools at Scale
Oracle Developers
 
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016
Alluxio, Inc.
 
Spring Roo Flex Add-on
Spring Roo Flex Add-onSpring Roo Flex Add-on
Spring Roo Flex Add-on
Bill Ott
 
Build Great Networked APIs with Swift, OpenAPI, and gRPC
Build Great Networked APIs with Swift, OpenAPI, and gRPCBuild Great Networked APIs with Swift, OpenAPI, and gRPC
Build Great Networked APIs with Swift, OpenAPI, and gRPC
Tim Burks
 
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBaseHBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon
 
APIdays 2015 - The State of Web API Languages
APIdays 2015 - The State of Web API LanguagesAPIdays 2015 - The State of Web API Languages
APIdays 2015 - The State of Web API Languages
Jerome Louvel
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...
DataWorks Summit
 
Portable Streaming Pipelines with Apache Beam
Portable Streaming Pipelines with Apache BeamPortable Streaming Pipelines with Apache Beam
Portable Streaming Pipelines with Apache Beam
confluent
 
Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...
DataWorks Summit
 
Realizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache BeamRealizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache Beam
DataWorks Summit
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache Beam
DataWorks Summit/Hadoop Summit
 
Introduction to Apache Beam
Introduction to Apache BeamIntroduction to Apache Beam
Introduction to Apache Beam
Knoldus Inc.
 
Realizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache BeamRealizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache Beam
DataWorks Summit
 
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018
Wes McKinney
 
APIdays Paris 2014 - The State of Web API Languages
APIdays Paris 2014 - The State of Web API LanguagesAPIdays Paris 2014 - The State of Web API Languages
APIdays Paris 2014 - The State of Web API Languages
Restlet
 
ApacheBeam_Google_Theater_TalendConnect2017.pdf
ApacheBeam_Google_Theater_TalendConnect2017.pdfApacheBeam_Google_Theater_TalendConnect2017.pdf
ApacheBeam_Google_Theater_TalendConnect2017.pdf
RAJA RAY
 
ApacheBeam_Google_Theater_TalendConnect2017.pptx
ApacheBeam_Google_Theater_TalendConnect2017.pptxApacheBeam_Google_Theater_TalendConnect2017.pptx
ApacheBeam_Google_Theater_TalendConnect2017.pptx
RAJA RAY
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltre
Marco Parenzan
 
Flink Forward Berlin 2018: Robert Bradshaw & Maximilian Michels - "Universal ...
Flink Forward Berlin 2018: Robert Bradshaw & Maximilian Michels - "Universal ...Flink Forward Berlin 2018: Robert Bradshaw & Maximilian Michels - "Universal ...
Flink Forward Berlin 2018: Robert Bradshaw & Maximilian Michels - "Universal ...
Flink Forward
 
Hyperion EPM APIs - Added value from HFM, Workspace, FDM, Smartview, and Shar...
Hyperion EPM APIs - Added value from HFM, Workspace, FDM, Smartview, and Shar...Hyperion EPM APIs - Added value from HFM, Workspace, FDM, Smartview, and Shar...
Hyperion EPM APIs - Added value from HFM, Workspace, FDM, Smartview, and Shar...
Charles Beyer
 
Delivering Developer Tools at Scale
Delivering Developer Tools at ScaleDelivering Developer Tools at Scale
Delivering Developer Tools at Scale
Oracle Developers
 
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016
Alluxio, Inc.
 
Spring Roo Flex Add-on
Spring Roo Flex Add-onSpring Roo Flex Add-on
Spring Roo Flex Add-on
Bill Ott
 
Build Great Networked APIs with Swift, OpenAPI, and gRPC
Build Great Networked APIs with Swift, OpenAPI, and gRPCBuild Great Networked APIs with Swift, OpenAPI, and gRPC
Build Great Networked APIs with Swift, OpenAPI, and gRPC
Tim Burks
 
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBaseHBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon
 
APIdays 2015 - The State of Web API Languages
APIdays 2015 - The State of Web API LanguagesAPIdays 2015 - The State of Web API Languages
APIdays 2015 - The State of Web API Languages
Jerome Louvel
 

Recently uploaded (20)

Chapter 2 protozoa and their phylum to get
Chapter 2 protozoa and their phylum to getChapter 2 protozoa and their phylum to get
Chapter 2 protozoa and their phylum to get
hamzagobena8
 
apidays New York 2025 - Agentic AI Future by Seena Ganesh (Staples)
apidays New York 2025 - Agentic AI Future by Seena Ganesh (Staples)apidays New York 2025 - Agentic AI Future by Seena Ganesh (Staples)
apidays New York 2025 - Agentic AI Future by Seena Ganesh (Staples)
apidays
 
Splunk itsi infrastructure components implementation and integration
Splunk itsi infrastructure components implementation and integrationSplunk itsi infrastructure components implementation and integration
Splunk itsi infrastructure components implementation and integration
willmorekanan
 
this is the Dr. ibrahim presentations ppt.pptx
this is the Dr. ibrahim presentations ppt.pptxthis is the Dr. ibrahim presentations ppt.pptx
this is the Dr. ibrahim presentations ppt.pptx
ibrahimabdi22
 
RRTgvghfjfguyfgyuguyguyfgyfyfytffff.pptx
RRTgvghfjfguyfgyuguyguyfgyfyfytffff.pptxRRTgvghfjfguyfgyuguyguyfgyfyfytffff.pptx
RRTgvghfjfguyfgyuguyguyfgyfyfytffff.pptx
ravindersaini1616
 
time_series_forecasting_constructor_uni.pptx
time_series_forecasting_constructor_uni.pptxtime_series_forecasting_constructor_uni.pptx
time_series_forecasting_constructor_uni.pptx
stefanopinto1113
 
Embracing AI in Project Management: Final Insights & Future Vision
Embracing AI in Project Management: Final Insights & Future VisionEmbracing AI in Project Management: Final Insights & Future Vision
Embracing AI in Project Management: Final Insights & Future Vision
KavehMomeni1
 
Splunk_ITSI_Interview_Prep_Deck.pptx interview
Splunk_ITSI_Interview_Prep_Deck.pptx interviewSplunk_ITSI_Interview_Prep_Deck.pptx interview
Splunk_ITSI_Interview_Prep_Deck.pptx interview
willmorekanan
 
apidays New York 2025 - From UX to AX by Karin Hendrikse (Netlify)
apidays New York 2025 - From UX to AX by Karin Hendrikse (Netlify)apidays New York 2025 - From UX to AX by Karin Hendrikse (Netlify)
apidays New York 2025 - From UX to AX by Karin Hendrikse (Netlify)
apidays
 
Understanding LLM Temperature: A comprehensive Guide
Understanding LLM Temperature: A comprehensive GuideUnderstanding LLM Temperature: A comprehensive Guide
Understanding LLM Temperature: A comprehensive Guide
Tamanna36
 
Lec 12.pdfghhjjhhjkkkkkkkkkkkjfcvhiiugcvvh
Lec 12.pdfghhjjhhjkkkkkkkkkkkjfcvhiiugcvvhLec 12.pdfghhjjhhjkkkkkkkkkkkjfcvhiiugcvvh
Lec 12.pdfghhjjhhjkkkkkkkkkkkjfcvhiiugcvvh
saifalroby72
 
The fundamental concept of nature of knowledge
The fundamental concept of nature of knowledgeThe fundamental concept of nature of knowledge
The fundamental concept of nature of knowledge
tarrebulehora
 
apidays New York 2025 - Build for ALL of Your Users by Anthony Lusardi (liblab)
apidays New York 2025 - Build for ALL of Your Users by Anthony Lusardi (liblab)apidays New York 2025 - Build for ALL of Your Users by Anthony Lusardi (liblab)
apidays New York 2025 - Build for ALL of Your Users by Anthony Lusardi (liblab)
apidays
 
Cyber Security Presentation(Neon)xu.pptx
Cyber Security Presentation(Neon)xu.pptxCyber Security Presentation(Neon)xu.pptx
Cyber Security Presentation(Neon)xu.pptx
vilakshbhargava
 
artificial intelligence (1).pptx hgggfcgfch
artificial intelligence (1).pptx hgggfcgfchartificial intelligence (1).pptx hgggfcgfch
artificial intelligence (1).pptx hgggfcgfch
DevAnshGupta609215
 
PM003_SERENE-CM-PM-Training Material-EAM Maintenance Notification.pptx
PM003_SERENE-CM-PM-Training Material-EAM Maintenance Notification.pptxPM003_SERENE-CM-PM-Training Material-EAM Maintenance Notification.pptx
PM003_SERENE-CM-PM-Training Material-EAM Maintenance Notification.pptx
afriyanrtanjung007
 
Veterinary Anatomy, The Regional Gross Anatomy of Domestic Animals (VetBooks....
Veterinary Anatomy, The Regional Gross Anatomy of Domestic Animals (VetBooks....Veterinary Anatomy, The Regional Gross Anatomy of Domestic Animals (VetBooks....
Veterinary Anatomy, The Regional Gross Anatomy of Domestic Animals (VetBooks....
JazmnAltamirano1
 
apidays New York 2025 - How AI is Transforming Product Management by Shereen ...
apidays New York 2025 - How AI is Transforming Product Management by Shereen ...apidays New York 2025 - How AI is Transforming Product Management by Shereen ...
apidays New York 2025 - How AI is Transforming Product Management by Shereen ...
apidays
 
Role_Based_Permissions_Kick-off_Deck_202203.pptx
Role_Based_Permissions_Kick-off_Deck_202203.pptxRole_Based_Permissions_Kick-off_Deck_202203.pptx
Role_Based_Permissions_Kick-off_Deck_202203.pptx
SystemsBenya
 
Magician PeterMagician PeterMagician PeterMagician Peter
Magician PeterMagician PeterMagician PeterMagician PeterMagician PeterMagician PeterMagician PeterMagician Peter
Magician PeterMagician PeterMagician PeterMagician Peter
seomarket363
 
Chapter 2 protozoa and their phylum to get
Chapter 2 protozoa and their phylum to getChapter 2 protozoa and their phylum to get
Chapter 2 protozoa and their phylum to get
hamzagobena8
 
apidays New York 2025 - Agentic AI Future by Seena Ganesh (Staples)
apidays New York 2025 - Agentic AI Future by Seena Ganesh (Staples)apidays New York 2025 - Agentic AI Future by Seena Ganesh (Staples)
apidays New York 2025 - Agentic AI Future by Seena Ganesh (Staples)
apidays
 
Splunk itsi infrastructure components implementation and integration
Splunk itsi infrastructure components implementation and integrationSplunk itsi infrastructure components implementation and integration
Splunk itsi infrastructure components implementation and integration
willmorekanan
 
this is the Dr. ibrahim presentations ppt.pptx
this is the Dr. ibrahim presentations ppt.pptxthis is the Dr. ibrahim presentations ppt.pptx
this is the Dr. ibrahim presentations ppt.pptx
ibrahimabdi22
 
RRTgvghfjfguyfgyuguyguyfgyfyfytffff.pptx
RRTgvghfjfguyfgyuguyguyfgyfyfytffff.pptxRRTgvghfjfguyfgyuguyguyfgyfyfytffff.pptx
RRTgvghfjfguyfgyuguyguyfgyfyfytffff.pptx
ravindersaini1616
 
time_series_forecasting_constructor_uni.pptx
time_series_forecasting_constructor_uni.pptxtime_series_forecasting_constructor_uni.pptx
time_series_forecasting_constructor_uni.pptx
stefanopinto1113
 
Embracing AI in Project Management: Final Insights & Future Vision
Embracing AI in Project Management: Final Insights & Future VisionEmbracing AI in Project Management: Final Insights & Future Vision
Embracing AI in Project Management: Final Insights & Future Vision
KavehMomeni1
 
Splunk_ITSI_Interview_Prep_Deck.pptx interview
Splunk_ITSI_Interview_Prep_Deck.pptx interviewSplunk_ITSI_Interview_Prep_Deck.pptx interview
Splunk_ITSI_Interview_Prep_Deck.pptx interview
willmorekanan
 
apidays New York 2025 - From UX to AX by Karin Hendrikse (Netlify)
apidays New York 2025 - From UX to AX by Karin Hendrikse (Netlify)apidays New York 2025 - From UX to AX by Karin Hendrikse (Netlify)
apidays New York 2025 - From UX to AX by Karin Hendrikse (Netlify)
apidays
 
Understanding LLM Temperature: A comprehensive Guide
Understanding LLM Temperature: A comprehensive GuideUnderstanding LLM Temperature: A comprehensive Guide
Understanding LLM Temperature: A comprehensive Guide
Tamanna36
 
Lec 12.pdfghhjjhhjkkkkkkkkkkkjfcvhiiugcvvh
Lec 12.pdfghhjjhhjkkkkkkkkkkkjfcvhiiugcvvhLec 12.pdfghhjjhhjkkkkkkkkkkkjfcvhiiugcvvh
Lec 12.pdfghhjjhhjkkkkkkkkkkkjfcvhiiugcvvh
saifalroby72
 
The fundamental concept of nature of knowledge
The fundamental concept of nature of knowledgeThe fundamental concept of nature of knowledge
The fundamental concept of nature of knowledge
tarrebulehora
 
apidays New York 2025 - Build for ALL of Your Users by Anthony Lusardi (liblab)
apidays New York 2025 - Build for ALL of Your Users by Anthony Lusardi (liblab)apidays New York 2025 - Build for ALL of Your Users by Anthony Lusardi (liblab)
apidays New York 2025 - Build for ALL of Your Users by Anthony Lusardi (liblab)
apidays
 
Cyber Security Presentation(Neon)xu.pptx
Cyber Security Presentation(Neon)xu.pptxCyber Security Presentation(Neon)xu.pptx
Cyber Security Presentation(Neon)xu.pptx
vilakshbhargava
 
artificial intelligence (1).pptx hgggfcgfch
artificial intelligence (1).pptx hgggfcgfchartificial intelligence (1).pptx hgggfcgfch
artificial intelligence (1).pptx hgggfcgfch
DevAnshGupta609215
 
PM003_SERENE-CM-PM-Training Material-EAM Maintenance Notification.pptx
PM003_SERENE-CM-PM-Training Material-EAM Maintenance Notification.pptxPM003_SERENE-CM-PM-Training Material-EAM Maintenance Notification.pptx
PM003_SERENE-CM-PM-Training Material-EAM Maintenance Notification.pptx
afriyanrtanjung007
 
Veterinary Anatomy, The Regional Gross Anatomy of Domestic Animals (VetBooks....
Veterinary Anatomy, The Regional Gross Anatomy of Domestic Animals (VetBooks....Veterinary Anatomy, The Regional Gross Anatomy of Domestic Animals (VetBooks....
Veterinary Anatomy, The Regional Gross Anatomy of Domestic Animals (VetBooks....
JazmnAltamirano1
 
apidays New York 2025 - How AI is Transforming Product Management by Shereen ...
apidays New York 2025 - How AI is Transforming Product Management by Shereen ...apidays New York 2025 - How AI is Transforming Product Management by Shereen ...
apidays New York 2025 - How AI is Transforming Product Management by Shereen ...
apidays
 
Role_Based_Permissions_Kick-off_Deck_202203.pptx
Role_Based_Permissions_Kick-off_Deck_202203.pptxRole_Based_Permissions_Kick-off_Deck_202203.pptx
Role_Based_Permissions_Kick-off_Deck_202203.pptx
SystemsBenya
 
Magician PeterMagician PeterMagician PeterMagician Peter
Magician PeterMagician PeterMagician PeterMagician PeterMagician PeterMagician PeterMagician PeterMagician Peter
Magician PeterMagician PeterMagician PeterMagician Peter
seomarket363
 

Portable batch and streaming pipelines with Apache Beam (Big Data Applications Meetup May 2017)

  • 1. Future-proof, portable batch and streaming pipelines using Apache Beam Malo Deniélou Senior Software Engineer at Google [email protected] Big Data Apps Meetup, May 2017
  • 2. Apache Beam: Open Source data processing APIs ● Expresses data-parallel batch and streaming algorithms using one unified API ● Cleanly separates data processing logic from runtime requirements ● Supports execution on multiple distributed processing runtime environments
  • 3. Why use Apache Beam? 1. Unified: The full spectrum from batch to streaming 2. Extensible: Transform libraries, IO Connectors, and DSLs -- oh my! 3. Portable: Write once, run anywhere 4. Demo: Beam, Dataflow, Spark, Kafka, Flink, ... 5. Getting Started: Beaming into the Future
  • 4. A free-to-play gaming analytics use case ● How many players made it to stage 12? ● What path did they take through the stage? ● Team points and other stats at this point in time? ● Of the players who took the same route where a certain condition was true, how many made an in-app purchase? ● What are the characteristics of the player segment who didn’t make the purchase vs. those who did? ● Why was this custom event so successful in driving in-app purchases compared to others? You need Key indicators specific to your game in order to increase adoption, engagement, etc.
  • 5. The Solution Collect real-time game events and player data Combine data in meaningful ways Apply realtime and historical batch analysis = Impact engagement, retention, and spend.
  • 6. How this would look on Google Cloud
  • 8. No API! … but I need to rewrite all my pipelines!
  • 9. Here comes Beam: your portable data processing API No technological or environmental lock-in!
  • 11. What is being computed?
  • 12. Extensible PTransforms let you build pipelines modularly.
  • 13. What is being computed? Where in event time?
  • 15. Demo! Beam, Dataflow, Spark, Kafka, Flink, ...
  • 16. ● Beam’s Java SDK runs on multiple runtime environments, including: ○ Apache Apex ○ Apache Spark ○ Apache Flink ○ Google Cloud Dataflow ○ [in development] Apache Gearpump ● Cross-language infrastructure is in progress. ○ Beam’s Python SDK currently runs on Google Cloud Dataflow ● First stable version at the end of May! Beam Vision: as of May 2017 Beam Model: Fn Runners Apache Spark Cloud Dataflow Beam Model: Pipeline Construction Apache Flink Java Java Python Python Apache Apex Apache Gearpump
  • 17. How do you build an abstraction layer? Apache Spark Cloud Dataflow Apache Flink ???????? ????????
  • 18. Beam: the intersection of runner functionality?
  • 19. Beam: the union of runner functionality?
  • 21. Getting Started with Apache Beam Beaming into the Future
  • 22. The Beam Model The Dataflow Model paper from VLDB 2015 http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf Streaming 101 and 102: The World Beyond Batch https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101 https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102 Beam
  • 23. Apache Beam: http://beam.apache.org Quickstarts ● Java SDK ● Python SDK Example walkthroughs ● Word Count ● Mobile Gaming Detailed documentation
  • 26. Unified The full spectrum from batch to streaming Less code, better code
  • 27. Processing time vs. event time
  • 28. The Beam Model: asking the right questions What results are calculated? Where in event time are results calculated? When in processing time are results materialized? How do refinements of results relate?
  • 29. PCollection<KV<String, Integer>> scores = input .apply(Sum.integersPerKey()); The Beam Model: What is being computed?
  • 30. The Beam Model: What is being computed?
  • 31. PCollection<KV<String, Integer>> scores = input .apply(Window.into(FixedWindows.of(Duration.standardMinutes(2))) .apply(Sum.integersPerKey()); The Beam Model: Where in event time?
  • 32. The Beam Model: Where in event time?
  • 33. PCollection<KV<String, Integer>> scores = input .apply(Window.into(FixedWindows.of(Duration.standardMinutes(2)) .triggering(AtWatermark())) .apply(Sum.integersPerKey()); The Beam Model: When in processing time?
  • 34. The Beam Model: When in processing time?
  • 35. PCollection<KV<String, Integer>> scores = input .apply(Window.into(FixedWindows.of(Duration.standardMinutes(2)) .triggering(AtWatermark() .withEarlyFirings( AtPeriod(Duration.standardMinutes(1))) .withLateFirings(AtCount(1))) .accumulatingFiredPanes()) .apply(Sum.integersPerKey()); The Beam Model: How do refinements relate?
  • 36. The Beam Model: How do refinements relate?
  • 37. Customizing What Where When How 3 Streaming 4 Streaming + Accumulation 1 Classic Batch 2 Windowed Batch
  • 38. Extensible Transform libraries, IO connectors, and DSLs -- Oh my!
  • 39. PTransforms ● PTransforms let you build pipelines modularly. ● All transformations are equal, so it’s easy to add new libraries. ● Runtime can use this structure to communicate with the user. More details on the Beam Blog: Where’s My PCollection.map()?
  • 40. Source API ● Source API allows users to teach the system about new bounded and unbounded data formats. More details on the Beam Blog: Dynamic Work Rebalancing for Beam ● Even the most careful hand-tuning will fail as data, code, and environments shift. ● Beam’s Source API is designed to provide runtime hooks for efficient scaling. Processed & Committed Processed and Uncommitted Unprocessed
  • 41. Language-specific SDKS and DSLs Language B SDK Language A SDK Language C SDK The Beam Model DSL X DSL Z ● Multiple language-specific SDKs that implement the full Beam model. ● Optional domain specific languages that narrow or transform the abstractions to align with simple use cases or specific user communities.
  • 43. Beam Vision: mix and match SDKs and runtimes ● The Beam Model: the abstractions at the core of Apache Beam Language B SDK Language A SDK Language C SDK Runner 1 Runner 3Runner 2 ● Choice of API: Users write their pipelines in a language that’s familiar and integrated with their other tooling ● Choice of Runtime: Users choose the right runner for their current needs -- on-prem / cloud, open source / not, fully managed / not ● Scalability for Developers: Clean APIs allow developers to contribute modules independently The Beam Model Language A Language CLanguage B The Beam Model
  • 44. Example Beam Runners Apache Spark ● Open-source cluster-computing framework ● Large ecosystem of APIs and tools ● Runs on premise or in the cloud Apache Flink ● Open-source distributed data processing engine ● High-throughput and low-latency stream processing ● Runs on premise or in the cloud Google Cloud Dataflow ● Fully-managed service for batch and stream data processing ● Provides dynamic auto-scaling, monitoring tools, and tight integration with Google Cloud Platform
  • 45. Demo screenshots because if I make them, I won’t need to use them