Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It
Stephan Ewen, Apache Flink PMC, CTO @ data Artisans
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It
Streaming technology is enabling the obvious: continuous processing on data that is continuously produced.
Hint: you already have streaming data.
Streaming Subsumes Batch
[Figure: a log of hourly timestamped events (2016-3-1 12:00 am onward and 2016-3-11 10:00 pm onward) split into two partitions, annotated with three ways of consuming it: Stream (low latency), Batch (bounded stream), Stream (high latency)]
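To make "a batch is a bounded stream" concrete, here is a minimal sketch in plain Scala (not Flink's API). The `Event` type and the `runningSum` operator are hypothetical. The same incremental operator consumes a finite collection, which ends and so yields a final result, and an infinite iterator, of which we only look at a prefix.

```scala
// Hypothetical event type: an event-time timestamp (millis) and a value
final case class Event(ts: Long, value: Int)

object BoundedVsUnbounded {
  // One incremental operator: emits the running sum after each event.
  // It never needs to know whether its input will ever end.
  def runningSum(events: Iterator[Event]): Iterator[Int] =
    events.scanLeft(0)((acc, e) => acc + e.value).drop(1)

  // Batch = bounded stream, e.g. a fixed slice of the log
  val batch: List[Event] = List(Event(0L, 1), Event(1L, 2), Event(2L, 3))

  // Stream = unbounded input, modeled as an infinite iterator
  def unbounded: Iterator[Event] = Iterator.from(1).map(i => Event(i.toLong, i))
}

// The bounded input ends, so the last emitted value is the final result
val batchResult = BoundedVsUnbounded.runningSum(BoundedVsUnbounded.batch.iterator).toList
// The unbounded input produces results continuously; take only the first four
val streamPrefix = BoundedVsUnbounded.runningSum(BoundedVsUnbounded.unbounded).take(4).toList
```

The operator code is identical in both cases; only the input differs, which is the sense in which streaming subsumes batch.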
Stream Processing Decouples
[Figure: apps a, b, and c sharing one central database (State), contrasted with apps a, b, and c that each hold their own state]
Centrally managed state vs. applications that build their own state
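As a rough illustration of the decoupled side of the picture, the sketch below (plain Scala; the `Order` event type and both apps are hypothetical) lets two applications derive different state from the same shared event log. Neither reads or writes a common database, so each can change its state layout without coordinating with the other.

```scala
// Hypothetical event type shared through the log
final case class Order(customer: String, amount: Int)

object DecoupledApps {
  // App a builds its own state: revenue per customer
  def appA(log: Seq[Order]): Map[String, Int] =
    log.foldLeft(Map.empty[String, Int]) { (state, o) =>
      state.updated(o.customer, state.getOrElse(o.customer, 0) + o.amount)
    }

  // App b builds different state from the same log: the number of orders
  def appB(log: Seq[Order]): Int =
    log.foldLeft(0)((count, _) => count + 1)
}

val log = Vector(Order("alice", 10), Order("bob", 5), Order("alice", 7))
val revenue = DecoupledApps.appA(log)
val orders = DecoupledApps.appB(log)
```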
Time Travel
[Figure: a partitioned log, read in three different ways]
Process a period of historic data
Process the latest data with low latency (tail of the log)
Reprocess the stream (historic data first, then catch up with real-time data)
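All three modes can be viewed as the same read with a different start (and optional end) offset. A minimal sketch in plain Scala, assuming a single partition modeled as a `Vector` whose index is the offset; the `read` helper is hypothetical, not a Kafka or Flink API.

```scala
object TimeTravel {
  // One log partition; a record's offset is its index in the vector
  type Partition = Vector[String]

  // Every mode is the same read: start at some offset, optionally stop at an end offset
  def read(log: Partition, fromOffset: Int, untilOffset: Option[Int] = None): Vector[String] =
    log.slice(fromOffset, untilOffset.getOrElse(log.size))
}

// The log as it looks when a job starts
val atStart: TimeTravel.Partition = Vector("a", "b", "c")
// The same log a little later, after one more record was appended
val later: TimeTravel.Partition = atStart :+ "d"

// Process a period of historic data: offsets [1, 3)
val historic = TimeTravel.read(later, 1, Some(3))
// Tail of the log: start at the end offset seen at startup, so only new records appear
val tail = TimeTravel.read(later, atStart.size)
// Reprocess: start at offset 0, read history first, then catch up with new records
val reprocessed = TimeTravel.read(later, 0)
```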
Stream processing is taking off (just look at this year's talks).
But why has it only started so recently?
Latency, volume/throughput, and state & accuracy: the combination is what makes streaming powerful.
Only recently have all three been available together.
Latency: down to the milliseconds
Volume/Throughput: tens of millions of events per second for stateful applications
State & Accuracy: exactly-once semantics and event-time processing
Apache Flink was the first open-source system to eliminate these trade-offs.
Flink's Approach
Stream SQL: high-level language
Table API: declarative DSL
Fluent API, windows, event time: core API
Stateful stream processing: building block
Stateful Stream Processing
Source → Filter / Transform (state read/write) → Sink
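A minimal single-threaded sketch of the source → filter/transform → sink shape with state read/write, in plain Scala rather than Flink's DataStream API. The `Reading` type is hypothetical; the operator keeps the maximum temperature per sensor as its state and emits a record whenever that maximum increases.

```scala
import scala.collection.mutable

// Hypothetical reading type
final case class Reading(sensor: String, tempF: Double)

object StatefulPipeline {
  def run(source: Iterator[Reading]): List[String] = {
    val state = mutable.Map.empty[String, Double] // operator state: max temperature per sensor
    val sink = mutable.ListBuffer.empty[String]   // sink: collects emitted records

    source
      .filter(r => !r.tempF.isNaN)                // filter: drop invalid readings
      .foreach { r =>
        val previousMax = state.get(r.sensor)     // state read
        if (previousMax.forall(r.tempF > _)) {
          state(r.sensor) = r.tempF               // state write
          sink += s"${r.sensor} new max ${r.tempF}" // transform and emit to the sink
        }
      }
    sink.toList
  }
}

val out = StatefulPipeline.run(Iterator(
  Reading("room1", 70.0), Reading("room1", 68.0), Reading("room2", Double.NaN),
  Reading("room1", 72.5), Reading("room2", 65.0)))
```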
Stateful Stream Processing
Scalable embedded state: accessed at memory speed and scales with the parallel operators
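One way to picture "embedded state that scales with parallel operators": each parallel instance owns the state for the keys routed to it, so every access is a local in-memory lookup. A simplified sketch in plain Scala; hash-modulo routing is an assumption for illustration, not Flink's exact partitioning scheme.

```scala
import scala.collection.mutable

object PartitionedState {
  // One parallel operator instance with its own local, in-memory keyed state
  final class CounterInstance {
    val counts = mutable.Map.empty[String, Long]
    def process(key: String): Long = {
      val updated = counts.getOrElse(key, 0L) + 1L
      counts(key) = updated
      updated
    }
  }

  // Route each key to exactly one instance, so its state never needs a remote lookup
  def route(key: String, parallelism: Int): Int =
    Math.floorMod(key.hashCode, parallelism)

  def run(keys: Seq[String], parallelism: Int): Vector[CounterInstance] = {
    val instances = Vector.fill(parallelism)(new CounterInstance)
    keys.foreach(k => instances(route(k, parallelism)).process(k))
    instances
  }
}

val instances = PartitionedState.run(Seq("a", "b", "a", "c", "a", "b"), parallelism = 3)
```

Adding instances spreads keys, and therefore state, over more machines, while each lookup stays local.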
Stateful Stream Processing
Rolling back computation and re-processing: re-load the state and reset the positions in the input streams
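The rollback mechanism can be sketched as storing the input position together with the state in each checkpoint. On failure, both are restored and the records after the checkpoint are re-processed, so the final state matches a failure-free run. This is a simplified single-operator model in plain Scala, not Flink's distributed checkpointing algorithm; `Checkpoint` and `failAt` are hypothetical.

```scala
object CheckpointRecovery {
  // A consistent snapshot: the input position together with the state at that position
  final case class Checkpoint(offset: Int, sum: Long)

  // Sums the input, taking a checkpoint every `interval` records.
  // If failAt is set, the job "crashes" once before processing that offset.
  def run(input: Vector[Int], interval: Int, failAt: Option[Int]): Long = {
    var checkpoint = Checkpoint(0, 0L)
    var offset = 0
    var sum = 0L
    var failurePending = failAt
    while (offset < input.length) {
      if (failurePending.contains(offset)) {
        failurePending = None
        // Recovery: re-load the state and reset the position in the input stream
        sum = checkpoint.sum
        offset = checkpoint.offset
      } else {
        sum += input(offset)
        offset += 1
        if (offset % interval == 0) checkpoint = Checkpoint(offset, sum)
      }
    }
    sum
  }
}

val input = Vector.tabulate(10)(i => i + 1) // 1 to 10
```

With a failure at offset 7, the record at offset 6 is processed twice physically, but it is counted only once in the recovered state.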
Stateful Stream Processing
Restore to different programs: bug fixes, upgrades, A/B testing, etc.
Versioning the state of applications
[Figure: savepoints taken along a time axis, linking application versions App. A, App. B, and App. C]
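A toy model of restoring one savepoint into different programs, in plain Scala. It assumes a savepoint is just the input position plus keyed state, and that a "program" is a state-update function; `v1` (the original program) and `v2` (an upgraded one that normalizes case) are hypothetical. Both resume from the same savepoint, which is what makes bug fixes, upgrades, and A/B tests possible without losing state.

```scala
object Savepoints {
  // Hypothetical savepoint: keyed state plus the input position it belongs to
  final case class Savepoint(offset: Int, counts: Map[String, Int])

  // A "program" updates keyed state for one input record
  type Program = (Map[String, Int], String) => Map[String, Int]

  // Version 1 counts every word as-is
  val v1: Program = (s, w) => s.updated(w, s.getOrElse(w, 0) + 1)
  // Version 2 (a bug fix or upgrade) normalizes case before counting
  val v2: Program = (s, w) => { val k = w.toLowerCase; s.updated(k, s.getOrElse(k, 0) + 1) }

  // Runs a program from a savepoint over the remaining input and takes a new savepoint
  def resume(sp: Savepoint, input: Vector[String], program: Program): Savepoint =
    Savepoint(input.length, input.drop(sp.offset).foldLeft(sp.counts)(program))
}

val words = Vector("a", "b", "a", "A", "B")
// App. A runs version 1 over the first three records, then a savepoint is taken
val sp1 = Savepoints.resume(Savepoints.Savepoint(0, Map.empty), words.take(3), Savepoints.v1)
// Restore the same savepoint into two different programs, e.g. for an A/B test
val withV1 = Savepoints.resume(sp1, words, Savepoints.v1)
val withV2 = Savepoints.resume(sp1, words, Savepoints.v2)
```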
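Event Time / Out-of-Order
[Figure: the Star Wars films as a stream. Processing time is the release order (1977, 1980, 1983, 1999, 2002, 2005, 2015); event time is the episode number, so the films arrive as Episodes IV, V, VI, I, II, III, VII, out of order with respect to event time]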
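The Star Wars example translates directly into code: films arrive in release order (processing time) but carry their episode number (event time). The sketch below, in plain Scala, buffers events and emits them in event-time order once a watermark passes them. The watermark rule (maximum event time seen minus a fixed `maxDelay`) is an assumed heuristic chosen for illustration, and the `Film` type is hypothetical; events that fall behind the watermark are reported as late.

```scala
import scala.collection.mutable

// Hypothetical event: the episode number is the event time; the release year is when it arrived
final case class Film(episode: Int, releaseYear: Int)

object EventTimeReorder {
  // Returns (events emitted in event-time order, late events)
  def reorder(arrivals: Seq[Film], maxDelay: Int): (List[Int], List[Int]) = {
    val buffer = mutable.PriorityQueue.empty[Int](Ordering[Int].reverse) // min-heap by event time
    val emitted = mutable.ListBuffer.empty[Int]
    val late = mutable.ListBuffer.empty[Int]
    var maxSeen = Int.MinValue
    var watermark = Int.MinValue

    // Emit every buffered event whose event time is at or below the watermark
    def emitUpTo(wm: Int): Unit =
      while (buffer.nonEmpty && buffer.head <= wm) emitted += buffer.dequeue()

    arrivals.foreach { f =>
      if (f.episode <= watermark) late += f.episode // already behind the watermark
      else {
        buffer.enqueue(f.episode)
        maxSeen = math.max(maxSeen, f.episode)
        watermark = maxSeen - maxDelay
        emitUpTo(watermark)
      }
    }
    emitUpTo(Int.MaxValue) // the end of a bounded input acts as a final watermark
    (emitted.toList, late.toList)
  }
}

// Arrival (processing-time) order: by release year
val films = Seq(Film(4, 1977), Film(5, 1980), Film(6, 1983), Film(1, 1999),
                Film(2, 2002), Film(3, 2005), Film(7, 2015))
```

With `maxDelay = 6` every episode is emitted in story order; with `maxDelay = 2` the watermark has already moved past Episodes I to III by the time they arrive, so they are classified as late. The delay trades latency against completeness.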
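(Stream) SQL & Table API
Table API
```scala
// Sketch for the legacy Flink Scala Table API (roughly Flink 1.4 to 1.10); imports differ by version.
// Assumes sensorData: DataStream[(String, Long, Double)] with event-time timestamps and
// watermarks already assigned, and tableEnv: StreamTableEnvironment.
import org.apache.flink.table.api._
import org.apache.flink.table.api.scala._

// convert the stream into a Table; 'rowtime.rowtime turns the time field into the event-time attribute
val sensorTable: Table = sensorData
  .toTable(tableEnv, 'location, 'rowtime.rowtime, 'tempF)

// define a query on the Table: daily average temperature per room, converted from °F to °C
val avgTempCTable: Table = sensorTable
  .window(Tumble over 1.days on 'rowtime as 'w)
  .groupBy('w, 'location)
  .select('w.start as 'day, 'location,
          (('tempF.avg - 32) * 0.556) as 'avgTempC)
  .where('location.like("room%"))
```
SQL
```scala
// register the Table under a name so that the SQL query can refer to it
tableEnv.registerTable("sensorData", sensorTable)

val avgTempCSqlTable: Table = tableEnv.sqlQuery("""
  SELECT TUMBLE_START(rowtime, INTERVAL '1' DAY) AS `day`, location,
         AVG((tempF - 32) * 0.556) AS avgTempC
  FROM sensorData
  WHERE location LIKE 'room%'
  GROUP BY TUMBLE(rowtime, INTERVAL '1' DAY), location
""")
```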
What can you do with that?
10 billion events (2 TB) processed daily across multiple Flink jobs for a telco's network control center.
Ad-hoc real-time queries with more than 30 operators, processing 30 billion events daily and maintaining 100s of GB of state inside Flink with exactly-once guarantees.
Jobs with more than 20 operators running on more than 5,000 vCores in a 1,000-node cluster, processing millions of events per second.
Flink's Streams Playing at Batch
Classic batch jobs: TeraSort, relational join
Graph processing
Linear algebra
Streaming technology is already awesome, but what are the next steps?
In other words, what can we expect from the "next gen"?
Many things in the conference program are "next gen", so here is my take on it…
"Next Gen"
25
Queryable State
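Queryable state means external clients read the stream processor's state directly instead of the job writing results to an external database. A minimal sketch in plain Scala (not Flink's queryable state client): the operator updates a concurrent map as events arrive, and a `query` method serves point lookups on the current value. `ClickCounter` is hypothetical.

```scala
import scala.collection.concurrent.TrieMap

object QueryableStateSketch {
  // An operator whose keyed state can be looked up from outside the job
  final class ClickCounter {
    // One writer (the operator) and any number of concurrent readers (external queries)
    private val state = TrieMap.empty[String, Long]

    // Called for each event in the stream
    def process(user: String): Unit =
      state.update(user, state.getOrElse(user, 0L) + 1L)

    // Called by an external client: reads the current value directly, no external database involved
    def query(user: String): Option[Long] = state.get(user)
  }
}

val counter = new QueryableStateSketch.ClickCounter
Seq("u1", "u2", "u1").foreach(counter.process)
```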
"Next Gen"
26
Elastic Parallelism
Maintaining exactly-once
state consistency
No extra effort for the user
No need to carefully plan
partitions
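One way to rescale keyed state without rehashing every key is to hash keys into a fixed number of key groups and assign contiguous ranges of groups to operator instances; Flink's rescaling is built on key groups bounded by a maximum parallelism. The sketch below is a simplified plain-Scala model of that idea (plain `hashCode` instead of Flink's hash function, `MaxParallelism = 128` chosen arbitrarily), not Flink's implementation.

```scala
object KeyGroups {
  // Fixed upper bound on parallelism; keys are hashed into this many key groups once
  val MaxParallelism = 128

  // A key's group never changes, whatever the current parallelism is
  def keyGroup(key: String): Int = Math.floorMod(key.hashCode, MaxParallelism)

  // Contiguous ranges of key groups are assigned to operator instances
  def operatorFor(group: Int, parallelism: Int): Int = group * parallelism / MaxParallelism

  // Which key groups each operator instance owns for a given parallelism
  def assignment(keys: Seq[String], parallelism: Int): Map[Int, Set[Int]] =
    keys.map(keyGroup).distinct
      .groupBy(g => operatorFor(g, parallelism))
      .map { case (op, groups) => op -> groups.toSet }
}

val keys = (1 to 1000).map(i => s"user-$i")
val before = KeyGroups.assignment(keys, parallelism = 4)
val after = KeyGroups.assignment(keys, parallelism = 8)
```

After scaling from 4 to 8 instances, each new instance's key groups are a subset of a single old instance's groups, so state can be moved as whole key groups rather than key by key.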
"Next Gen"
27
Terabytes of state inside the
stream processor
Maintaining fast checkpoints and recovery
E.g., long histories of windows, large join tables
State at local memory speed
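A sketch of why large state does not have to mean slow checkpoints: if the state tracks which keys changed since the last checkpoint, each checkpoint only needs to write those entries, and recovery applies the chain of increments in order. This is a simplified plain-Scala model (deletions are not handled), not how Flink's incremental RocksDB checkpoints are actually stored.

```scala
import scala.collection.mutable

object IncrementalCheckpoints {
  final class KeyedState {
    private val data = mutable.Map.empty[String, Long]
    private val dirty = mutable.Set.empty[String] // keys changed since the last checkpoint

    def add(key: String, delta: Long): Unit = {
      data(key) = data.getOrElse(key, 0L) + delta
      dirty += key
    }

    // Incremental checkpoint: write only the entries changed since the previous one
    def checkpoint(): Map[String, Long] = {
      val delta = dirty.iterator.map(k => k -> data(k)).toMap
      dirty.clear()
      delta
    }

    def snapshotAll: Map[String, Long] = data.toMap
  }

  // Recovery: apply the chain of incremental checkpoints in order (later entries win)
  def restore(deltas: Seq[Map[String, Long]]): Map[String, Long] =
    deltas.foldLeft(Map.empty[String, Long])(_ ++ _)
}

val st = new IncrementalCheckpoints.KeyedState
(1 to 1000).foreach(i => st.add(s"k$i", 1L)) // large initial state
val cp1 = st.checkpoint()
st.add("k7", 5L)                             // small change afterwards
val cp2 = st.checkpoint()                    // contains a single entry
```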
"Next Gen"
Full SQL on Streams
Continuous queries, incremental results
Windows, event time, processing time
Consistent with SQL on bounded data
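A continuous query produces incremental results: instead of one answer at the end, it emits an updated result row for every input. The sketch below (plain Scala; the `Update` row type and upsert-style materialization are assumptions for illustration) runs `SELECT location, COUNT(*) ... GROUP BY location` continuously and checks that, on bounded input, the materialized updates equal the batch result, which is the consistency the slide asks for.

```scala
object ContinuousQuery {
  // An incremental result: the new value of one result row
  final case class Update(location: String, count: Long)

  // Continuous version of: SELECT location, COUNT(*) FROM events GROUP BY location
  // Emits an updated row for every input record instead of one result at the end.
  def continuous(events: Seq[String]): Seq[Update] = {
    val counts = scala.collection.mutable.Map.empty[String, Long]
    events.map { loc =>
      val c = counts.getOrElse(loc, 0L) + 1L
      counts(loc) = c
      Update(loc, c)
    }
  }

  // Batch version of the same query over bounded data
  def batch(events: Seq[String]): Map[String, Long] =
    events.groupBy(identity).map { case (loc, xs) => loc -> xs.size.toLong }

  // Materialize the update stream: the latest update per key wins (upsert semantics)
  def materialize(updates: Seq[Update]): Map[String, Long] =
    updates.foldLeft(Map.empty[String, Long])((m, u) => m.updated(u.location, u.count))
}

val events = Seq("room1", "room2", "room1", "room1")
val updates = ContinuousQuery.continuous(events)
```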
Thank you!
We are hiring!
data-artisans.com/careers

