© 2019 Ververica
Stephan Ewen
CTO @ Ververica, Apache Flink PMC
A Unified Analytics Platform with Apache Flink and Apache Kafka
a.k.a. Unified Streaming & Batch Processing
Apache Kafka and Apache Flink
Log Storage, File Storage, DB/Tables
JDBC, Debezium
Analytics and Applications on Streaming Data & Data-at-Rest
Flink Runtime
Stateful Stream- and Batch Processing
DataStream API StateFun
SQL & Table API
more declarative
more explicit control
How big can you go? – Alibaba Singles Day
Real-time Data Applications: Search, Rec., Security, BI, Ads (incl. sub-second updates to the GMV dashboard)
Infrastructure: >5K nodes, >500K CPU cores
Data Size: 985PB
Throughput (Peak): 2.5B events/sec
Latency: Sub-sec
State Size (Biggest): 100TB
How small can you go? – U-Hopper FogGuru
Cluster of 5 Raspberry Pi 3b+
Data volume: 800 events/sec
Docker Swarm + Flink + Mosquitto
“The Fridge”
Streaming & Batch SQL
= SQL!
Our Sample Setup
Real-time Queries
real-time events
historical data
ingest
Historical Queries
Combined Queries
SQL – Static Data (Batch Case)
user  cTime     url
Mary  12:00:00  https://…
Bob   12:00:00  https://…
Mary  12:00:02  https://…
Liz   12:00:03  https://…

SELECT
  user,
  COUNT(url) as cnt
FROM clicks
GROUP BY user

user  cnt
Mary  2
Bob   1
Liz   1
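The batch case can be reproduced with any SQL engine. The sketch below is only an illustration, not Flink: it loads the four rows from the slide into an in-memory SQLite table via Python's standard library and runs the same GROUP BY query once over the complete input. The helper name `batch_counts` is hypothetical, the URLs are the elided placeholders from the slide, and `user` is quoted defensively because some SQL dialects reserve that word.

```python
import sqlite3

# Rows of the clicks table as shown on the slide (URLs are elided there too).
CLICKS = [
    ("Mary", "12:00:00", "https://…"),
    ("Bob", "12:00:00", "https://…"),
    ("Mary", "12:00:02", "https://…"),
    ("Liz", "12:00:03", "https://…"),
]


def batch_counts(rows):
    """Run the GROUP BY query once over a complete, static table."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE clicks ("user" TEXT, cTime TEXT, url TEXT)')
    conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)", rows)
    result = conn.execute(
        'SELECT "user", COUNT(url) AS cnt FROM clicks GROUP BY "user"'
    ).fetchall()
    conn.close()
    # Row order of a GROUP BY without ORDER BY is not guaranteed,
    # so return a mapping instead of a list.
    return dict(result)


print(batch_counts(CLICKS))
```

All input is available before the query runs, so each user appears exactly once in the result.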
SQL – Static Data (Streaming Case)
user  cTime     url
Mary  12:00:00  https://…
Bob   12:00:00  https://…
Mary  12:00:02  https://…
Liz   12:00:03  https://…

SELECT
  user,
  COUNT(url) as cnt
FROM clicks
GROUP BY user

user  cnt
Bob   1
Liz   1
Mary  1
Mary  2
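In the streaming case the same query produces a result that is updated as rows arrive, which is why Mary appears first with a count of 1 and later with 2. The following sketch is a plain-Python illustration of that idea under simplifying assumptions (one in-memory counter, no state backend, no retractions); it is not how Flink implements continuous queries, and `streaming_counts` is a hypothetical name.

```python
from collections import defaultdict

# The same click rows as on the slide, in arrival order.
CLICKS = [
    ("Mary", "12:00:00", "https://…"),
    ("Bob", "12:00:00", "https://…"),
    ("Mary", "12:00:02", "https://…"),
    ("Liz", "12:00:03", "https://…"),
]


def streaming_counts(events):
    """Yield an updated (user, cnt) row every time a click arrives."""
    counts = defaultdict(int)
    for user, _ctime, _url in events:
        counts[user] += 1
        yield user, counts[user]


updates = list(streaming_counts(CLICKS))
print(updates)

# Applying the updates in order (later rows overwrite earlier ones per user)
# gives the same table as the one-shot batch query.
final_table = dict(updates)
print(final_table)
```

Once the last row has been consumed, the updating result matches the batch result, which is the sense in which streaming and batch SQL are the same SQL.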
Demo Time
Stream to Batches and Back Again
[Diagram: timeline from "ye olde stuff" to "new events"; time labels 2020-4-1 12:00 am, 2016-4-1 1:00 am, 2016-4-1 2:00 am, 2016-4-7 11:00pm, 2016-4-7 10:00pm, …; Ingest]
[Diagram: timeline from "ye olde stuff" to "new events"; time labels 2020-4-1 12:00 am, 2016-4-1 1:00 am, 2016-4-1 2:00 am, 2016-4-7 11:00pm, 2016-4-7 10:00pm, …; Real-time Queries; Historical Queries; Ingest]
Why combine data in Kafka and in Object Stores?
• Why not just longer Kafka retention?
• Sample numbers from a petabyte-scale Flink + Kafka user
– Compressed columnar data avg. 5x smaller
– Object store 3-4x cheaper than persistent volumes (EBS)
– Object store already replicated (cross-AZ), replaces broker replication (3x)
→ ~50x cheaper data storage
• Faster access to compressed columnar data through
– Higher read parallelism
– Read optimizations: predicate pushdown, projection pushdown, partition pruning, etc.
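The ~50x figure is consistent with multiplying the three sample factors. The sketch below spells out that arithmetic, assuming the factors are independent and simply multiply; the slide does not state how the figure was derived, and `storage_cost_factor` is a hypothetical helper.

```python
def storage_cost_factor(compression, price_ratio, replication):
    """Combined saving if the three factors are independent and multiply."""
    return compression * price_ratio * replication


# 5x smaller data, 3x to 4x cheaper storage, 3x broker replication avoided.
low = storage_cost_factor(5, 3, 3)
high = storage_cost_factor(5, 4, 3)
print(low, high)
```

Under that assumption the saving falls between 45x and 60x, which brackets the quoted ~50x.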
[Diagram: timeline from "ye olde stuff" to "new events"; time labels 2020-4-1 12:00 am, 2016-4-1 1:00 am, 2016-4-1 2:00 am, 2016-4-7 11:00pm, 2016-4-7 10:00pm, …; Ingest]
While we are at it… normalize our data on the way and convert currencies.
Temporal Join
Symbol  Exchange Rate  Timestamp
EUR     1.0            12:00
USD     0.88           12:01
USD     0.90           12:03
GBP     1.17           12:04
USD     0.89           12:12
GBP     1.20           12:13
USD     0.87           12:17

Order         Timestamp  Currency
A @ 17.00 $   12:05      USD
B @ 12.32 £   12:11      GBP
C @ 111.51 $  12:01      USD
D @ 2.39 $    12:02      USD
E @ 17.11 $   12:20      USD
F @ 243.50 £  12:15      GBP
G @ 3.49 $    12:10      USD
H @ 0.99 £    12:16      GBP
Temporal Join
Order         Timestamp  Currency
A @ 17.00 $   12:05      USD
B @ 12.32 £   12:11      GBP
C @ 111.51 $  12:01      USD
D @ 2.39 $    12:02      USD
E @ 17.11 $   12:20      USD
F @ 243.50 £  12:15      GBP
G @ 3.49 $    12:07      USD
H @ 0.99 £    12:16      GBP

@ 12:07h
Symbol  Exchange Rate  Timestamp
EUR     1.0            12:00
GBP     1.17           12:04
USD     0.90           12:03

@ 12:15h
Symbol  Exchange Rate  Timestamp
EUR     1.0            12:00
GBP     1.20           12:13
USD     0.89           12:12
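A temporal join pairs each order with the exchange rate that was valid at the order's own timestamp, not with the latest rate. The sketch below illustrates that lookup in plain Python over the rates and orders from the slides; it is not Flink's implementation or API. Assumptions: timestamps are same-day HH:MM values, a rate is valid from its timestamp (inclusive) until the next update for the same symbol, and order G uses the 12:07 timestamp shown on this slide. All function names are hypothetical.

```python
from bisect import bisect_right

RATES = [
    ("EUR", 1.0, "12:00"), ("USD", 0.88, "12:01"), ("USD", 0.90, "12:03"),
    ("GBP", 1.17, "12:04"), ("USD", 0.89, "12:12"), ("GBP", 1.20, "12:13"),
    ("USD", 0.87, "12:17"),
]

ORDERS = [
    ("A", 17.00, "12:05", "USD"), ("B", 12.32, "12:11", "GBP"),
    ("C", 111.51, "12:01", "USD"), ("D", 2.39, "12:02", "USD"),
    ("E", 17.11, "12:20", "USD"), ("F", 243.50, "12:15", "GBP"),
    ("G", 3.49, "12:07", "USD"), ("H", 0.99, "12:16", "GBP"),
]


def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def build_history(rates):
    """Per symbol, keep parallel lists of update times and rates, time-sorted."""
    history = {}
    for symbol, rate, ts in sorted(rates, key=lambda r: to_minutes(r[2])):
        times, values = history.setdefault(symbol, ([], []))
        times.append(to_minutes(ts))
        values.append(rate)
    return history


def rate_as_of(history, symbol, ts):
    """Latest rate for symbol with update time <= ts, or None if none exists."""
    times, values = history[symbol]
    idx = bisect_right(times, to_minutes(ts)) - 1
    return values[idx] if idx >= 0 else None


def temporal_join(orders, rates):
    """Map each order id to the rate valid at the order's timestamp."""
    history = build_history(rates)
    return {oid: rate_as_of(history, cur, ts) for oid, _amt, ts, cur in orders}


joined = temporal_join(ORDERS, RATES)
print(joined)
```

Order G at 12:07 picks up USD 0.90 and order F at 12:15 picks up GBP 1.20, matching the two table versions shown for 12:07h and 12:15h.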
Demo Time
Connecting the Streams
[Diagram: timeline from "ye olde stuff" to "new events"; time labels 2020-4-1 12:00 am, 2016-4-1 1:00 am, 2016-4-1 2:00 am, 2016-4-7 11:00pm, 2016-4-7 10:00pm, …; Bootstrap Streaming Queries; Ingest]
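Bootstrapping a streaming query means replaying the archived history first and then continuing with live events. A minimal sketch of that hand-over follows, assuming events are `(event_time, payload)` tuples, both inputs are ordered by event time, and a switch timestamp separates what is read from files from what is read from the log. `bootstrap_stream` and `switch_ts` are hypothetical names, not part of any Flink or Kafka API.

```python
from itertools import chain


def bootstrap_stream(historical, live, switch_ts):
    """Replay archived events before switch_ts, then live events from it on."""
    past = (event for event in historical if event[0] < switch_ts)
    tail = (event for event in live if event[0] >= switch_ts)
    return chain(past, tail)


# The archive and the log overlap at event time 3; the switch point
# ensures the overlapping event is emitted only once.
archived = [(1, "a"), (2, "b"), (3, "c")]
log_tail = [(3, "c"), (4, "d")]
stitched = list(bootstrap_stream(archived, log_tail, switch_ts=3))
print(stitched)
```

Filtering both sides on the same event-time boundary is what avoids duplicates or gaps where the two sources overlap.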
Demo Time
Event Time in Kafka Topic
Event time
Data (orders / currency rates)
Temporal Join
Time bucketing on file system
Time partition pruning for queries
Stitching together
Stream again from files and stream tail
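Time bucketing and partition pruning fit together: if files are laid out by event time, a query over a time range only needs to open the buckets that overlap it. The sketch below assumes hourly buckets and a `dt=.../hour=...` directory layout under a root named `orders`; the layout, the root name, and both function names are illustrative assumptions, not something the slides specify.

```python
from datetime import datetime, timedelta


def bucket_path(event_time, root="orders"):
    """Directory of the hourly bucket that an event belongs to."""
    return f"{root}/dt={event_time:%Y-%m-%d}/hour={event_time:%H}"


def buckets_for_range(start, end, root="orders"):
    """Bucket paths overlapping [start, end); all others can be skipped."""
    current = start.replace(minute=0, second=0, microsecond=0)
    paths = []
    while current < end:
        paths.append(bucket_path(current, root))
        current += timedelta(hours=1)
    return paths


print(bucket_path(datetime(2016, 4, 1, 1, 30)))
print(buckets_for_range(datetime(2016, 4, 7, 22, 15),
                        datetime(2016, 4, 7, 23, 45)))
```

A query over 22:15 to 23:45 on 2016-04-07 touches two hourly buckets here, regardless of how many years of data sit in the other directories.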
EOF
Apache Kafka and Apache Flink
Everything is a Stream
Bounded & Unbounded
Batch Processing complements Stream Processing
(Compressed Columnar) File Storage complements Log Storage
make the most of your (event) time
SQL generalizes well across batch & streaming
a unified engine makes things easier
