Aljoscha Krettek - Co-Founder & Software Engineer at data Artisans
THE EVOLUTION OF
(OPEN SOURCE) DATA PROCESSING
© 2018 data Artisans2
ABOUT DATA ARTISANS
Original Creators of
Apache Flink®
Real-Time Stream Processing, Enterprise Ready
© 2018 data Artisans3
POWERED BY APACHE FLINK
© 2018 data Artisans4
Disclaimer:
I might forget systems, or misrepresent their use or
when they were created. This is not intentional. Please
come discuss with me afterwards!
© 2018 data Artisans5
How do we process data and what are the systems
available to us?
© 2018 data Artisans6
PRE-HISTORIC
© 2018 data Artisans7
Purpose-built
programs
Since the beginning of computers.
© 2018 data Artisans8
Programming is kinda hard.
Data analysis is only available to a
small circle of
programmers/engineers.
© 2018 data Artisans9
(Big) Data Bases
Since the 1970s
© 2018 data Artisans10
SQL is approachable to a wider
range of people.
Data analysis is no longer
restricted to “programmers”.
There are even tools that create
SQL: BI tools and whatnot.
© 2018 data Artisans11
Application services
talking to databases,
event-driven applications
For quite a while now… 😉
© 2018 data Artisans12
THE ADVENT OF BIG DATA
© 2018 data Artisans13
MapReduce
2004
© 2018 data Artisans14
Apache Hadoop®
2006
© 2018 data Artisans15
Store first, ask
questions later*
* we’ll get back to this later
© 2018 data Artisans16
Programming is kinda hard.
Data analysis is only available to a
small circle of
programmers/engineers.
© 2018 data Artisans17
Apache Hive™ – 2009
Apache Pig™ – 2008
*it’s tricky with release dates and when they incubated and whatnot
© 2018 data Artisans18
SQL is approachable to a wider
range of people.
Data analysis is no longer
restricted to “programmers”.
There are even tools that create
SQL: BI tools and whatnot.
© 2018 data Artisans19
Apache Spark™
2012? – non-Apache release
2014 – first Apache release
© 2018 data Artisans20
THE RISE OF STREAMING
© 2018 data Artisans21
Apache Storm™
2011 – first non-Apache release
2014 – Storm 0.9.1, first Apache release
© 2018 data Artisans22
Apache Kafka®
2011 – non-Apache release
2013 – first Apache release
© 2018 data Artisans23
Lambda Architecture
At some point in between.
Was a bit of a dead end.
© 2018 data Artisans24
Apache Flink®
2010 – under the name Stratosphere
2014 – Flink 0.6, first Apache release
2015 – Flink 0.9, first release with exactly-once stream processing
© 2018 data Artisans25
Reliable Stream Processing
No more need for the lambda architecture.
© 2018 data Artisans26
Ask questions
first, then wait for
things to happen*
* i.e., we put a program in place and get real-time
updates when things happen
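The "ask first, then wait" model can be sketched in plain Python. This is not Flink code: the `StandingQuery` class, its methods, and the taxi keys are hypothetical names chosen for illustration. The point is only that the question (here, a per-key count) is installed before any data arrives, and every event pushes an updated answer.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class StandingQuery:
    """Hypothetical continuous query: count events per key, push every update."""

    def __init__(self, on_update: Callable[[str, int], None]) -> None:
        self._counts: Dict[str, int] = defaultdict(int)
        self._on_update = on_update

    def on_event(self, key: str) -> None:
        # The question is already in place; each arriving event updates the answer.
        self._counts[key] += 1
        self._on_update(key, self._counts[key])


def run_demo() -> List[Tuple[str, int]]:
    """Feed three events into the query and collect the pushed updates."""
    updates: List[Tuple[str, int]] = []
    query = StandingQuery(lambda key, count: updates.append((key, count)))
    for key in ["taxi-1", "taxi-2", "taxi-1"]:
        query.on_event(key)
    return updates


if __name__ == "__main__":
    print(run_demo())
```

Contrast this with "store first, ask questions later", where the events would be written to storage and the count computed only when someone runs a query.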
© 2018 data Artisans27
And of course…
Programming this was hard.
Then we had “SQL” on streams.
© 2018 data Artisans28
APACHE FLINK
© 2018 data Artisans29
The processing landscape
[Figure: a spectrum from offline to real-time: batch; streaming analytics & continuous processing; event-driven applications]
© 2018 data Artisans30
What’s in a processing system/framework?
1. Engine
2. APIs
3. Connectors
© 2018 data Artisans31
1. Flink Engine
Deployment
• YARN
• Mesos
• Kubernetes
• Resource elasticity
Stateful stream processing
• Network shuffle
• State & timers
• Fault tolerance
• Exactly once
• Savepoints
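One way to picture how fault tolerance and exactly-once state fit together is a snapshot that captures the operator state and the source read position as a single unit. The sketch below is plain Python under that assumption, not Flink's actual mechanism: `CountingJob`, `run_with_failure`, and the in-memory log are hypothetical.

```python
import copy
from typing import Dict, List, Tuple

Checkpoint = Tuple[int, Dict[str, int]]  # (source offset, copy of operator state)


class CountingJob:
    """Hypothetical job: reads a replayable log and counts events per key."""

    def __init__(self, log: List[str]) -> None:
        self.log = log
        self.offset = 0
        self.state: Dict[str, int] = {}

    def process(self, n: int) -> None:
        # Consume up to n records, advancing state and read position together.
        for key in self.log[self.offset:self.offset + n]:
            self.state[key] = self.state.get(key, 0) + 1
        self.offset = min(self.offset + n, len(self.log))

    def checkpoint(self) -> Checkpoint:
        # Snapshot offset and state as one consistent unit.
        return self.offset, copy.deepcopy(self.state)

    def restore(self, cp: Checkpoint) -> None:
        # Roll back both the state and the source position.
        self.offset, self.state = cp[0], copy.deepcopy(cp[1])


def run_with_failure(log: List[str]) -> Dict[str, int]:
    """Process the log with one simulated failure between checkpoint and end."""
    job = CountingJob(log)
    job.process(2)
    cp = job.checkpoint()
    job.process(2)         # work done after the checkpoint...
    job.restore(cp)        # ...is discarded by the simulated failure
    job.process(len(log))  # replay from the checkpointed offset
    return job.state
```

Because the state and the offset are rolled back together, replayed records are counted once in the final state, even though they were read twice.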
© 2018 data Artisans32
2. Flink APIs
DataSet API
DataStream API
Table API/SQL
and more …
© 2018 data Artisans33
2. Flink APIs – DataStream API
• Stateful stream processing
• Windowing
• State & timers
• Complete control over what is
going on
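The combination of keyed state, windowing, and timers can be sketched without any framework. The following plain-Python function is an illustration only, not the DataStream API: the function name, the event tuples, and the assumption that events arrive in timestamp order are all made up for this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, int]        # (key, event time in ms)
Result = Tuple[str, int, int]  # (key, window start in ms, count)


def tumbling_window_counts(events: List[Event], size_ms: int) -> List[Result]:
    """Count events per key and tumbling window, emitting when a window closes.

    Assumes events arrive in timestamp order, so the latest timestamp can
    stand in for the watermark that fires the window timers.
    """
    state: Dict[Tuple[str, int], int] = defaultdict(int)  # keyed window state
    results: List[Result] = []

    def fire_timers(watermark: int) -> None:
        # A window [start, start + size) is complete once time reaches its end.
        due = sorted(k for k in state if k[1] + size_ms <= watermark)
        for key, start in due:
            results.append((key, start, state.pop((key, start))))

    for key, ts in events:
        fire_timers(ts)
        window_start = ts - ts % size_ms
        state[(key, window_start)] += 1

    # End of input: close every window that is still open.
    fire_timers(max((start + size_ms for _, start in state), default=0))
    return results
```

State is held per key and window, and results are emitted only when the timer for that window fires, which is the kind of control the slide attributes to this API level.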
© 2018 data Artisans34
2. Flink APIs – Table API/SQL
• Declarative/relational API
• “No programming required” SQL (ANSI SQL)
• Same SQL for batch and streaming
• Pluggable connectors / data formats
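To make "same SQL for batch and streaming" concrete, here is a small sketch that uses Python's built-in `sqlite3` module rather than Flink. The `rides` table, its two columns, and both helper functions are assumptions for illustration. The batch version loads all rows and runs the query once; the streaming version re-runs the identical query after each arriving row. A real streaming engine would maintain the result incrementally instead of re-running the query; the naive re-evaluation here only shows that the query text does not change.

```python
import sqlite3
from typing import List, Tuple

QUERY = "SELECT taxiId, COUNT(*) FROM rides GROUP BY taxiId ORDER BY taxiId"
Ride = Tuple[int, int]  # (rideId, taxiId)
Row = Tuple[int, int]   # (taxiId, number of rides)


def _new_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE rides (rideId INTEGER, taxiId INTEGER)")
    return db


def batch_result(rides: List[Ride]) -> List[Row]:
    """Store everything first, then ask the question once."""
    db = _new_db()
    db.executemany("INSERT INTO rides VALUES (?, ?)", rides)
    return db.execute(QUERY).fetchall()


def streaming_results(rides: List[Ride]) -> List[List[Row]]:
    """Ask the same question after every arriving row, keeping each answer."""
    db = _new_db()
    snapshots: List[List[Row]] = []
    for ride in rides:
        db.execute("INSERT INTO rides VALUES (?, ?)", ride)
        snapshots.append(db.execute(QUERY).fetchall())
    return snapshots
```

Once all rows have arrived, the last streaming answer equals the batch answer.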
© 2018 data Artisans35
https://data-artisans.com/blog/flink-sql-powerful-querying-of-data-streams
© 2018 data Artisans36
3. Flink Connectors
The usual suspects: Kafka, Kinesis, HDFS/S3,
Elasticsearch, Cassandra, …
Table API / SQL has a modular library of connectors &
formats that can be extended by users.
© 2018 data Artisans37
SQL connector definition
- name: TaxiRides
  type: source
  update-mode: append
  schema:
  - name: rideId
    type: LONG
  # event-time attribute, taken from the rideTime field of each record
  - name: rowTime
    type: TIMESTAMP
    rowtime:
      timestamps:
        type: "from-field"
        from: "rideTime"
      watermarks:
        type: "periodic-bounded"
        delay: "60000"  # milliseconds
  - name: isStart
    type: BOOLEAN
  - name: lon
    type: FLOAT
  - name: lat
    type: FLOAT
  - name: taxiId
    type: LONG
  - name: driverId
    type: LONG
  - name: psgCnt
    type: INT
  connector:
    property-version: 1
    type: kafka
    version: 0.11
    topic: TaxiRides
    startup-mode: earliest-offset
    properties:
    - key: zookeeper.connect
      value: zookeeper:2181
    - key: bootstrap.servers
      value: kafka:9092
    - key: group.id
      value: testGroup
  format:
    property-version: 1
    type: json
    schema: "ROW(rideId LONG, isStart BOOLEAN, rideTime TIMESTAMP, lon FLOAT, lat FLOAT, psgCnt INT, taxiId LONG, driverId LONG)"
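The `periodic-bounded` watermark setting with `delay: "60000"` in the definition above can be read as "the watermark trails the highest `rideTime` seen so far by 60 seconds". A minimal plain-Python sketch of that rule follows. The function name and return shape are hypothetical, and the sketch recomputes the watermark on every event, whereas a periodic strategy would emit it at intervals.

```python
from typing import Iterable, List, Optional, Tuple

DELAY_MS = 60000  # mirrors the watermark delay in the connector definition


def bounded_watermarks(
    ride_times_ms: Iterable[int], delay_ms: int = DELAY_MS
) -> List[Tuple[int, int, bool]]:
    """For each event return (timestamp, watermark after the event, is_late).

    The watermark trails the highest timestamp seen so far by `delay_ms`.
    An event counts as late when its timestamp is not above the watermark
    that was current when it arrived.
    """
    max_seen: Optional[int] = None
    out: List[Tuple[int, int, bool]] = []
    for ts in ride_times_ms:
        watermark_before = None if max_seen is None else max_seen - delay_ms
        is_late = watermark_before is not None and ts <= watermark_before
        max_seen = ts if max_seen is None else max(max_seen, ts)
        out.append((ts, max_seen - delay_ms, is_late))
    return out
```

With a 60-second bound, an event that is up to one minute out of order still arrives ahead of the watermark; anything older is flagged as late.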
© 2018 data Artisans38
[Figure: the processing landscape (offline to real-time: batch; streaming analytics & continuous processing; event-driven applications) with the DataSet API, DataStream API, and Table API / SQL laid over it]
© 2018 data Artisans39
What’s the next
evolution?
© 2018 data Artisans40
[Figure: the processing landscape (offline to real-time: batch; streaming analytics & continuous processing; event-driven applications) with the DataSet API, DataStream API, and Table API / SQL laid over it]
* this is where we are now
Different algorithms/data structures optimized for the use case.
© 2018 data Artisans41
Grand Unification
Truly unified runtime that adapts to the workload.
Seamless integration of batch and streaming data
sources.
© 2018 data Artisans42
[Figure: the same processing landscape (offline to real-time: batch; streaming analytics & continuous processing; event-driven applications), now with only the DataStream API and Table API / SQL laid over it]
* possible future evolution
© 2018 data Artisans43
http://flink.apache.org
THANK YOU!
aljoscha@apache.org
@dataArtisans
@ApacheFlink
WE ARE HIRING
data-artisans.com/careers
© 2018 data Artisans45
FREE TRIAL DOWNLOAD
data-artisans.com/download
© 2018 data Artisans46
DOWNLOAD REPORT
data-artisans.com/download-report-stream-processing-da-platform-apache-flink
Stream processing for real-time businesses
powered by Apache Flink®
© 2018 data Artisans47
BACKUP
© 2018 data Artisans48
Akka
2010 – 0.5, first public release


Editor's Notes

  • #3: • data Artisans was founded by the original creators of Apache Flink • We provide dA Platform, a complete stream processing infrastructure with open-source Apache Flink
  • #4: • These companies are among many users of Apache Flink, and during this conference you’ll meet folks from some of these companies as well as others using Flink. • If your company would like to be represented on the “Powered by Apache Flink” page, email me.
  • #10: Think Oracle, IBM DB2, PostgreSQL, MySQL. Also think data warehouses, BI tools.
  • #22: There were other stream processing systems before this, but Storm was the most popular and widely used. It was open-sourced after BackType, the company that created it, was acquired by Twitter.
  • #30: Batch is really just batch. Most streaming analytics cases you can solve by doing more batches. Event-driven applications need a streaming system.
  • #36: https://data-artisans.com/blog/flink-sql-powerful-querying-of-data-streams
  • #39: Batch is really just batch. Most streaming analytics cases you can solve by doing more batches. Event-driven applications need a streaming system.
  • #41: Batch is really just batch. Most streaming analytics cases you can solve by doing more batches. Event-driven applications need a streaming system.
  • #43: Batch is really just batch. Most streaming analytics cases you can solve by doing more batches. Event-driven applications need a streaming system.
  • #46: • Also included is the Application Manager, which turns dA Platform into a self-service platform for stateful stream processing applications. • dA Platform is generally available, and you can download a free trial today!
  • #47: (Optional slide – may not be appropriate for advanced audience. Helps us capture leads.)