SlideShare a Scribd company logo
Writing an interactive interface for SQL on Flink
How and why we created SQLStreamBuilder—and the lessons learned along the way
Kenny Gorman
Co-Founder and CEO
www.eventador.io
2019 Flink Forward Berlin
Background and motivations
● Eventador.io has offered a managed Flink runtime for a few years now. We
started to see some customer patterns emerge.
● The state of the art today is to write Flink jobs in Java or Scala using the
DataStream/Set API and/or the Table API’s.
● While powerful, the time and expertise needed isn’t trivial. Adoption and time to
market lags.
● Teams are busy writing code. Completely swamped to be precise.
Why SQL anyway?
● SQL is > 30 years old. It’s massively useful for inspecting and reasoning about
data. Everyone knows SQL.
● It’s declarative, just ask for what you want to see.
● It’s been extended to accommodate streaming constructs like windows
(Flink/Calcite).
● Streaming SQL never completes, it’s a query on boundless data.
● It’s an amazing way to interact with streaming data.
Of workloads could be represented
with SQL, and we plan to grow that.
Require more complex logic best
represented in Java/Scala.
80%
20%
What if we could go beyond simply building
processors in SQL - do it interactively, manage
schema’s and make it all easy?
Could building logic on streams be as productive
and intuitive as using a database yet as scalable
and powerful as Flink?
Eventador SQLStreamBuilder
● Interactive SQL editor - create and submit any Flink compatible SQL
● Virtual Table Registry - source/sink + schema definition
● Query Parser - Gives instant feedback
● Job payload management - Builds job payloads
● Flink runner - Takes the payload and runs the job
● Delivered as a cloud service - in your AWS account
Feedback on SQL
execution
Where do I send results?
Where to run the job
The SQL statement Sampling rather than a
result-set
A sample of results
in browser
Schema management - Virtual Table Registry
● SQL requires a schema of typed
columns - streams don’t have have to
have this.
● It’s common to use AVRO (easy to solve
for) but also free-form JSON
● Free form means - a total F**ing mess.
● Sources - Kafka/Kinesis (soon)
● Sinks - Kafka,S3, JDBC, ELK (soon)
SQLStreamBuilder Components
● Interactive SQL interface
○ Handles query creation and submission.
○ Handles feedback from SQLIO
○ Interface to build queries, sources and sinks
○ Python + Vue.js
○ Results are sampled back to interface
● SQL engine (SQLIO)
○ Parse incoming statements
○ Map data sources/sinks
○ Parse schema (Schema Registry+AVRO / JSON)
○ Build POJOs
○ Submit payload to runner (Flink)
○ Java
● Virtual Table Registry
○ Creation of schema for streams
○ AVRO + JSON
○ Python
SQLStreamBuilder (con’t)
● Job Management Interface
○ Stop/Start/Edit/etc
○ Python + Vue.js
○ Uses Flink APIs
● Builder
○ Handles creation of assets via K8’s
○ Python
○ PostgreSQL backend
○ Kubernetes orchestration
● Flink runner
○ Run jobs on Flink 1.8.2
○ Kubernetes orchestration
○ Any Flink compatible SQL statement
Query Lifecycle - Parse
HTTPS POST {
“schema”: { },
“topic”: “myTopic”,
“virtualTable: “myTable”,
“sqlStmt”: “SELECT ….”
}
SQLIO
RESULT (if error)
{ “errorMsg”: “unable to find table myTable” }
final StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
...
Table parse = mytopic
.select()
.sort()
...
- Validate SQL against Flink/Calcite
- JSON -> AVRO
- AVRO Schema
- Generate Classes (POJO’s)
SQL console
Query Lifecycle - Execute
SQLIO
Apache Kafka / Socket.io
SQL console
Column Column Column
Value Value Value
- If class exists
- class, method, params
.connect(
new Kafka()
.version("0.11")
.topic("...")
.sinkPartitionerXX
result.writeToSink(..);
env.execute(..);
- Enhanced schema typing
- Enhanced feedback/logging
- Sends base64 encoded payload to Flink
Job
SAMPLE THE DATA TO USER
SQL join streams from multiple clusters/types
Write to multiple types of sinks, building complex
processing pipelines
Aggregate data before pushing to expensive/slow
database endpoints
Conditionally write to multiple S3 buckets
Building Processing Environments
SELECT * FROM sensors
JOIN account_info ON ...
SELECT sensorid, max(temp)
FROM stream
GROUP BY sensorid, tumble(..)
SELECT sensorid, region
FROM stream
WHERE region IN [...]
s3://xxx/yyys3://xxx/yyy
sms
SELECT * FROM table
WHERE user_selected_thing = ‘foo’;
SELECT sensorid, message
FROM stream
WHERE is_alert = ‘t’ Data Science
ML Team(s)
SnowFlakeDB
or other Data
Warehouse
Javascript User Functions - Introduced Today
function ICAO_lookup(icao) {
try {
var c = new java.net.URL('http://tornado.beebe.cc/' + icao).openConnection();
c.requestMethod='GET';
var reader = new java.io.BufferedReader(new java.io.InputStreamReader(c.inputStream));
return reader.readLine();
} catch(err) {
return "Unknown: " + err;
}
}
ICAO_lookup($p0);
Whats next?
We are hiring
Number of changes for
community evaluation
Whats Next?
● Read Consistent (Snapshot) Sink
● Auto-detect and type schema
● SQL API
● Intelligent Auto-scale
● AWS Spot instance support/management
Streams
Are
The
Database
Thank You
hello@eventador.io
Ad

More Related Content

What's hot (20)

Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward
 
Building an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowBuilding an analytics workflow using Apache Airflow
Building an analytics workflow using Apache Airflow
Yohei Onishi
 
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward
 
Enhancements in Java 9 Streams
Enhancements in Java 9 StreamsEnhancements in Java 9 Streams
Enhancements in Java 9 Streams
Corneil du Plessis
 
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming APIFlink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward
 
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward
 
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Flink Forward
 
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward
 
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetupKafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Mingmin Chen
 
AIRflow at Scale
AIRflow at ScaleAIRflow at Scale
AIRflow at Scale
Digital Vidya
 
Apache Airflow overview
Apache Airflow overviewApache Airflow overview
Apache Airflow overview
NikolayGrishchenkov
 
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward
 
Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...
Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...
Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...
Flink Forward
 
Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019
Thomas Weise
 
Airflow introduction
Airflow introductionAirflow introduction
Airflow introduction
Chandler Huang
 
SAIS2018 - Fact Store At Netflix Scale
SAIS2018 - Fact Store At Netflix ScaleSAIS2018 - Fact Store At Netflix Scale
SAIS2018 - Fact Store At Netflix Scale
Nitin S
 
Real-Time Stream Processing with KSQL and Apache Kafka
Real-Time Stream Processing with KSQL and Apache KafkaReal-Time Stream Processing with KSQL and Apache Kafka
Real-Time Stream Processing with KSQL and Apache Kafka
confluent
 
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Spark Summit
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
Xiang Fu
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
Knoldus Inc.
 
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward
 
Building an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowBuilding an analytics workflow using Apache Airflow
Building an analytics workflow using Apache Airflow
Yohei Onishi
 
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward
 
Enhancements in Java 9 Streams
Enhancements in Java 9 StreamsEnhancements in Java 9 Streams
Enhancements in Java 9 Streams
Corneil du Plessis
 
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming APIFlink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward
 
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward
 
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Flink Forward
 
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward
 
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetupKafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Mingmin Chen
 
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward
 
Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...
Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...
Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...
Flink Forward
 
Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019
Thomas Weise
 
SAIS2018 - Fact Store At Netflix Scale
SAIS2018 - Fact Store At Netflix ScaleSAIS2018 - Fact Store At Netflix Scale
SAIS2018 - Fact Store At Netflix Scale
Nitin S
 
Real-Time Stream Processing with KSQL and Apache Kafka
Real-Time Stream Processing with KSQL and Apache KafkaReal-Time Stream Processing with KSQL and Apache Kafka
Real-Time Stream Processing with KSQL and Apache Kafka
confluent
 
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Spark Summit
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
Xiang Fu
 

Similar to Writing an Interactive Interface for SQL on Flink (20)

Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...
Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...
Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...
Eventador
 
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Thomas Weise
 
Google cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache FlinkGoogle cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache Flink
Iván Fernández Perea
 
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward
 
Couchbase@live person meetup july 22nd
Couchbase@live person meetup   july 22ndCouchbase@live person meetup   july 22nd
Couchbase@live person meetup july 22nd
Ido Shilon
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebook
Aniket Mokashi
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
Chris Baynes
 
OpenLineage for Stream Processing | Kafka Summit London
OpenLineage for Stream Processing | Kafka Summit LondonOpenLineage for Stream Processing | Kafka Summit London
OpenLineage for Stream Processing | Kafka Summit London
HostedbyConfluent
 
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
HostedbyConfluent
 
OpenDaylight and YANG
OpenDaylight and YANGOpenDaylight and YANG
OpenDaylight and YANG
CoreStack
 
Apache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's NextApache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's Next
Prateek Maheshwari
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyft
markgrover
 
Understanding Framework Architecture using Eclipse
Understanding Framework Architecture using EclipseUnderstanding Framework Architecture using Eclipse
Understanding Framework Architecture using Eclipse
anshunjain
 
Experiences with Evangelizing Java Within the Database
Experiences with Evangelizing Java Within the DatabaseExperiences with Evangelizing Java Within the Database
Experiences with Evangelizing Java Within the Database
Marcelo Ochoa
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Guido Schmutz
 
Road to sbt 1.0 paved with server
Road to sbt 1.0   paved with serverRoad to sbt 1.0   paved with server
Road to sbt 1.0 paved with server
Eugene Yokota
 
Real-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNKReal-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNK
Data Con LA
 
Automate the operation of your Oracle Cloud infrastructure v2.0
Automate the operation of your Oracle Cloud infrastructure v2.0Automate the operation of your Oracle Cloud infrastructure v2.0
Automate the operation of your Oracle Cloud infrastructure v2.0
Nelson Calero
 
Best Practices for Middleware and Integration Architecture Modernization with...
Best Practices for Middleware and Integration Architecture Modernization with...Best Practices for Middleware and Integration Architecture Modernization with...
Best Practices for Middleware and Integration Architecture Modernization with...
Claus Ibsen
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 
Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...
Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...
Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...
Eventador
 
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Thomas Weise
 
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward
 
Couchbase@live person meetup july 22nd
Couchbase@live person meetup   july 22ndCouchbase@live person meetup   july 22nd
Couchbase@live person meetup july 22nd
Ido Shilon
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebook
Aniket Mokashi
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
Chris Baynes
 
OpenLineage for Stream Processing | Kafka Summit London
OpenLineage for Stream Processing | Kafka Summit LondonOpenLineage for Stream Processing | Kafka Summit London
OpenLineage for Stream Processing | Kafka Summit London
HostedbyConfluent
 
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
HostedbyConfluent
 
OpenDaylight and YANG
OpenDaylight and YANGOpenDaylight and YANG
OpenDaylight and YANG
CoreStack
 
Apache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's NextApache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's Next
Prateek Maheshwari
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyft
markgrover
 
Understanding Framework Architecture using Eclipse
Understanding Framework Architecture using EclipseUnderstanding Framework Architecture using Eclipse
Understanding Framework Architecture using Eclipse
anshunjain
 
Experiences with Evangelizing Java Within the Database
Experiences with Evangelizing Java Within the DatabaseExperiences with Evangelizing Java Within the Database
Experiences with Evangelizing Java Within the Database
Marcelo Ochoa
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Guido Schmutz
 
Road to sbt 1.0 paved with server
Road to sbt 1.0   paved with serverRoad to sbt 1.0   paved with server
Road to sbt 1.0 paved with server
Eugene Yokota
 
Real-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNKReal-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNK
Data Con LA
 
Automate the operation of your Oracle Cloud infrastructure v2.0
Automate the operation of your Oracle Cloud infrastructure v2.0Automate the operation of your Oracle Cloud infrastructure v2.0
Automate the operation of your Oracle Cloud infrastructure v2.0
Nelson Calero
 
Best Practices for Middleware and Integration Architecture Modernization with...
Best Practices for Middleware and Integration Architecture Modernization with...Best Practices for Middleware and Integration Architecture Modernization with...
Best Practices for Middleware and Integration Architecture Modernization with...
Claus Ibsen
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 
Ad

Recently uploaded (20)

Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
Principles of information security Chapter 5.ppt
Principles of information security Chapter 5.pptPrinciples of information security Chapter 5.ppt
Principles of information security Chapter 5.ppt
EstherBaguma
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
Modern_Distribution_Presentation.pptx Aa
Modern_Distribution_Presentation.pptx AaModern_Distribution_Presentation.pptx Aa
Modern_Distribution_Presentation.pptx Aa
MuhammadAwaisKamboh
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docxMASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
santosh162
 
How to join illuminati Agent in uganda call+256776963507/0741506136
How to join illuminati Agent in uganda call+256776963507/0741506136How to join illuminati Agent in uganda call+256776963507/0741506136
How to join illuminati Agent in uganda call+256776963507/0741506136
illuminati Agent uganda call+256776963507/0741506136
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptxISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
pankaj6188303
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
AWS-AIML-PRESENTATION RELATED TO DATA SCIENCE TO DATA
AWS-AIML-PRESENTATION RELATED TO DATA SCIENCE TO DATAAWS-AIML-PRESENTATION RELATED TO DATA SCIENCE TO DATA
AWS-AIML-PRESENTATION RELATED TO DATA SCIENCE TO DATA
SnehaBoja
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia
Alexander Romero Arosquipa
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
Deloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit contextDeloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit context
Process mining Evangelist
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
Principles of information security Chapter 5.ppt
Principles of information security Chapter 5.pptPrinciples of information security Chapter 5.ppt
Principles of information security Chapter 5.ppt
EstherBaguma
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
Modern_Distribution_Presentation.pptx Aa
Modern_Distribution_Presentation.pptx AaModern_Distribution_Presentation.pptx Aa
Modern_Distribution_Presentation.pptx Aa
MuhammadAwaisKamboh
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docxMASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
santosh162
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptxISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
pankaj6188303
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
AWS-AIML-PRESENTATION RELATED TO DATA SCIENCE TO DATA
AWS-AIML-PRESENTATION RELATED TO DATA SCIENCE TO DATAAWS-AIML-PRESENTATION RELATED TO DATA SCIENCE TO DATA
AWS-AIML-PRESENTATION RELATED TO DATA SCIENCE TO DATA
SnehaBoja
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
Deloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit contextDeloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit context
Process mining Evangelist
 
Ad

Writing an Interactive Interface for SQL on Flink

  • 1. Writing an interactive interface for SQL on Flink How and why we created SQLStreamBuilder—and the lessons learned along the way Kenny Gorman Co-Founder and CEO www.eventador.io 2019 Flink Forward Berlin
  • 2. Background and motivations ● Eventador.io has offered a managed Flink runtime for a few years now. We started to see some customer patterns emerge. ● The state of the art today is to write Flink jobs in Java or Scala using the DataStream/Set API and/or the Table API’s. ● While powerful, the time and expertise needed isn’t trivial. Adoption and time to market lags. ● Teams are busy writing code. Completely swamped to be precise.
  • 3. Why SQL anyway? ● SQL is > 30 years old. It’s massively useful for inspecting and reasoning about data. Everyone knows SQL. ● It’s declarative, just ask for what you want to see. ● It’s been extended to accommodate streaming constructs like windows (Flink/Calcite). ● Streaming SQL never completes, it’s a query on boundless data. ● It’s an amazing way to interact with streaming data.
  • 4. Of workloads could be represented with SQL, and we plan to grow that. Require more complex logic best represented in Java/Scala. 80% 20%
  • 5. What if we could go beyond simply building processors in SQL - do it interactively, manage schema’s and make it all easy? Could building logic on streams be as productive and intuitive as using a database yet as scalable and powerful as Flink?
  • 6. Eventador SQLStreamBuilder ● Interactive SQL editor - create and submit any Flink compatible SQL ● Virtual Table Registry - source/sink + schema definition ● Query Parser - Gives instant feedback ● Job payload management - Builds job payloads ● Flink runner - Takes the payload and runs the job ● Delivered as a cloud service - in your AWS account
  • 7. Feedback on SQL execution Where do I send results? Where to run the job The SQL statement Sampling rather than a result-set A sample of results in browser
  • 8. Schema management - Virtual Table Registry ● SQL requires a schema of typed columns - streams don’t have have to have this. ● It’s common to use AVRO (easy to solve for) but also free-form JSON ● Free form means - a total F**ing mess. ● Sources - Kafka/Kinesis (soon) ● Sinks - Kafka,S3, JDBC, ELK (soon)
  • 9. SQLStreamBuilder Components ● Interactive SQL interface ○ Handles query creation and submission. ○ Handles feedback from SQLIO ○ Interface to build queries, sources and sinks ○ Python + Vue.js ○ Results are sampled back to interface ● SQL engine (SQLIO) ○ Parse incoming statements ○ Map data sources/sinks ○ Parse schema (Schema Registry+AVRO / JSON) ○ Build POJOs ○ Submit payload to runner (Flink) ○ Java ● Virtual Table Registry ○ Creation of schema for streams ○ AVRO + JSON ○ Python
  • 10. SQLStreamBuilder (con’t) ● Job Management Interface ○ Stop/Start/Edit/etc ○ Python + Vue.js ○ Uses Flink APIs ● Builder ○ Handles creation of assets via K8’s ○ Python ○ PostgreSQL backend ○ Kubernetes orchestration ● Flink runner ○ Run jobs on Flink 1.8.2 ○ Kubernetes orchestration ○ Any Flink compatible SQL statement
  • 11. Query Lifecycle - Parse HTTPS POST { “schema”: { }, “topic”: “myTopic”, “virtualTable: “myTable”, “sqlStmt”: “SELECT ….” } SQLIO RESULT (if error) { “errorMsg”: “unable to find table myTable” } final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); ... Table parse = mytopic .select() .sort() ... - Validate SQL against Flink/Calcite - JSON -> AVRO - AVRO Schema - Generate Classes (POJO’s) SQL console
  • 12. Query Lifecycle - Execute SQLIO Apache Kafka / Socket.io SQL console Column Column Column Value Value Value - If class exists - class, method, params .connect( new Kafka() .version("0.11") .topic("...") .sinkPartitionerXX result.writeToSink(..); env.execute(..); - Enhanced schema typing - Enhanced feedback/logging - Sends base64 encoded payload to Flink Job SAMPLE THE DATA TO USER
  • 13. SQL join streams from multiple clusters/types Write to multiple types of sinks, building complex processing pipelines Aggregate data before pushing to expensive/slow database endpoints Conditionally write to multiple S3 buckets
  • 14. Building Processing Environments SELECT * FROM sensors JOIN account_info ON ... SELECT sensorid, max(temp) FROM stream GROUP BY sensorid, tumble(..) SELECT sensorid, region FROM stream WHERE region IN [...] s3://xxx/yyys3://xxx/yyy sms SELECT * FROM table WHERE user_selected_thing = ‘foo’; SELECT sensorid, message FROM stream WHERE is_alert = ‘t’ Data Science ML Team(s) SnowFlakeDB or other Data Warehouse
  • 15. Javascript User Functions - Introduced Today function ICAO_lookup(icao) { try { var c = new java.net.URL('http://tornado.beebe.cc/' + icao).openConnection(); c.requestMethod='GET'; var reader = new java.io.BufferedReader(new java.io.InputStreamReader(c.inputStream)); return reader.readLine(); } catch(err) { return "Unknown: " + err; } } ICAO_lookup($p0);
  • 17. We are hiring Number of changes for community evaluation
  • 18. Whats Next? ● Read Consistent (Snapshot) Sink ● Auto-detect and type schema ● SQL API ● Intelligent Auto-scale ● AWS Spot instance support/management