Writing an Interactive Interface for SQL on Flink

Writing an interactive interface for SQL on Flink
How and why we created SQLStreamBuilder—and the lessons learned along the way
Kenny Gorman
Co-Founder and CEO
www.eventador.io
2019 Flink Forward Berlin

Background and motivations
● Eventador.io has offered a managed Flink runtime for a few years now. We
started to see some customer patterns emerge.
● The state of the art today is to write Flink jobs in Java or Scala using the
DataStream/Set API and/or the Table API’s.
● While powerful, the time and expertise needed isn’t trivial. Adoption and time to
market lags.
● Teams are busy writing code. Completely swamped to be precise.

Why SQL anyway?
● SQL is > 30 years old. It’s massively useful for inspecting and reasoning about
data. Everyone knows SQL.
● It’s declarative, just ask for what you want to see.
● It’s been extended to accommodate streaming constructs like windows
(Flink/Calcite).
● Streaming SQL never completes, it’s a query on boundless data.
● It’s an amazing way to interact with streaming data.

Of workloads could be represented
with SQL, and we plan to grow that.
Require more complex logic best
represented in Java/Scala.
80%
20%

What if we could go beyond simply building
processors in SQL - do it interactively, manage
schema’s and make it all easy?
Could building logic on streams be as productive
and intuitive as using a database yet as scalable
and powerful as Flink?

Eventador SQLStreamBuilder
● Interactive SQL editor - create and submit any Flink compatible SQL
● Virtual Table Registry - source/sink + schema definition
● Query Parser - Gives instant feedback
● Job payload management - Builds job payloads
● Flink runner - Takes the payload and runs the job
● Delivered as a cloud service - in your AWS account

Feedback on SQL
execution
Where do I send results?
Where to run the job
The SQL statement Sampling rather than a
result-set
A sample of results
in browser

Schema management - Virtual Table Registry
● SQL requires a schema of typed
columns - streams don’t have have to
have this.
● It’s common to use AVRO (easy to solve
for) but also free-form JSON
● Free form means - a total F**ing mess.
● Sources - Kafka/Kinesis (soon)
● Sinks - Kafka,S3, JDBC, ELK (soon)

SQLStreamBuilder Components
● Interactive SQL interface
○ Handles query creation and submission.
○ Handles feedback from SQLIO
○ Interface to build queries, sources and sinks
○ Python + Vue.js
○ Results are sampled back to interface
● SQL engine (SQLIO)
○ Parse incoming statements
○ Map data sources/sinks
○ Parse schema (Schema Registry+AVRO / JSON)
○ Build POJOs
○ Submit payload to runner (Flink)
○ Java
● Virtual Table Registry
○ Creation of schema for streams
○ AVRO + JSON
○ Python

SQLStreamBuilder (con’t)
● Job Management Interface
○ Stop/Start/Edit/etc
○ Python + Vue.js
○ Uses Flink APIs
● Builder
○ Handles creation of assets via K8’s
○ Python
○ PostgreSQL backend
○ Kubernetes orchestration
● Flink runner
○ Run jobs on Flink 1.8.2
○ Kubernetes orchestration
○ Any Flink compatible SQL statement

Query Lifecycle - Parse
HTTPS POST {
“schema”: { },
“topic”: “myTopic”,
“virtualTable: “myTable”,
“sqlStmt”: “SELECT ….”
}
SQLIO
RESULT (if error)
{ “errorMsg”: “unable to find table myTable” }
final StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
...
Table parse = mytopic
.select()
.sort()
...
- Validate SQL against Flink/Calcite
- JSON -> AVRO
- AVRO Schema
- Generate Classes (POJO’s)
SQL console

Query Lifecycle - Execute
SQLIO
Apache Kafka / Socket.io
SQL console
Column Column Column
Value Value Value
- If class exists
- class, method, params
.connect(
new Kafka()
.version("0.11")
.topic("...")
.sinkPartitionerXX
result.writeToSink(..);
env.execute(..);
- Enhanced schema typing
- Enhanced feedback/logging
- Sends base64 encoded payload to Flink
Job
SAMPLE THE DATA TO USER

SQL join streams from multiple clusters/types
Write to multiple types of sinks, building complex
processing pipelines
Aggregate data before pushing to expensive/slow
database endpoints
Conditionally write to multiple S3 buckets

Building Processing Environments
SELECT * FROM sensors
JOIN account_info ON ...
SELECT sensorid, max(temp)
FROM stream
GROUP BY sensorid, tumble(..)
SELECT sensorid, region
FROM stream
WHERE region IN [...]
s3://xxx/yyys3://xxx/yyy
sms
SELECT * FROM table
WHERE user_selected_thing = ‘foo’;
SELECT sensorid, message
FROM stream
WHERE is_alert = ‘t’ Data Science
ML Team(s)
SnowFlakeDB
or other Data
Warehouse

Javascript User Functions - Introduced Today
function ICAO_lookup(icao) {
try {
var c = new java.net.URL('http://tornado.beebe.cc/' + icao).openConnection();
c.requestMethod='GET';
var reader = new java.io.BufferedReader(new java.io.InputStreamReader(c.inputStream));
return reader.readLine();
} catch(err) {
return "Unknown: " + err;
}
}
ICAO_lookup($p0);

We are hiring
Number of changes for
community evaluation

Whats Next?
● Read Consistent (Snapshot) Sink
● Auto-detect and type schema
● SQL API
● Intelligent Auto-scale
● AWS Spot instance support/management

Writing an Interactive Interface for SQL on Flink

Recommended

More Related Content

What's hot (20)

Similar to Writing an Interactive Interface for SQL on Flink (20)

Recently uploaded (20)

Writing an Interactive Interface for SQL on Flink