Continuous SQL with SQL Stream Builder
Kenny Gorman - Product Owner
Timothy Spann - Principal DataFlow Field Engineer
John Kuchmek - Senior Solutions Engineer
06-May-2021
https://www.meetup.com/futureofdata-newyork/
@PaasDev
© 2021 Cloudera, Inc. All rights reserved. 2
Welcome to Future of Data - Virtual
Princeton Future of Data Meetup
New York Future of Data Meetup
Philadelphia Future of Data Meetup
From Big Data to AI to Streaming to Containers to
Cloud to Analytics to Cloud Storage to Fast Data to
Machine Learning to Microservices to ...
AGENDA
● Introductions with Kenny, John and Tim
● Flink Quick Overview
● SQL Stream Builder Overview
● Q&A
● Demos
● Q&A - Interactive Panel Session
● Next Meetups
● Raffle
Cloudera DataFlow Use Cases
Data Movement
Optimize resource utilization by
moving data between data centers
or between on-premises and cloud
infrastructures
e.g. intercontinental data exchange
Logging Modernization
Optimize log analytics solutions
with CDF by simplifying log ingestion
from the edge, reducing costs and
gaining key analytics
e.g. Splunk / Logstash offload
Streaming analytics insights
Make key business decisions by
analyzing streaming data for
complex patterns, gaining
actionable intelligence etc.
e.g. Fraud detection, Network threat
analysis, app monitoring, Clickstream
analysis
360° view of customer
Ingest, transform and combine
customer data from multiple
sources into a single data view /
lake
e.g. Real-time customer offers,
Loan approvals
IoT & Edge use cases
e.g. Predictive Maintenance, Asset
Tracking / Monitoring, Patient
Monitoring, Quality Processes,
Fleet Management, Connected
Cars and more
Enterprise data management
Managing massive volumes of
high-velocity data to/from legacy
systems, ETL tools and other data
stores
e.g. Flume offload, ETL
replacement, payment data
processing, integration with Oracle
Simplifying the User Experience
APACHE FLINK
Streaming real-time data pipelines
that need to handle complex
stream or batch data event
processing, analytics, and/or
support event-driven applications
USE CASE TECHNOLOGY APPLICATION
Comcast, a global media company,
uses Flink for operationalizing
machine learning models and
near-real-time event stream
processing
Flink helps deliver a
personalized, contextual
interaction, reducing time to
support resolution and saving
millions of dollars per year
Flink performs compute at
in-memory speed at any scale
Flink parses SQL using Apache
Calcite, which supports
standard ANSI SQL
Flink runs standalone, on
YARN, and has a K8s Operator
Data Freshness SLAs
Flink can read and write
Hive data
Review requirements for fault
tolerance, resilience, and HA
Other technologies play in
this space, such as the Hive storage
handler for connecting to Kafka
CONSIDERATION
3B+ data points daily streaming in
from 25 million customers running
real time machine learning
prediction
Flink
FLINK
FEATURES
• Distributed processing engine for
stateful computations
• Tens of TBs of managed state
• Flexible and expressive APIs
• Guaranteed correctness &
Exactly-once state consistency
• Event-time semantics
• Flexible deployment & large
ecosystem (K8s, YARN, S3, HDFS..)
• Support for Flink SQL API
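To make "event-time semantics" from the feature list concrete, here is a minimal Python sketch. It is not Flink code and does not use any Flink API; the 10-second window size, the event tuples, and the function names are assumptions made for illustration. It shows the core idea only: each event is assigned to a window by the timestamp it carries, so an event that arrives late still lands in the right window. Watermarks and late-data handling are left out.

```python
from collections import defaultdict

WINDOW_MS = 10_000  # hypothetical 10-second tumbling window


def window_start(event_time_ms, size_ms=WINDOW_MS):
    """Return the start of the tumbling window that contains the event."""
    return event_time_ms - (event_time_ms % size_ms)


def count_per_window(events):
    """Count events per (key, window) using the timestamp carried by each
    event, not the order in which the events arrive."""
    counts = defaultdict(int)
    for key, event_time_ms in events:
        counts[(key, window_start(event_time_ms))] += 1
    return dict(counts)


# Hypothetical events as (key, event_time_ms); the third arrives out of order.
events = [
    ("sensor-a", 1_000),
    ("sensor-a", 12_000),
    ("sensor-a", 9_500),
    ("sensor-b", 3_000),
]
result = count_per_window(events)
print(result)
```

The out-of-order event at 9,500 ms is counted in the window starting at 0, alongside the event at 1,000 ms, even though it arrived after the event at 12,000 ms.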
DELIVERING STREAMING ANALYTICS
(Figure: timeline in seconds, 1 to 11)
SQL
Parsing and
Blending Data
Streaming
Analytics
Both offline and
streaming data
Data Analysts Can
Write Queries
Across Lines of Business
Capture Events
that Matter
Low-latency analytics use
cases
Events
Processing
MAINTAINS & CHECKPOINTS STATE
● Flink maintains state locally per task
(in-mem / on-disk)
○ Fast access!
● State is periodically checkpointed to
durable storage
○ A checkpoint is a consistent
snapshot of the state of all tasks
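The two bullets above can be sketched in a few lines of Python. This is an illustrative model only, not Flink's implementation: the `Task` class, the `checkpoint` and `restore` helpers, and the in-memory `durable_store` are hypothetical names, and the snapshot is taken between events in a single thread rather than coordinated across distributed tasks.

```python
import copy


class Task:
    """A hypothetical stateful task that keeps a running count per key."""

    def __init__(self, name):
        self.name = name
        self.state = {}  # local state, fast to access

    def process(self, key):
        self.state[key] = self.state.get(key, 0) + 1


def checkpoint(tasks):
    """Snapshot the state of every task at the same point in the input."""
    return {task.name: copy.deepcopy(task.state) for task in tasks}


def restore(tasks, snapshot):
    """Reset every task to the state recorded in the snapshot."""
    for task in tasks:
        task.state = copy.deepcopy(snapshot[task.name])


tasks = [Task("task-0"), Task("task-1")]
tasks[0].process("a")
tasks[1].process("b")
durable_store = checkpoint(tasks)  # stand-in for durable storage

tasks[0].process("a")              # work done after the checkpoint
restore(tasks, durable_store)      # simulate recovery after a failure
print(tasks[0].state, tasks[1].state)
```

After recovery every task is back at the checkpointed state, so the work done after the checkpoint has to be replayed from the source.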
Integrated Governance
Unified Governance & Lineage
Flow Management
● Reports Entity and Lineage information about NiFi Flows
● Connects with existing Lineage information
Streams Messaging
● Topic access centrally managed, supporting granular CRUD operations
● Manage permissions on dedicated clusters or manage multiple clusters at once
● Manage schemas centrally and make them available to consumers/producers
Stream Processing
● Reports Flink Apps as an operation
● Lineage through integration with existing Lineage information like Kafka topics, HBase tables etc.
● Integrated SQL and materialized view engine via SQL Stream Builder.
SQL Stream Builder
● Democratize data access across the enterprise - anyone who
knows SQL can create powerful stream processors.
● Iterative interface - Just like SQL on databases, run queries
and reason about the data with an interactive UI.
● Leverages Apache Flink to run SQL jobs - production
grade, scalable and high performance
● Deep integration and features above and beyond just UI
features - UDFs, input transforms, Kafka key and time
integration, CEP framework and more.
● Create Materialized Views to integrate with downstream
components like notebooks, visualizations and
applications.
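The materialized view idea can be illustrated with a small Python sketch. It uses the standard-library `sqlite3` module as a stand-in and is not SQL Stream Builder or its API; the `mv_totals` table, the `apply_event` helper, and the sample events are assumptions. Each incoming event is folded into a table keyed by customer, and a downstream consumer reads the current result with ordinary SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical materialized view: running total per customer, keyed by customer_id.
conn.execute("CREATE TABLE mv_totals (customer_id TEXT PRIMARY KEY, total REAL)")


def apply_event(customer_id, amount):
    """Fold one incoming event into the view: create the key if needed, then add."""
    conn.execute("INSERT OR IGNORE INTO mv_totals VALUES (?, 0)", (customer_id,))
    conn.execute(
        "UPDATE mv_totals SET total = total + ? WHERE customer_id = ?",
        (amount, customer_id),
    )


# Hypothetical stream of (customer_id, amount) events.
for event in [("c1", 10.0), ("c2", 5.0), ("c1", 2.5)]:
    apply_event(*event)

# A downstream consumer (notebook, dashboard, application) queries the view.
rows = conn.execute(
    "SELECT customer_id, total FROM mv_totals ORDER BY customer_id"
).fetchall()
print(rows)
```

The view always holds the latest result per key, so consumers never need to read the stream itself.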
Streaming SQL
Democratizing access to streams of data via structured query language
Download these assets today
THANK YOU
