1C O N F I D E N T I A L
Using Kafka to integrate DWH and cloud-based big data systems
Mic Hussey, Confluent Nordics, mic@confluent.io
2C O N F I D E N T I A L
Apache Kafka, the de facto OSS standard for event streaming
Real-time | Uses an on-disk log structure for constant performance at petabyte scale
Scalable | Distributed; scales quickly and easily without downtime
Persistent | Persists messages on disk and replicates them within the cluster
Reliable | Replicates data and automatically rebalances consumers on failure
In production at more than a third of the Fortune 500
2 trillion messages a day at LinkedIn
500 billion events a day (1.3 PB) at Netflix
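To put the quoted daily volumes in perspective, here is a rough back-of-envelope conversion to per-second rates and average event size. It assumes load is spread evenly over the day and that "PB" means decimal petabytes (10^15 bytes); neither assumption comes from the slide.

```python
# Back-of-envelope rates derived from the figures quoted on the slide.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

linkedin_msgs_per_day = 2e12     # "2 trillion messages a day"
netflix_events_per_day = 500e9   # "500 billion events a day"
netflix_bytes_per_day = 1.3e15   # "1.3 PB", assumed to be decimal petabytes

# Average rates, assuming traffic is uniform across the day (real traffic is not).
linkedin_msgs_per_sec = linkedin_msgs_per_day / SECONDS_PER_DAY
netflix_events_per_sec = netflix_events_per_day / SECONDS_PER_DAY

# Average payload size implied by the Netflix figures.
netflix_avg_event_bytes = netflix_bytes_per_day / netflix_events_per_day

print(f"LinkedIn: ~{linkedin_msgs_per_sec / 1e6:.1f} million messages/s")
print(f"Netflix:  ~{netflix_events_per_sec / 1e6:.1f} million events/s")
print(f"Netflix:  ~{netflix_avg_event_bytes / 1e3:.1f} KB per event on average")
```

These are averages only; peak rates would be higher.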
4C O N F I D E N T I A L
Data Warehouses to Big Data
5C O N F I D E N T I A L
Kafka Integration Architecture
[Figure: architecture diagram. Recoverable labels: Apps, Search, NoSQL, DWH, "Hado" (truncated), STREAMING PLATFORM, PRODUCER, CONSUMER.]
6C O N F I D E N T I A L
Sample use case: sales data
● Dataset from Kaggle: https://www.kaggle.com/kyanyoga/sample-sales-data
7C O N F I D E N T I A L
DWH
● Current de facto data integration technology
● Third Normal Form
● Minimises data duplication
● Star schema
8C O N F I D E N T I A L
Big Data
● Data storage is cheap
● Tabular data
● Flat schema
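The contrast between the normalised DWH model and the flat big data model can be sketched in a few lines. The tables, keys, and column names below are hypothetical illustrations, not taken from the Kaggle dataset or the talk's repositories.

```python
# Hypothetical 3NF tables: each fact is stored once and referenced by key.
customers = {1: {"name": "Acme Toys", "country": "Sweden"}}
products = {"P-10": {"product_line": "Classic Cars"}}
order_lines = [
    {"order_id": 100, "customer_id": 1, "product_id": "P-10",
     "quantity": 3, "price_each": 95.0},
    {"order_id": 101, "customer_id": 1, "product_id": "P-10",
     "quantity": 1, "price_each": 95.0},
]


def flatten(line, customers, products):
    """Resolve foreign keys so one record carries everything a query needs."""
    customer = customers[line["customer_id"]]
    product = products[line["product_id"]]
    return {
        "order_id": line["order_id"],
        "quantity": line["quantity"],
        "price_each": line["price_each"],
        "customer_name": customer["name"],
        "customer_country": customer["country"],
        "product_id": line["product_id"],
        "product_line": product["product_line"],
    }


# Flat schema: customer and product attributes are repeated on every row.
flat_rows = [flatten(line, customers, products) for line in order_lines]
for row in flat_rows:
    print(row)
```

The flat rows duplicate customer and product data, which is acceptable when storage is cheap and avoids joins at query time.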
9C O N F I D E N T I A L
Data load Scenario
10C O N F I D E N T I A L
There’s a huge gap!
11C O N F I D E N T I A L
Kafka Cluster
Connect API → Stream Processing → Connect API
$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
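The shell pipeline is an analogy: reading the input plays the role of a source connector, `grep` and `tr` play the role of stream processing, and writing the output plays the role of a sink connector. A minimal sketch of the same shape using Python generators follows; the function names are illustrative and it uses no Kafka APIs.

```python
def source(lines):
    """Stands in for the source-side Connect API (`cat < in.txt`)."""
    yield from lines


def keep_matching(records, needle):
    """Stream-processing step 1 (`grep "ksql"`): keep records containing needle."""
    return (record for record in records if needle in record)


def to_upper(records):
    """Stream-processing step 2 (`tr a-z A-Z`): upper-case each record."""
    return (record.upper() for record in records)


def sink(records):
    """Stands in for the sink-side Connect API (`> out.txt`)."""
    return list(records)


in_lines = ["ksql is streaming sql", "unrelated line", "try ksql today"]

# Each stage consumes the previous one lazily, one record at a time.
out_lines = sink(to_upper(keep_matching(source(in_lines), "ksql")))
print(out_lines)
```

As with the shell pipe, each stage only knows about the records flowing through it, not about the stages on either side.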
12C O N F I D E N T I A L
KSQL is the Streaming SQL Engine for Apache Kafka
13C O N F I D E N T I A L
-- Flag readings more than three standard deviations from the sensor's expected value.
CREATE STREAM oob_readings AS
  SELECT s.id, s.value, c.std_value, c.sigma
  FROM sensor_reading s
  LEFT JOIN sensor_characteristics c
    ON s.id = c.id
  WHERE ABS(s.value - c.std_value) > 3 * c.sigma;
Simple SQL syntax for expressing reasoning along and across data streams.
You can write user-defined functions in Java.
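To make the logic of the query concrete, here is a plain Python sketch of the same join-and-filter over in-memory data. The sample sensors and values are invented for illustration. It mimics only the query's semantics, not how KSQL executes it.

```python
# Reference data keyed by sensor id (the table side of the join).
sensor_characteristics = {
    "s1": {"std_value": 20.0, "sigma": 0.5},
    "s2": {"std_value": 100.0, "sigma": 2.0},
}

# Incoming readings (the stream side of the join).
sensor_readings = [
    {"id": "s1", "value": 20.4},
    {"id": "s1", "value": 22.0},
    {"id": "s2", "value": 93.0},
    {"id": "s3", "value": 5.0},   # no characteristics known for s3
]


def out_of_bounds(readings, characteristics):
    """Yield readings more than 3 sigma away from their sensor's expected value."""
    for reading in readings:
        c = characteristics.get(reading["id"])
        if c is None:
            # A LEFT JOIN would produce NULLs here; the WHERE comparison on
            # NULL is not true, so the row is dropped either way.
            continue
        if abs(reading["value"] - c["std_value"]) > 3 * c["sigma"]:
            yield {**reading, **c}


oob_readings = list(out_of_bounds(sensor_readings, sensor_characteristics))
for row in oob_readings:
    print(row)
```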
14C O N F I D E N T I A L
Streaming KSQL: pairwise joins
18C O N F I D E N T I A L
Example code
● Template methods for invoking Kafka Connect REST APIs: https://github.com/MichaelHussey/kafka_connect_curler
● 3rd Normal Form to Cloud: https://github.com/MichaelHussey/dwh2cloud
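As a rough illustration of invoking the Kafka Connect REST API, the sketch below builds (but does not send) a `POST /connectors` request that registers a JDBC source connector. It is not the code from the repositories above. The host, connector name, database URL, table, and column are placeholders chosen for the example.

```python
import json
import urllib.request


def build_create_connector_request(connect_url, name, config):
    """Build (but do not send) a POST /connectors request for Kafka Connect."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url.rstrip('/')}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_connector_request(
    "http://localhost:8083",    # assumed: a local Connect worker on its default port
    "dwh-customers-source",     # hypothetical connector name
    {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://dwh.example.com:5432/sales",  # placeholder
        "mode": "incrementing",
        "incrementing.column.name": "customer_id",  # hypothetical key column
        "table.whitelist": "customers",             # hypothetical table
        "topic.prefix": "dwh-",
    },
)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To submit against a running worker: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the payload easy to inspect and reuse across connectors.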
19C O N F I D E N T I A L
Publish master data to a topic
20C O N F I D E N T I A L
Flow through a compacted topic
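A compacted topic retains at least the latest record for each key, which is what makes it suitable for master data. The following simplified model shows the end result of compaction on a small log. The keys and values are invented, and the model ignores real-world details such as the uncompacted active segment and the retention period for tombstones.

```python
def compact(log):
    """Return the records that survive compaction as (offset, key, value).

    Keeps only the latest record per key and drops keys whose latest
    record is a tombstone (value None).
    """
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later records overwrite earlier ones
    survivors = [
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None
    ]
    return sorted(survivors, key=lambda record: record[0])  # preserve log order


log = [
    ("cust-1", {"name": "Acme", "city": "Oslo"}),
    ("cust-2", {"name": "Bolt", "city": "Malmo"}),
    ("cust-1", {"name": "Acme", "city": "Stockholm"}),  # update for cust-1
    ("cust-2", None),                                   # tombstone deletes cust-2
    ("cust-3", {"name": "Cirrus", "city": "Helsinki"}),
]

compacted = compact(log)
for record in compacted:
    print(record)
```

A consumer that reads the compacted topic from the beginning therefore rebuilds the current state of the master data without replaying every historical change.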