Partner Skill Up:
Enable a Streaming CDC Solution
Tim Spann
Principal Developer Advocate in Data In Motion for Cloudera, Global
tspann@cloudera.com
Salvador Almazan
Partner Solutions Engineer, US
salmazan@cloudera.com
© Cloudera, Inc. All rights reserved.
Today’s Leads
CLOUDERA TEAM
Meet the Cloudera team
Salvador Almazan
Sr. Partner Solutions Engineer
salmazan@cloudera.com
Tim Spann
DIM Developer Advocate
tspann@cloudera.com
© 2023 Cloudera, Inc. All rights reserved.
TODAY’S LEAD
Who am I?
@PaasDev
DZone Zone Leader and Big Data MVB
Princeton and NYC Future of Data Meetups
ex-Pivotal Field Engineer ex-StreamNative ex-PwC
https://github.com/tspannhw https://twitter.com/PaaSDev
https://www.datainmotion.dev/
https://medium.com/@tspann
Principal Data-in-Motion Developer Advocate
Future of Data - NYC / Princeton + Virtual
@PaasDev
https://www.meetup.com/futureofdata-princeton/
https://www.meetup.com/futureofdata-newyork/
From Big Data to AI to Streaming to LLM to Cloud to Analytics to NLP to Fast Data to Machine Learning to Microservices to ...
Streaming Change Data Capture (CDC) 3+ Unique Ways
In this next session, learn how to use Debezium with Flink, Kafka, and NiFi for Change Data Capture using two different mechanisms: Kafka Connect and Flink SQL.
With the virtual nature of today's world, streaming data is more critical than ever. Join Cloudera Principal Data-in-Motion Developer Advocate Tim Spann and Partner Solutions Engineer Salvador Almazan as they look closely at key CDC use cases, discuss why Debezium is the best option for handling CDC, and use examples to show you how to demonstrate value.
This is a must-attend experience!
https://medium.com/cloudera-inc/cdc-not-cat-data-capture-e43713879c03
https://dzone.com/articles/streaming-change-data-capture-data-two-ways
WHAT IS IT? WHY DO I NEED IT?
What is Change Data Capture
Full Fidelity vs Point in Time
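The contrast between full fidelity and point in time can be illustrated with a small sketch. It assumes a toy change log of (sequence number, key, value) tuples, where a value of None marks a delete; the names `Change`, `snapshot_at`, and `change_log` are hypothetical and not part of any CDC product. The full log is the full-fidelity view, and replaying it up to a chosen sequence number yields a point-in-time snapshot.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical change record: (sequence_number, key, new_value).
# A new_value of None marks a delete.
Change = Tuple[int, str, Optional[str]]


def snapshot_at(changes: List[Change], up_to: int) -> Dict[str, str]:
    """Replay the change log up to a sequence number to rebuild table state."""
    state: Dict[str, str] = {}
    for seq, key, value in sorted(changes, key=lambda change: change[0]):
        if seq > up_to:
            break
        if value is None:
            state.pop(key, None)  # delete removes the key
        else:
            state[key] = value    # insert or update overwrites the key
    return state


# Full fidelity: every change is retained, including the intermediate ones.
change_log: List[Change] = [
    (1, "bus-1", "on time"),
    (2, "bus-2", "on time"),
    (3, "bus-1", "delayed"),
    (4, "bus-2", None),
]

# Point in time: the state as of a chosen position in the log.
print(snapshot_at(change_log, 2))  # {'bus-1': 'on time', 'bus-2': 'on time'}
print(snapshot_at(change_log, 4))  # {'bus-1': 'delayed'}
```

A periodic snapshot alone would never show that bus-2 existed between changes 2 and 4, which is the information a full-fidelity change stream preserves.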
Why Change Data Capture?
Most Common Use Cases - Distribution or Synchronization
Analytics {Distribution}
Full-fidelity analytics offline; look at any point in time.
Operational Data Store {Distribution}
Prevent load on operational databases by replicating the database so that reporting queries run in isolation.
Migrations {Synchronization}
Keep two databases in synchronization while existing and new systems coexist for some time.
CDC ENGINE SELECTION
HOW TO DO IT?
Already using Kafka?
• Simple setup for many tables
• Want metadata augmented data
• Don’t need low latency?
• Visual monitoring
• Easy manual scaling
• Easy to combine with NiFi
• Debezium
Already using NiFi?
• Simple JDBC queries?
• Transform individual records?
• Want easy development with UI?
• Lots of small files, events, records, rows?
• Continuous stream of rows
• Support many different sources
• Debezium coming
Need for Fast Flink?
• Strong control of table and joins
• Want high throughput?
• Want low latency?
• Want advanced windowing and state?
• Automatic records immediately
• Pure SQL
• Debezium
Kafka Connect, NiFi, Flink? Which engine to choose? Or All 3?
CDC ARCHITECTURE - Using FLaNK to pull the data out of anything in near-real time
[Architecture diagram. Stages: Ingest, Prepare, Publish. Labels shown: Data Sources; Internal Users (After Sales); External Systems; Enterprise Lakehouse; Capability View; Ingestion; Enterprise Data; Message Hub; Storage; Batch; Management; Stream; Consumption; Closed Loop Systems; SQL Stream Builder; Machine Learning; Data Visualization; Workload Manager; watsonx.data.]
Data Distribution as a Universal, Hybrid, Multi-Cloud Data Service
[Diagram: Universal Data Distribution Service (Ingest, Transform, Deliver). Components: Ingest Processors; Ingest Gateway; Router, Filter & Transform Processors; Destination Processors. Endpoints shown: Cloud Business Process Services; Log Data Sources; Laptops / Servers; Security Agents; IoT Devices; App Logs; Mobile Apps; Cloud Data Analytics / Service; On-Prem Data Sources; Cloud Warehouse (Cloudera DW); Big Data Cloud Services.]
Multi-Cloud Data Distribution Service that Solves the First & Last Mile Problem for the Modern Data Stack
CDC with SQL Stream Builder
(Flink SQL)
Streaming CDC with Cloudera SQL Stream Builder (Flink SQL)
https://github.com/tspannhw/FLaNK-CDC/blob/main/flinkcdc.MD
https://docs.cloudera.com/csa/1.10.0/how-to-ssb/topics/csa-ssb-cdc-connectors.html
CDC with Debezium and Flink
SQL Stream Builder with Flink SQL
CREATE TABLE `postgres_cdc_newjerseybus` (
`title` STRING,
`description` STRING,
`link` STRING,
`guid` STRING,
`advisoryAlert` STRING,
`pubDate` STRING,
`ts` STRING,
`companyname` STRING,
`uuid` STRING,
`servicename` STRING
) WITH (
'connector' = 'postgres-cdc',
'database-name' = 'tspann',
'hostname' = '192.168.1.153',
'password' = 'tspann',
'decoding.plugin.name' = 'pgoutput',
'schema-name' = 'public',
'table-name' = 'newjerseybus',
'username' = 'tspann',
'port' = '5432'
);
Flink SQL Tables - Debezium CDC From Database Tables
Flink SQL Tables - Upsert to Kafka Topics
CREATE TABLE `upsert_kafka_newjerseybus` (
`title` STRING,
`description` STRING,
`link` STRING,
`guid` STRING,
`advisoryAlert` STRING,
`pubDate` STRING,
`ts` STRING,
`companyname` STRING,
`uuid` STRING,
`servicename` STRING,
`eventTimestamp` TIMESTAMP(3),
WATERMARK FOR `eventTimestamp` AS `eventTimestamp` - INTERVAL '5' SECOND,
PRIMARY KEY (`uuid`) NOT ENFORCED
) WITH (
'connector' = 'upsert-kafka',
'topic' = 'kafka_newjerseybus',
'properties.bootstrap.servers' = 'kafka:9092',
'key.format' = 'json',
'value.format' = 'json'
);
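To make the upsert behavior of the table above more concrete, here is a minimal Python sketch of how a changelog can be mapped to keyed records and then materialized by a consumer. It is a simulation under stated assumptions, not the connector's implementation: the row kinds `+I`, `-U`, `+U`, and `-D` follow Flink's changelog notation, the key is the `uuid` primary key from the DDL, and a record whose value is None stands in for a Kafka tombstone. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Row = Dict[str, str]
# (key, value) pair; a value of None stands in for a Kafka tombstone.
KafkaRecord = Tuple[str, Optional[Row]]


def to_upsert_records(changelog: Iterable[Tuple[str, Row]]) -> List[KafkaRecord]:
    """Map changelog rows to keyed upsert records (simulation only)."""
    records: List[KafkaRecord] = []
    for kind, row in changelog:
        key = row["uuid"]  # primary key declared in the DDL
        if kind in ("+I", "+U"):
            records.append((key, row))   # insert / update-after: normal value
        elif kind == "-D":
            records.append((key, None))  # delete: tombstone for the key
        # "-U" (update-before) carries the old image and is skipped here
    return records


def materialize(records: Iterable[KafkaRecord]) -> Dict[str, Row]:
    """Rebuild the latest row per key, as a downstream reader would."""
    state: Dict[str, Row] = {}
    for key, value in records:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


changelog = [
    ("+I", {"uuid": "a", "title": "Route 1 detour"}),
    ("-U", {"uuid": "a", "title": "Route 1 detour"}),
    ("+U", {"uuid": "a", "title": "Route 1 cleared"}),
    ("+I", {"uuid": "b", "title": "Route 9 delay"}),
    ("-D", {"uuid": "b", "title": "Route 9 delay"}),
]
print(materialize(to_upsert_records(changelog)))
```

Because every record is keyed by `uuid`, a reader only ever needs the latest value per key, which is why the DDL declares a primary key on that column.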
CDC with Kafka Connect
https://github.com/tspannhw/FLaNK-CDC/blob/main/kafkacdc.md
Streaming CDC with Cloudera Streams Messaging Manager (Kafka)
CDC with Debezium and Kafka
Kafka Connect
What is it?
• Connect data sources (or sinks) with Kafka
– Ideal for moving large amounts of data into or out of Kafka
– In a reliable manner
– In a performant way
• A tool built upon Apache Kafka
• Available since the end of 2015
• Makes it easier to implement the most common use cases, which were previously (presumably) implemented with the Producer/Consumer API
Kafka Connect
Basic Concepts
Kafka Connect
● The Connect framework uses “Workers” (Java processes) to do data processing.
● Workers run Connectors.
● Workers can be organized into a Connect cluster.
KConnect
Connectors
• New Connectors
• CDC Debezium Connectors
• SMM UI Integration
• Reuse your Kafka
Infrastructure
• Enterprise Security for
Secrets, Authentication
and Authorization
Sources
● ActiveMQ (via JMS)
● MQTT
● Syslog over TCP
● JDBC
● JMS
● HTTP
Sinks
● AWS S3
● ADLS
● Kudu
● HDFS
● HTTP
CDC Connectors
● MySQL
● PostgreSQL
● Oracle
● MS SQL Server
● IBM DB2
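As a rough illustration of what configuring one of the CDC connectors listed above involves, the sketch below builds a Debezium PostgreSQL connector definition as JSON. The property names are standard Debezium and Kafka Connect settings; the helper name `debezium_postgres_config`, the connector name, the logical name `njbus`, and the placeholder password are assumptions for illustration. The logical-name property depends on the Debezium release (`topic.prefix` in 2.x, `database.server.name` in 1.x), so check the version shipped with your platform. The sketch only prints the JSON; how it is submitted (for example through the SMM UI or the Connect REST API) is left out.

```python
import json
from typing import Dict


def debezium_postgres_config(
    hostname: str,
    port: int,
    dbname: str,
    user: str,
    password: str,
    table: str,
    logical_name: str,
    debezium_major: int = 2,
) -> Dict[str, object]:
    """Build a connector definition; all argument values are caller-supplied."""
    config = {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "tasks.max": "1",
        "database.hostname": hostname,
        "database.port": str(port),
        "database.user": user,
        "database.password": password,
        "database.dbname": dbname,
        "plugin.name": "pgoutput",    # same decoding plugin as the Flink DDL
        "table.include.list": table,  # schema-qualified table name
    }
    # The logical-name property was renamed between Debezium 1.x and 2.x.
    prefix_key = "topic.prefix" if debezium_major >= 2 else "database.server.name"
    config[prefix_key] = logical_name
    # The connector name below is a hypothetical naming convention.
    return {"name": f"{logical_name}-postgres-cdc", "config": config}


definition = debezium_postgres_config(
    hostname="192.168.1.153",
    port=5432,
    dbname="tspann",
    user="tspann",
    password="<secret>",  # placeholder; use your platform's secret handling
    table="public.newjerseybus",
    logical_name="njbus",
)
print(json.dumps(definition, indent=2))
```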
CDC with NiFi
Apache NiFi + Apache Kafka
Process Debezium Kafka Formats From Kafka Connect and others
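For readers unfamiliar with the Debezium format, the sketch below flattens a Debezium JSON change event into a single record, which is the kind of step a downstream flow performs before routing or writing the row. It assumes the standard envelope fields (`before`, `after`, `source`, `op`, `ts_ms`) and handles both the schema-wrapped form (`payload`) and the bare form. The function name, the `cdc_*` output field names, and the sample event are hypothetical.

```python
import json
from typing import Any, Dict, Optional

# Debezium operation codes
OP_NAMES = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot read"}


def flatten_change_event(raw: Optional[str]) -> Optional[Dict[str, Any]]:
    """Turn a Debezium JSON change event into one flat record."""
    if raw is None:
        return None  # tombstone message: no value to process
    event = json.loads(raw)
    # With schemas enabled the envelope sits under "payload".
    envelope = event.get("payload", event)
    op = envelope["op"]
    # Deletes carry the row image in "before"; other operations in "after".
    row = envelope["before"] if op == "d" else envelope["after"]
    flat: Dict[str, Any] = dict(row or {})
    flat["cdc_op"] = OP_NAMES.get(op, op)
    flat["cdc_ts_ms"] = envelope.get("ts_ms")
    flat["cdc_table"] = (envelope.get("source") or {}).get("table")
    return flat


sample = json.dumps({
    "payload": {
        "before": None,
        "after": {"uuid": "u1", "title": "Route 1 detour"},
        "source": {"table": "newjerseybus"},
        "op": "c",
        "ts_ms": 1700000000000,
    }
})
print(flatten_change_event(sample))
```

In NiFi, an equivalent step would typically be done with record-oriented processors rather than custom code; the sketch only shows the shape of the data.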
Apache NiFi to Databases
SQL and MySQL CDC
Apache NiFi to Databases
SQL Query Access
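Query-based capture, as opposed to log-based capture with Debezium, can be sketched as an incremental fetch that remembers the largest value seen in a monotonically increasing column. The example below uses Python's built-in sqlite3 module as a stand-in database; the `id` column, the table contents, and the function name `fetch_new_rows` are assumptions for illustration and do not come from the slides.

```python
import sqlite3
from typing import List, Tuple


def fetch_new_rows(conn: sqlite3.Connection, last_seen_id: int) -> Tuple[List[tuple], int]:
    """Return rows added since last_seen_id plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, title FROM newjerseybus WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    new_max = rows[-1][0] if rows else last_seen_id
    return rows, new_max


# Stand-in database with a hypothetical auto-incrementing id column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE newjerseybus (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO newjerseybus (id, title) VALUES (?, ?)",
    [(1, "Route 1 detour"), (2, "Route 9 delay")],
)

first_batch, high_water_mark = fetch_new_rows(conn, 0)
conn.execute("INSERT INTO newjerseybus (id, title) VALUES (3, 'Route 4 delay')")
second_batch, high_water_mark = fetch_new_rows(conn, high_water_mark)
print(first_batch)
print(second_batch)
```

This style of polling only sees rows whose tracked column grows, so deletes and in-place updates go unnoticed unless the table also maintains something like a last-modified column. That gap is one reason to prefer log-based capture when every change matters.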
DEMO AND Q&A
FREE LEARNING ENVIRONMENT
Cloudera Streams Processing - Community Edition
• Kafka, KConnect, SMM, SR, Flink, and SSB in Docker
• Runs in Docker
• Try new features quickly
• Develop applications locally
● Docker Compose file of CSP to run from the command line without any dependencies, including Flink, SQL Stream Builder, Kafka, Kafka Connect, Streams Messaging Manager, and Schema Registry
○ $> docker compose up
● Licensed under the Cloudera Community License
● Unsupported
● Community Group Hub for CSP
● Find it on docs.cloudera.com under Applications
Open Source Edition
• Apache NiFi in Docker
• Runs in Docker
• Try new features quickly
• Develop applications locally
● Docker NiFi
○ docker run --name nifi -p 8443:8443 -d -e SINGLE_USER_CREDENTIALS_USERNAME=admin -e SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUghvvgEvjnaLjFEB apache/nifi:latest
● Licensed under the ASF License
● Unsupported
https://hub.docker.com/r/apache/nifi
Cloudera Edge2AI Workshop - CDC - Debezium
https://github.com/asdaraujo/edge2ai-workshop/blob/trunk/workshop_cdc.adoc
RESOURCES AND WRAP-UP
https://medium.com/@tspann/cdc-not-cat-data-capture-e43713879c03
References
https://medium.com/@tspann/ingesting-events-into-dockerized-ibm-db2-jdbc-with-apache-nifi-f0ca452d1351
https://community.cloudera.com/t5/Community-Articles/RDBMS-to-Hive-using-NiFi-small-medium-tables/ta-p/244677
https://community.cloudera.com/t5/Community-Articles/MySQL-CDC-with-Kafka-Connect-Debezium-in-CDP-Public-Cloud/ta-p/345321
https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/kafka-connect/topics/kafka-connect-connector-debezium-db2.html
https://docs.cloudera.com/csa/1.10.0/how-to-ssb/topics/csa-ssb-cdc-connectors.html
https://medium.com/cloudera-inc/building-a-stateful-streaming-intrusion-detection-system-with-sql-stream-builder-4667c87f347f
Streaming Resources
• https://dzone.com/articles/real-time-stream-processing-with-hazelcast-and-streamnative
• https://flipstackweekly.com/
• https://www.datainmotion.dev/
• https://www.flankstack.dev/
• https://github.com/tspannhw
• https://medium.com/@tspann
• https://medium.com/@tspann/predictions-for-streaming-in-2023-ad4d7395d714
• https://www.apachecon.com/acna2022/slides/04_Spann_Tim_Citizen_Streaming_Engineer.pdf
CDC
https://community.cloudera.com/t5/Community-Articles/Incrementally-Streaming-RDBMS-Data-to-Your-Hadoop-DataLake/ta-p/247927
https://community.cloudera.com/t5/Community-Articles/QADCDC-Our-how-to-ingest-some-database-tables-to-Hadoop-Very/ta-p/245229
https://community.cloudera.com/t5/Community-Articles/Ingesting-RDBMS-Data-As-New-Tables-Arrive-Automagically-into/ta-p/246214
https://community.cloudera.com/t5/Community-Articles/Ingesting-Golden-Gate-Records-From-Apache-Kafka-and/ta-p/247557
https://community.cloudera.com/t5/Community-Articles/Simple-Change-Data-Capture-CDC-with-SQL-Selects-via-Apache/ta-p/308390
CDC
https://community.cloudera.com/t5/Community-Articles/RDBMS-to-Hive-using-NiFi-small-medium-tables/ta-p/244677
https://community.cloudera.com/t5/Community-Articles/Simple-Change-Data-Capture-CDC-with-SQL-Selects-via-Apache/ta-p/308376
https://community.cloudera.com/t5/Community-Articles/Incrementally-Streaming-RDBMS-Data-to-Your-Hadoop-DataLake/ta-p/247927
https://community.cloudera.com/t5/Community-Articles/Ingesting-RDBMS-Data-As-New-Tables-Arrive-Automagically-into/ta-p/246214
https://community.cloudera.com/t5/Community-Articles/Incremental-Fetch-in-NiFi-with-QueryDatabaseTable/ta-p/247073
https://community.cloudera.com/t5/Community-Articles/Simple-Change-Data-Capture-CDC-with-SQL-Selects-via-Apache/ta-p/308390
Cloudera Data Flow / Apache NiFi
https://community.cloudera.com/t5/Community-Articles/Change-Data-Capture-CDC-with-Apache-NiFi-Part-1-of-3/ta-p/246623
https://community.cloudera.com/t5/Community-Articles/Change-Data-Capture-CDC-with-Apache-NiFi-Part-2-of-3/ta-p/246519
https://community.cloudera.com/t5/Community-Articles/Change-Data-Capture-CDC-with-Apache-NiFi-Part-3-of-3/ta-p/246482
Cloudera Data Flow / Apache NiFi
https://community.cloudera.com/t5/Community-Articles/MySQL-CDC-with-Kafka-Connect-Debezium-in-CDP-Public-Cloud/ta-p/345321
CDC Debezium KConnectors for PostgreSQL, MySQL, SQL Server, DB2, and Oracle
Cloudera Streams Messaging / Apache Kafka Connect
THANK YOU
Ad

More Related Content

Similar to PartnerSkillUp_Enable a Streaming CDC Solution (20)

Using the FLaNK Stack for edge ai (flink, nifi, kafka, kudu)
Using the FLaNK Stack for edge ai (flink, nifi, kafka, kudu)Using the FLaNK Stack for edge ai (flink, nifi, kafka, kudu)
Using the FLaNK Stack for edge ai (flink, nifi, kafka, kudu)
Timothy Spann
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017
DSDT_MTL
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21
JDA Labs MTL
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
Timothy Spann
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
Timothy Spann
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
Timothy Spann
 
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
Timothy Spann
 
Citi Tech Talk: Hybrid Cloud
Citi Tech Talk: Hybrid CloudCiti Tech Talk: Hybrid Cloud
Citi Tech Talk: Hybrid Cloud
confluent
 
Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartch
Cloudera, Inc.
 
AIDEVDAY_ Data-in-Motion to Supercharge AI
AIDEVDAY_ Data-in-Motion to Supercharge AIAIDEVDAY_ Data-in-Motion to Supercharge AI
AIDEVDAY_ Data-in-Motion to Supercharge AI
Timothy Spann
 
Why NBC Universal Migrated to MongoDB Atlas
Why NBC Universal Migrated to MongoDB AtlasWhy NBC Universal Migrated to MongoDB Atlas
Why NBC Universal Migrated to MongoDB Atlas
Datavail
 
Streaming Time Series Data With Kenny Gorman and Elena Cuevas | Current 2022
Streaming Time Series Data With Kenny Gorman and Elena Cuevas | Current 2022Streaming Time Series Data With Kenny Gorman and Elena Cuevas | Current 2022
Streaming Time Series Data With Kenny Gorman and Elena Cuevas | Current 2022
HostedbyConfluent
 
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road AheadCloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
DataWorks Summit
 
Stl meetup cloudera platform - january 2020
Stl meetup   cloudera platform  - january 2020Stl meetup   cloudera platform  - january 2020
Stl meetup cloudera platform - january 2020
Adam Doyle
 
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Timothy Spann
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

Cloudera, Inc.
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
Timothy Spann
 
Best Practices For Workflow
Best Practices For WorkflowBest Practices For Workflow
Best Practices For Workflow
Timothy Spann
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Timothy Spann
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
DataStax Academy
 
Using the FLaNK Stack for edge ai (flink, nifi, kafka, kudu)
Using the FLaNK Stack for edge ai (flink, nifi, kafka, kudu)Using the FLaNK Stack for edge ai (flink, nifi, kafka, kudu)
Using the FLaNK Stack for edge ai (flink, nifi, kafka, kudu)
Timothy Spann
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017
DSDT_MTL
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21
JDA Labs MTL
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
Timothy Spann
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
Timothy Spann
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
Timothy Spann
 
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
Timothy Spann
 
Citi Tech Talk: Hybrid Cloud
Citi Tech Talk: Hybrid CloudCiti Tech Talk: Hybrid Cloud
Citi Tech Talk: Hybrid Cloud
confluent
 
Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartch
Cloudera, Inc.
 
AIDEVDAY_ Data-in-Motion to Supercharge AI
AIDEVDAY_ Data-in-Motion to Supercharge AIAIDEVDAY_ Data-in-Motion to Supercharge AI
AIDEVDAY_ Data-in-Motion to Supercharge AI
Timothy Spann
 
Why NBC Universal Migrated to MongoDB Atlas
Why NBC Universal Migrated to MongoDB AtlasWhy NBC Universal Migrated to MongoDB Atlas
Why NBC Universal Migrated to MongoDB Atlas
Datavail
 
Streaming Time Series Data With Kenny Gorman and Elena Cuevas | Current 2022
Streaming Time Series Data With Kenny Gorman and Elena Cuevas | Current 2022Streaming Time Series Data With Kenny Gorman and Elena Cuevas | Current 2022
Streaming Time Series Data With Kenny Gorman and Elena Cuevas | Current 2022
HostedbyConfluent
 
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road AheadCloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
DataWorks Summit
 
Stl meetup cloudera platform - january 2020
Stl meetup   cloudera platform  - january 2020Stl meetup   cloudera platform  - january 2020
Stl meetup cloudera platform - january 2020
Adam Doyle
 
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Timothy Spann
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

Cloudera, Inc.
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
Timothy Spann
 
Best Practices For Workflow
Best Practices For WorkflowBest Practices For Workflow
Best Practices For Workflow
Timothy Spann
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Timothy Spann
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
DataStax Academy
 

More from Timothy Spann (20)

14May2025_TSPANN_FromAirQualityUnstructuredData.pdf
14May2025_TSPANN_FromAirQualityUnstructuredData.pdf14May2025_TSPANN_FromAirQualityUnstructuredData.pdf
14May2025_TSPANN_FromAirQualityUnstructuredData.pdf
Timothy Spann
 
Streaming AI Pipelines with Apache NiFi and Snowflake NYC 2025
Streaming AI Pipelines with Apache NiFi and Snowflake NYC 2025Streaming AI Pipelines with Apache NiFi and Snowflake NYC 2025
Streaming AI Pipelines with Apache NiFi and Snowflake NYC 2025
Timothy Spann
 
2025-03-03-Philly-AAAI-GoodData-Build Secure RAG Apps With Open LLM
2025-03-03-Philly-AAAI-GoodData-Build Secure RAG Apps With Open LLM2025-03-03-Philly-AAAI-GoodData-Build Secure RAG Apps With Open LLM
2025-03-03-Philly-AAAI-GoodData-Build Secure RAG Apps With Open LLM
Timothy Spann
 
Conf42_IoT_Dec2024_Building IoT Applications With Open Source
Conf42_IoT_Dec2024_Building IoT Applications With Open SourceConf42_IoT_Dec2024_Building IoT Applications With Open Source
Conf42_IoT_Dec2024_Building IoT Applications With Open Source
Timothy Spann
 
2024 Dec 05 - PyData Global - Tutorial Its In The Air Tonight
2024 Dec 05 - PyData Global - Tutorial Its In The Air Tonight2024 Dec 05 - PyData Global - Tutorial Its In The Air Tonight
2024 Dec 05 - PyData Global - Tutorial Its In The Air Tonight
Timothy Spann
 
2024Nov20-BigDataEU-RealTimeAIWithOpenSource
2024Nov20-BigDataEU-RealTimeAIWithOpenSource2024Nov20-BigDataEU-RealTimeAIWithOpenSource
2024Nov20-BigDataEU-RealTimeAIWithOpenSource
Timothy Spann
 
TSPANN-2024-Nov-CloudX-Adding Generative AI to Real-Time Streaming Pipelines
TSPANN-2024-Nov-CloudX-Adding Generative AI to Real-Time Streaming PipelinesTSPANN-2024-Nov-CloudX-Adding Generative AI to Real-Time Streaming Pipelines
TSPANN-2024-Nov-CloudX-Adding Generative AI to Real-Time Streaming Pipelines
Timothy Spann
 
2024-Nov-BuildStuff-Adding Generative AI to Real-Time Streaming Pipelines
2024-Nov-BuildStuff-Adding Generative AI to Real-Time Streaming Pipelines2024-Nov-BuildStuff-Adding Generative AI to Real-Time Streaming Pipelines
2024-Nov-BuildStuff-Adding Generative AI to Real-Time Streaming Pipelines
Timothy Spann
 
14 November 2024 - Conf 42 - Prompt Engineering - Codeless Generative AI Pipe...
14 November 2024 - Conf 42 - Prompt Engineering - Codeless Generative AI Pipe...14 November 2024 - Conf 42 - Prompt Engineering - Codeless Generative AI Pipe...
14 November 2024 - Conf 42 - Prompt Engineering - Codeless Generative AI Pipe...
Timothy Spann
 
2024 Nov 05 - Linux Foundation TAC TALK With Milvus
2024 Nov 05 - Linux Foundation TAC TALK With Milvus2024 Nov 05 - Linux Foundation TAC TALK With Milvus
2024 Nov 05 - Linux Foundation TAC TALK With Milvus
Timothy Spann
 
tspann06-NOV-2024_AI-Alliance_NYC_ intro to Data Prep Kit and Open Source RAG
tspann06-NOV-2024_AI-Alliance_NYC_ intro to Data Prep Kit and Open Source RAGtspann06-NOV-2024_AI-Alliance_NYC_ intro to Data Prep Kit and Open Source RAG
tspann06-NOV-2024_AI-Alliance_NYC_ intro to Data Prep Kit and Open Source RAG
Timothy Spann
 
tspann08-Nov-2024_PyDataNYC_Unstructured Data Processing with a Raspberry Pi ...
tspann08-Nov-2024_PyDataNYC_Unstructured Data Processing with a Raspberry Pi ...tspann08-Nov-2024_PyDataNYC_Unstructured Data Processing with a Raspberry Pi ...
tspann08-Nov-2024_PyDataNYC_Unstructured Data Processing with a Raspberry Pi ...
Timothy Spann
 
2024-10-28 All Things Open - Advanced Retrieval Augmented Generation (RAG) Te...
2024-10-28 All Things Open - Advanced Retrieval Augmented Generation (RAG) Te...2024-10-28 All Things Open - Advanced Retrieval Augmented Generation (RAG) Te...
2024-10-28 All Things Open - Advanced Retrieval Augmented Generation (RAG) Te...
Timothy Spann
 
10-25-2024_BITS_NYC_Unstructured Data and LLM_ What, Why and How
10-25-2024_BITS_NYC_Unstructured Data and LLM_ What, Why and How10-25-2024_BITS_NYC_Unstructured Data and LLM_ What, Why and How
10-25-2024_BITS_NYC_Unstructured Data and LLM_ What, Why and How
Timothy Spann
 
2024-OCT-23 NYC Meetup - Unstructured Data Meetup - Unstructured Halloween
2024-OCT-23 NYC Meetup - Unstructured Data Meetup - Unstructured Halloween2024-OCT-23 NYC Meetup - Unstructured Data Meetup - Unstructured Halloween
2024-OCT-23 NYC Meetup - Unstructured Data Meetup - Unstructured Halloween
Timothy Spann
 
DBTA Round Table with Zilliz and Airbyte - Unstructured Data Engineering
DBTA Round Table with Zilliz and Airbyte - Unstructured Data EngineeringDBTA Round Table with Zilliz and Airbyte - Unstructured Data Engineering
DBTA Round Table with Zilliz and Airbyte - Unstructured Data Engineering
Timothy Spann
 
17-October-2024 NYC AI Camp - Step-by-Step RAG 101
17-October-2024 NYC AI Camp - Step-by-Step RAG 10117-October-2024 NYC AI Camp - Step-by-Step RAG 101
17-October-2024 NYC AI Camp - Step-by-Step RAG 101
Timothy Spann
 
11-OCT-2024_AI_101_CryptoOracle_UnstructuredData
11-OCT-2024_AI_101_CryptoOracle_UnstructuredData11-OCT-2024_AI_101_CryptoOracle_UnstructuredData
11-OCT-2024_AI_101_CryptoOracle_UnstructuredData
Timothy Spann
 
2024-10-04 - Grace Hopper Celebration Open Source Day - Stefan
2024-10-04 - Grace Hopper Celebration Open Source Day - Stefan2024-10-04 - Grace Hopper Celebration Open Source Day - Stefan
2024-10-04 - Grace Hopper Celebration Open Source Day - Stefan
Timothy Spann
 
01-Oct-2024_PES-VectorDatabasesAndAI.pdf
01-Oct-2024_PES-VectorDatabasesAndAI.pdf01-Oct-2024_PES-VectorDatabasesAndAI.pdf
01-Oct-2024_PES-VectorDatabasesAndAI.pdf
Timothy Spann
 
14May2025_TSPANN_FromAirQualityUnstructuredData.pdf
14May2025_TSPANN_FromAirQualityUnstructuredData.pdf14May2025_TSPANN_FromAirQualityUnstructuredData.pdf
14May2025_TSPANN_FromAirQualityUnstructuredData.pdf
Timothy Spann
 
Streaming AI Pipelines with Apache NiFi and Snowflake NYC 2025
Streaming AI Pipelines with Apache NiFi and Snowflake NYC 2025Streaming AI Pipelines with Apache NiFi and Snowflake NYC 2025
Streaming AI Pipelines with Apache NiFi and Snowflake NYC 2025
Timothy Spann
 
2025-03-03-Philly-AAAI-GoodData-Build Secure RAG Apps With Open LLM
2025-03-03-Philly-AAAI-GoodData-Build Secure RAG Apps With Open LLM2025-03-03-Philly-AAAI-GoodData-Build Secure RAG Apps With Open LLM
2025-03-03-Philly-AAAI-GoodData-Build Secure RAG Apps With Open LLM
Timothy Spann
 
Conf42_IoT_Dec2024_Building IoT Applications With Open Source
Conf42_IoT_Dec2024_Building IoT Applications With Open SourceConf42_IoT_Dec2024_Building IoT Applications With Open Source
Conf42_IoT_Dec2024_Building IoT Applications With Open Source
Timothy Spann
 
2024 Dec 05 - PyData Global - Tutorial Its In The Air Tonight
2024 Dec 05 - PyData Global - Tutorial Its In The Air Tonight2024 Dec 05 - PyData Global - Tutorial Its In The Air Tonight
2024 Dec 05 - PyData Global - Tutorial Its In The Air Tonight
Timothy Spann
 
2024Nov20-BigDataEU-RealTimeAIWithOpenSource
2024Nov20-BigDataEU-RealTimeAIWithOpenSource2024Nov20-BigDataEU-RealTimeAIWithOpenSource
2024Nov20-BigDataEU-RealTimeAIWithOpenSource
Timothy Spann
 
TSPANN-2024-Nov-CloudX-Adding Generative AI to Real-Time Streaming Pipelines
TSPANN-2024-Nov-CloudX-Adding Generative AI to Real-Time Streaming PipelinesTSPANN-2024-Nov-CloudX-Adding Generative AI to Real-Time Streaming Pipelines

PartnerSkillUp_Enable a Streaming CDC Solution

  • 1. Partner Skill Up: Enable a Streaming CDC Solution. Tim Spann, Principal Developer Advocate in Data In Motion for Cloudera, Global (tspann@cloudera.com). Salvador Almazan, Partner Solutions Engineer, US (salmazan@cloudera.com).
  • 2. © Cloudera, Inc. All rights reserved. Today’s Leads: meet the Cloudera team. Salvador Almazan, Sr. Partner Solutions Engineer (salmazan@cloudera.com). Tim Spann, DIM Developer Advocate (tspann@cloudera.com).
  • 3. © 2023 Cloudera, Inc. All rights reserved. Today’s Lead: Who am I? @PaasDev, Principal Data-in-Motion Developer Advocate. DZone Zone Leader and Big Data MVB. Princeton and NYC Future of Data Meetups. ex-Pivotal Field Engineer, ex-StreamNative, ex-PwC. https://github.com/tspannhw https://twitter.com/PaaSDev https://www.datainmotion.dev/ https://medium.com/@tspann
  • 4. Future of Data - NYC / Princeton + Virtual. @PaasDev https://www.meetup.com/futureofdata-princeton/ https://www.meetup.com/futureofdata-newyork/ From Big Data to AI to Streaming to LLM to Cloud to Analytics to NLP to Fast Data to Machine Learning to Microservices to ...
  • 5. Streaming Change Data Capture (CDC) 3+ Unique Ways. In this next session, learn how to use Debezium with Flink, Kafka, and NiFi for Change Data Capture using two different mechanisms: Kafka Connect and Flink SQL. With the virtual nature of today's world, streaming data is more critical than ever. Join Cloudera Chief Data-In-Motion Principal, Tim Spann, and Partner Solution Engineer, Salvador Almazan, as they look closely at key CDC use cases, discuss why Debezium is the best option for handling CDC, and use examples to show you how to demonstrate value. This is a must-attend experience! https://medium.com/cloudera-inc/cdc-not-cat-data-capture-e43713879c03 https://dzone.com/articles/streaming-change-data-capture-data-two-ways
  • 6. WHAT IS IT? WHY DO I NEED IT?
  • 7. What is Change Data Capture: Full Fidelity vs Point in Time
  • 8. Why Change Data Capture? Most Common Use Cases - Distribution or Synchronization
        Analytics {Distribution}: full-fidelity analytics offline; look at any point in time.
        Operational Data Store {Distribution}: prevent load on operational databases by replicating the database so that reporting queries run in isolation.
        Migrations {Synchronization}: keep two databases in synchronization while existing and new systems need to coexist for some time.
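The difference between a full-fidelity change log and a point-in-time view can be sketched in a few lines of Python. This is only an illustration: the `CHANGE_LOG` entries, the `ts` field, and the `state_at` helper are hypothetical names made up for this sketch, not part of Debezium or any Cloudera product.

```python
from typing import Any, Dict, List, Optional

# Hypothetical change log, oldest first. "op" is "c" (create), "u" (update)
# or "d" (delete); "ts" is a simple logical timestamp.
CHANGE_LOG: List[Dict[str, Any]] = [
    {"ts": 1, "op": "c", "key": "bus-1", "row": {"status": "on time"}},
    {"ts": 2, "op": "c", "key": "bus-2", "row": {"status": "on time"}},
    {"ts": 3, "op": "u", "key": "bus-1", "row": {"status": "delayed"}},
    {"ts": 4, "op": "d", "key": "bus-2", "row": None},
]


def state_at(log: List[Dict[str, Any]], as_of: Optional[int] = None) -> Dict[str, Any]:
    """Replay the log up to and including `as_of` (or to the end if None)."""
    state: Dict[str, Any] = {}
    for event in log:
        if as_of is not None and event["ts"] > as_of:
            break  # the log is ordered, so later events can be skipped
        if event["op"] == "d":
            state.pop(event["key"], None)
        else:
            state[event["key"]] = event["row"]
    return state


# A point-in-time copy only ever holds the latest state ...
latest = state_at(CHANGE_LOG)
# ... while the full change log can answer "what did it look like at ts=2?"
earlier = state_at(CHANGE_LOG, as_of=2)
```

With only the latest snapshot, the delete of `bus-2` and the earlier status of `bus-1` are lost; keeping every change is what makes the "look at any point in time" analytics use case possible.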
  • 10. Kafka Connect, NiFi, Flink? Which engine to choose? Or All 3?
        Kafka Connect - Already using Kafka? Simple setup for many tables. Want metadata augmented data. Don’t need low latency? Visual monitoring. Easy manual scaling. Easy to combine with NiFi. Debezium.
        NiFi - Already using NiFi? Simple JDBC queries? Transform individual records? Want easy development with UI? Lots of small files, events, records, rows? Continuous stream of rows. Support many different sources. Debezium coming.
        Flink - Need for Fast Flink? Strong control of table and joins. Want high throughput? Want low latency? Want advanced windowing and state? Automatic records immediately. Pure SQL. Debezium.
  • 11. CDC ARCHITECTURE - Using FLaNK to pull the data out of anything in near-real time. [Architecture diagram. Stages: Ingest, Prepare, Publish. Labels: Data Sources, Internal Users (After Sales), External Systems, Closed Loop Systems, Enterprise Lakehouse Capability View, Ingestion, Enterprise Data Message Hub, Storage, Batch Management, Stream Consumption, SQL Stream Builder, Machine Learning, Data Visualization, Workload Manager, watsonx.data.]
  • 12. Data Distribution as a Universal, Hybrid, Multi-Cloud Data Service. Universal Data Distribution Service (Ingest, Transform, Deliver): a multi-cloud data distribution service that solves the first and last mile problem for the modern data stack. [Diagram labels: Ingest Processors; Ingest Gateway; Router, Filter & Transform Processors; Destination Processors; Cloud Business Process Services; Log Data Sources; Laptops / Servers; Security Agents; IoT Devices; App Logs; Mobile Apps; Cloud Data Analytics / Service; On-Prem Data Sources; Cloud Warehouse (Cloudera DW); Big Data Cloud Services.]
  • 13. CDC with SQL Stream Builder (Flink SQL)
  • 14. Streaming CDC with Cloudera SQL Stream Builder (Flink SQL) https://github.com/tspannhw/FLaNK-CDC/blob/main/flinkcdc.MD
  • 15. CDC with Debezium and Flink: SQL Stream Builder with Flink SQL https://docs.cloudera.com/csa/1.10.0/how-to-ssb/topics/csa-ssb-cdc-connectors.html
  • 16. CDC with Debezium and Flink: SQL Stream Builder with Flink SQL
  • 18. Flink SQL Tables - Debezium CDC From Database Tables
        CREATE TABLE `postgres_cdc_newjerseybus` (
          `title` STRING,
          `description` STRING,
          `link` STRING,
          `guid` STRING,
          `advisoryAlert` STRING,
          `pubDate` STRING,
          `ts` STRING,
          `companyname` STRING,
          `uuid` STRING,
          `servicename` STRING
        ) WITH (
          'connector' = 'postgres-cdc',
          'database-name' = 'tspann',
          'hostname' = '192.168.1.153',
          'password' = 'tspann',
          'decoding.plugin.name' = 'pgoutput',
          'schema-name' = 'public',
          'table-name' = 'newjerseybus',
          'username' = 'tspann',
          'port' = '5432'
        );
  • 19. Flink SQL Tables - Upsert to Kafka Topics
        CREATE TABLE `upsert_kafka_newjerseybus` (
          `title` STRING,
          `description` STRING,
          `link` STRING,
          `guid` STRING,
          `advisoryAlert` STRING,
          `pubDate` STRING,
          `ts` STRING,
          `companyname` STRING,
          `uuid` STRING,
          `servicename` STRING,
          `eventTimestamp` TIMESTAMP(3),
          WATERMARK FOR `eventTimestamp` AS `eventTimestamp` - INTERVAL '5' SECOND,
          PRIMARY KEY (uuid) NOT ENFORCED
        ) WITH (
          'connector' = 'upsert-kafka',
          'topic' = 'kafka_newjerseybus',
          'properties.bootstrap.servers' = 'kafka:9092',
          'key.format' = 'json',
          'value.format' = 'json'
        );
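To make the upsert table above more concrete, here is a rough Python sketch of how a consumer might interpret the resulting topic: each message is keyed by the primary key (`uuid`), a later value for the same key replaces the earlier one, and a message with an empty (null) value is treated as a delete. The `materialize` helper and the sample messages are hypothetical and only model the semantics; they are not Flink or Kafka APIs.

```python
from typing import Dict, List, Optional, Tuple

# (key, value) pairs as they might appear on the topic; value None models a
# tombstone, i.e. a delete for that key. All values are made up.
Message = Tuple[str, Optional[dict]]


def materialize(messages: List[Message]) -> Dict[str, dict]:
    """Fold an upsert stream into the latest row per key."""
    table: Dict[str, dict] = {}
    for key, value in messages:
        if value is None:
            table.pop(key, None)   # tombstone: remove the row if present
        else:
            table[key] = value     # insert or update: last write wins
    return table


messages: List[Message] = [
    ("uuid-1", {"title": "Route 1", "servicename": "bus"}),
    ("uuid-2", {"title": "Route 2", "servicename": "bus"}),
    ("uuid-1", {"title": "Route 1 delayed", "servicename": "bus"}),
    ("uuid-2", None),
]
current = materialize(messages)
```

The design point is that downstream readers do not need the full change history to stay in sync: replaying the keyed topic in order is enough to rebuild the current table.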
  • 20. CDC with Kafka Connect
  • 21. Streaming CDC with Cloudera Streams Messaging Manager (Kafka) https://github.com/tspannhw/FLaNK-CDC/blob/main/kafkacdc.md
  • 22. CDC with Debezium and Kafka: Kafka Connect
  • 23. Kafka Connect - What is it?
        • Connects data sources (or sinks) with Kafka
          – Ideal for moving large amounts of data into or out of Kafka
          – In a reliable manner
          – In a performant way
        • A tool built upon Apache Kafka
        • Present since the end of 2015
        • Makes it easier to implement the most common use cases, which were previously (presumably) implemented with the Producer/Consumer API
  • 24. Kafka Connect - Basic Concepts
        ● The Connect framework uses “Workers” (Java processes) to do data processing.
        ● Workers run Connectors.
        ● Workers can be organized into a Connect cluster.
  • 25. KConnect Connectors
        • New Connectors • CDC Debezium Connectors • SMM UI Integration • Reuse your Kafka Infrastructure • Enterprise Security for Secrets, Authentication and Authorization
        Sources: ActiveMQ (via JMS), MQTT, Syslog over TCP, JDBC, JMS, HTTP
        Sinks: AWS S3, ADLS, Kudu, HDFS, HTTP
        CDC Connectors: MySQL, PostgreSQL, Oracle, MS SQL Server, IBM DB2
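As an illustration of what a worker is asked to run, the sketch below builds the JSON body that could be submitted to a Kafka Connect cluster for a Debezium PostgreSQL source. The property names follow the upstream Debezium PostgreSQL connector documentation as best I know it; the version bundled with a given Cloudera release may expect different keys (for example, older Debezium releases use `database.server.name` instead of `topic.prefix`), so treat every key as an assumption to verify. The connector name, host, and credentials are placeholders, and the helper function is hypothetical.

```python
import json


def debezium_postgres_connector(name: str, host: str, port: int, database: str,
                                user: str, password: str, table: str) -> dict:
    """Build a connector definition (name + config) for Kafka Connect."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "database.hostname": host,
            "database.port": str(port),
            "database.user": user,
            "database.password": password,  # placeholder; use a secret store in practice
            "database.dbname": database,
            "plugin.name": "pgoutput",      # same decoding plugin as the Flink SQL example
            "table.include.list": table,
            "topic.prefix": name,           # assumption: Debezium 2.x naming
        },
    }


payload = debezium_postgres_connector(
    name="newjerseybus-cdc", host="db.example.internal", port=5432,
    database="tspann", user="cdc_user", password="CHANGE_ME",
    table="public.newjerseybus",
)
body = json.dumps(payload, indent=2)  # this string would be the request body
```

The sketch stops at building the body on purpose; how it is submitted (the Connect REST endpoint or the SMM UI mentioned on the slide) depends on the deployment.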
  • 29. Apache NiFi + Apache Kafka: Process Debezium Kafka Formats From Kafka Connect and others
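For readers who have not seen the Debezium format yet, the following Python sketch shows roughly what processing one change event involves. It assumes the common Debezium envelope with `before`, `after`, `op`, and `source` fields, optionally wrapped in a `payload` element when schemas are enabled in the JSON converter. The sample event and the `summarize` helper are hypothetical; in NiFi the equivalent work would be done with record processors rather than Python.

```python
import json

# Debezium operation codes: create, update, delete, and snapshot read.
OPS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot read"}


def summarize(raw: str) -> dict:
    """Extract the operation, table, and relevant row image from one event."""
    event = json.loads(raw)
    payload = event.get("payload", event)  # handle events with or without the schema wrapper
    op = payload["op"]
    # For deletes the row only exists in "before"; otherwise use "after".
    row = payload["before"] if op == "d" else payload["after"]
    return {
        "operation": OPS.get(op, op),
        "table": payload["source"]["table"],
        "row": row,
    }


# Made-up sample event for the newjerseybus table.
sample = json.dumps({
    "payload": {
        "before": {"uuid": "uuid-1", "title": "Route 1"},
        "after": {"uuid": "uuid-1", "title": "Route 1 delayed"},
        "op": "u",
        "ts_ms": 1700000000000,
        "source": {"db": "tspann", "schema": "public", "table": "newjerseybus"},
    }
})
summary = summarize(sample)
```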
  • 30. Apache NiFi to Databases: SQL and MySQL CDC
  • 31. Apache NiFi to Databases: SQL Query Access
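Query-based access is the simplest form of change capture: remember the highest value seen in a monotonically increasing column and only select rows beyond it on the next run. NiFi's QueryDatabaseTable processor works along these lines with its maximum-value columns. The Python sketch below imitates the idea with an in-memory SQLite database; the `alerts` table, its columns, and the `fetch_new_rows` helper are hypothetical stand-ins.

```python
import sqlite3
from typing import List, Tuple


def fetch_new_rows(conn: sqlite3.Connection, last_seen_id: int) -> Tuple[List[tuple], int]:
    """Return rows with id greater than last_seen_id and the new high-water mark."""
    rows = conn.execute(
        "SELECT id, title FROM alerts WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    new_max = rows[-1][0] if rows else last_seen_id
    return rows, new_max


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alerts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO alerts (title) VALUES (?)",
                 [("Route 1 delayed",), ("Route 2 detour",)])

first_batch, high_water = fetch_new_rows(conn, 0)             # initial load
conn.execute("INSERT INTO alerts (title) VALUES (?)", ("Route 3 cancelled",))
second_batch, high_water = fetch_new_rows(conn, high_water)   # only the new row
```

Note the trade-off this exposes: a polling query like this sees new rows, but it does not observe deletes or intermediate updates, which is where log-based CDC with Debezium has an advantage.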
  • 35. Cloudera Streams Processing - Community Edition
        • Kafka, KConnect, SMM, SR, Flink, and SSB in Docker • Runs in Docker • Try new features quickly • Develop applications locally
        ● Docker Compose file of CSP to run from the command line without any dependencies, including Flink, SQL Stream Builder, Kafka, Kafka Connect, Streams Messaging Manager and Schema Registry
          ○ $> docker compose up
        ● Licensed under the Cloudera Community License
        ● Unsupported
        ● Community Group Hub for CSP
        ● Find it on docs.cloudera.com under Applications
  • 36. Open Source Edition
        • Apache NiFi in Docker • Runs in Docker • Try new features quickly • Develop applications locally
        ● Docker NiFi
          ○ docker run --name nifi -p 8443:8443 -d -e SINGLE_USER_CREDENTIALS_USERNAME=admin -e SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUghvvgEvjnaLjFEB apache/nifi:latest
        ● Licensed under the ASF License
        ● Unsupported
        https://hub.docker.com/r/apache/nifi
  • 37. Cloudera Edge2AI Workshop - CDC - Debezium https://github.com/asdaraujo/edge2ai-workshop/blob/trunk/workshop_cdc.adoc
  • 40. References
        https://medium.com/@tspann/ingesting-events-into-dockerized-ibm-db2-jdbc-with-apache-nifi-f0ca452d1351
        https://community.cloudera.com/t5/Community-Articles/RDBMS-to-Hive-using-NiFi-small-medium-tables/ta-p/244677
        https://community.cloudera.com/t5/Community-Articles/MySQL-CDC-with-Kafka-Connect-Debezium-in-CDP-Public-Cloud/ta-p/345321
        https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/kafka-connect/topics/kafka-connect-connector-debezium-db2.html
        https://docs.cloudera.com/csa/1.10.0/how-to-ssb/topics/csa-ssb-cdc-connectors.html
        https://medium.com/cloudera-inc/building-a-stateful-streaming-intrusion-detection-system-with-sql-stream-builder-4667c87f347f
  • 41. Streaming Resources
        • https://dzone.com/articles/real-time-stream-processing-with-hazelcast-and-streamnative
        • https://flipstackweekly.com/
        • https://www.datainmotion.dev/
        • https://www.flankstack.dev/
        • https://github.com/tspannhw
        • https://medium.com/@tspann
        • https://medium.com/@tspann/predictions-for-streaming-in-2023-ad4d7395d714
        • https://www.apachecon.com/acna2022/slides/04_Spann_Tim_Citizen_Streaming_Engineer.pdf
  • 42. CDC
        https://community.cloudera.com/t5/Community-Articles/Incrementally-Streaming-RDBMS-Data-to-Your-Hadoop-DataLake/ta-p/247927
        https://community.cloudera.com/t5/Community-Articles/QADCDC-Our-how-to-ingest-some-database-tables-to-Hadoop-Very/ta-p/245229
        https://community.cloudera.com/t5/Community-Articles/Ingesting-RDBMS-Data-As-New-Tables-Arrive-Automagically-into/ta-p/246214
        https://community.cloudera.com/t5/Community-Articles/Ingesting-Golden-Gate-Records-From-Apache-Kafka-and/ta-p/247557
        https://community.cloudera.com/t5/Community-Articles/Simple-Change-Data-Capture-CDC-with-SQL-Selects-via-Apache/ta-p/308390
  • 43. CDC
        https://community.cloudera.com/t5/Community-Articles/RDBMS-to-Hive-using-NiFi-small-medium-tables/ta-p/244677
        https://community.cloudera.com/t5/Community-Articles/Simple-Change-Data-Capture-CDC-with-SQL-Selects-via-Apache/ta-p/308376
  • 44. Cloudera Data Flow / Apache NiFi
        https://community.cloudera.com/t5/Community-Articles/Incrementally-Streaming-RDBMS-Data-to-Your-Hadoop-DataLake/ta-p/247927
        https://community.cloudera.com/t5/Community-Articles/Ingesting-RDBMS-Data-As-New-Tables-Arrive-Automagically-into/ta-p/246214
        https://community.cloudera.com/t5/Community-Articles/Incremental-Fetch-in-NiFi-with-QueryDatabaseTable/ta-p/247073
        https://community.cloudera.com/t5/Community-Articles/Simple-Change-Data-Capture-CDC-with-SQL-Selects-via-Apache/ta-p/308390
  • 45. Cloudera Data Flow / Apache NiFi
        https://community.cloudera.com/t5/Community-Articles/Change-Data-Capture-CDC-with-Apache-NiFi-Part-1-of-3/ta-p/246623
        https://community.cloudera.com/t5/Community-Articles/Change-Data-Capture-CDC-with-Apache-NiFi-Part-2-of-3/ta-p/246519
        https://community.cloudera.com/t5/Community-Articles/Change-Data-Capture-CDC-with-Apache-NiFi-Part-3-of-3/ta-p/246482
  • 46. Cloudera Streams Messaging / Apache Kafka Connect: CDC Debezium KConnectors for PostgreSQL, MySQL, SQL Server, DB2, and Oracle
        https://community.cloudera.com/t5/Community-Articles/MySQL-CDC-with-Kafka-Connect-Debezium-in-CDP-Public-Cloud/ta-p/345321