Welcome to the
Flink SQL
Hands-on Workshop
Hands-on Workshop:
Stream Processing Made Easy with
Flink SQL on Confluent Cloud
June 17, Seoul
Today's Hosts and Speakers
HyunSoo Kim, Senior Solutions Engineer, Confluent Korea
Junhee Shin, Solutions Engineer, Confluent Korea
Jupil Hwang, Senior Solutions Engineer, Confluent Korea
Agenda - Workshop
13:30 Check-in, Setup (On-site)
14:00 Intro: What is “Shift Left”?
14:30 Hands-on Part 1: Getting Started with Flink
15:30 Break
15:40 Hands-on Part 2: Advanced Features of Flink
16:40 Tea Time
17:00 Close the Door
Remember? Prerequisites?
If you do not have a Confluent Cloud account, create one here:
https://www.confluent.io/get-started/
Remember? Prerequisites? (continued)
You need Confluent Cloud running on AWS, Azure, or GCP:
● An environment with Schema Registry enabled
● Several Kafka topics in that environment
● Mock data generated with Flink
A Terraform script is provided for this. It automatically creates the basic setup required for the labs.
Remember? Prerequisites? (continued)
Start the workshop here (“repo_url”):
https://github.com/confluentinc/confluent-cloud-flink-workshop
https://buly.kr/GksQXW8
After the workshop, do not forget to clean up your Confluent Cloud resources, such as the cluster and the Flink pool.
(The Prerequisites manual provides the commands for deleting them!)
Cloud resource management API Keys
Enter a Name and Description, then create the key.
Download the key after it is created.
Cloud resource management API Keys & Terraform
Example: terraform.tfvars
Intro:
What is “Shift Left”?
Confluent Data Lakehouses
...but without a data streaming platform, bad data spreads across the whole organization.
It is like leaving muddy footprints in a house on the lake!
Data Warehouse: Scalable and high performance for queries and historical analyses
Data Lake: Scalable and flexible for storing unstructured data
“Lakehouse”: Combines the advantages of DWH and DL
Today's approach to data pipelines is the root cause of data problems
[Diagram: OPERATIONAL SYSTEMS (Domain 1 to Domain 4 databases) feed ETL/ELT PIPELINES into ANALYTICAL SYSTEMS (Data Lake, Lake House, Data Mart, Data Warehouse), which serve ML/AI and Reports & Dashboards]
ELT pipelines are fragile, slow, and inefficient
[Animated diagram: OPERATIONAL DATA reaches the DATA WAREHOUSE / DATA LAKE through 5 / 30 / 60 min batch ingestion. Repeated batch runs over time (Batch 1 to Batch 4, each followed by Process) move it from RAW DATA DUMPS to ‘JUST-ENOUGH’ CLEANSED DATA to READY-TO-USE BUSINESS DATA for ML/AI, Dashboards, and Reports]
● Poor decision making with stale data
● Poor lineage and governance and increasing pipeline sprawl
● Cascading data pollution and failures
● Complex remodelling and reprocessing = $$$
Modern applications sometimes need data to flow “upstream” as well
[Diagram: OPERATIONAL SYSTEMS (Domain 1 to Domain 4 databases) feed ETL/ELT PIPELINES into ANALYTICAL SYSTEMS (Data Lake, Lake House, Data Mart, Data Warehouse, ML/AI, Reports & Dashboards), with REVERSE ETL flowing back to the operational systems]
More batch tools are bolted on to reverse the flow of data (from data warehouses and data lakes back to operational systems and apps) for “real-time” use cases
Problems with batch data pipelines
A giant mess of point-to-point connections, with data fidelity and governance problems caused by batch ingestion and redundant processing at the destination
● Bad data and manual break-fix
● Poor governance and data accessibility
● Centralized and rigid
● Redundant and costly processing
● Stale and untrustworthy data
[Diagram: Operational Databases are loaded through ETL and ELT into the Data Warehouse / Data Lake (Raw, Cleansed, Business-ready), which serves ML/AI and Reports & Dashboards, with rETL flowing back]
Build real-time, reliable data pipelines with data streaming
Build data once, make it trustworthy, and use it anywhere.
Shift processing and governance to the source, within milliseconds of data creation.
[Diagram: Operational Databases, SaaS Apps, Custom Apps, AI Systems… CONNECT to the platform, which applies STREAM, PROCESS, and GOVERN to produce cleansed, business-ready Universal Data Products. These CONNECT to Microservices, the Data Warehouse / Data Lake, ML/AI, and Reports & Dashboards]
● Less data cleanup and maintenance
● Greater data trust and autonomy
● Efficient processing and lower spend
● Ubiquitous data flow
● Real-time data warehousing
Advantages of the Confluent data streaming platform
Streaming: Continuously capture and share real-time data everywhere: to your data warehouse, data lake, and operational systems and apps
Schema Management: Reduce faulty data downstream by enforcing quality checks and controls in the pipeline with data contracts
Flink (focus of today's session): Continuously process real-time data, the moment it's created, for well-curated, reusable data products
Data Portal: Enable anyone with the right access controls to effortlessly explore and use real-time data products for greater data autonomy
Tableflow: Simplify representing your operational data as a ready-to-use Iceberg table in just one click
Stream Lineage: Understand the complex data relationships and the data journey to ensure trustworthiness
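The Schema Management point above relies on data contracts to keep faulty records from flowing downstream. As a rough illustration in plain Python (this is not the Schema Registry API), the sketch below checks each record against a contract and routes failures aside. The contract, its field names, and both helper functions are hypothetical.

```python
# Hypothetical contract for an "orders" stream: field name -> required type.
ORDER_CONTRACT = {"order_id": str, "customer_id": str, "price": float}


def violations(record, contract):
    """Return a list of contract violations for one record (empty if it conforms)."""
    problems = []
    for field, expected_type in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems


def split_by_contract(records, contract):
    """Route records into (valid, rejected) so bad data never reaches consumers."""
    valid, rejected = [], []
    for record in records:
        if violations(record, contract):
            rejected.append(record)
        else:
            valid.append(record)
    return valid, rejected


records = [
    {"order_id": "o-1", "customer_id": "c-1", "price": 19.99},
    {"order_id": "o-2", "customer_id": "c-2"},                # price is missing
    {"order_id": "o-3", "customer_id": "c-3", "price": "9"},  # price is a string
]
valid, rejected = split_by_contract(records, ORDER_CONTRACT)
print(len(valid), "valid,", len(rejected), "rejected")
```

Rejecting at the point of entry is the design choice that matters here: consumers only ever see records that already satisfy the contract.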
How Shift Left Works
Write data once, read it as a stream or a table
[Diagram, labels only:
Sources: Operational Apps & Data Systems (Databases, Log data & messaging systems, Custom Apps & Microservices, Enterprise Resource Planning systems), each attached through Connect
Platform: Stream (Kafka) with Immutable Logs, Event-Driven Design, and Decoupled Architecture; Stream processing (focus of today's session) turning a Data Stream into a Data Product; Schema Registry, Stream Lineage, Stream Catalog, Data Portal
Outputs: READ AS Stream (Kafka); READ AS Tableflow (Iceberg); Data Warehouses / Data Lakes through Connect; Third Party Compute Engines
One element is marked COMING SOON]
Stream processing supports a wide range of use cases tied to business value
Category: Data Pipelines (“Shift Left”)
Description: Reduce DWH / DL costs by ingesting data from operational systems and apps, attaching schema, and processing it with Flink, in order to share high-quality streams to analytics systems (e.g., SNOW, DBricks) in real time
Sample use cases (technical and business):
● Real-time search index building
● ML pipelines
● Data warehouse modernization
● Database modernization
● Data lake ingestion
● Reporting and analytics
Category: Real-time Analytics
Description: Continuously analyze and update results as data streams are produced for real-time dashboarding via a RT analytics DB (e.g., Druid, Rockset, Pinot)
Sample use cases (technical and business):
● Ad/campaign performance
● Content performance
● Quality monitoring of Telco networks
● Large-scale graph analysis
Category: Event Driven Applications
Description: Analyze data streams over time windows to detect patterns and react to incoming events by triggering computations, state updates, or external actions (i.e., microservices)
Sample use cases (technical and business):
● Fraud detection
● Anomaly detection
● Alerting/notifications
● Routing
● Business process monitoring
● Bad experience detection
Stream processing for Kafka
Kafka ecosystem
● Kafka Streams: Client library deployed to a Java runtime. Self-hosted. Input and output data are stored in a single Kafka cluster. Java. Open Source.
● ksqlDB: Standalone SQL engine built on top of Kafka Streams. Input and output data are stored in a single Kafka cluster. SQL. Community Source.
Flink
● Flink: Framework and distributed engine for stateful computations over unbounded and bounded data streams. Java, Python, SQL. Open Source.
Real-time services rely on stream processing
Real-time Data: A Sale, A Shipment, A Trade, A Customer Experience
Real-time Stream Processing
Real-Time Backend Operations
Stream processing serves as the compute layer for Kafka, powering real-time applications and pipelines
DATA IN MOTION
● Application layer: Streaming applications
● Compute layer: Apache Flink
● Storage layer: Apache Kafka
DATA AT REST
● Application layer: Web applications
● Compute layer: Traditional databases
● Storage layer: File systems
Digital-native companies use Flink to disrupt markets and gain a competitive edge
UBER: real-time pricing. NETFLIX: personalized recommendations. STRIPE: real-time fraud detection.
Many organizations choose Flink for its performance and rich feature set
Flink is one of the top five Apache projects and has a strong developer community.
● Scalability and Performance: Flink can support stream processing workloads at massive scale.
● Fault Tolerance: Flink's fault tolerance mechanisms ensure that failures are handled effectively and that high availability can be provided.
● Language Flexibility: Flink supports Java, Python, and SQL with more than 150 built-in functions, so developers can work in the language they prefer.
● Unified Processing: Flink supports stream processing, batch processing, and ad-hoc analytics with a single technology.
Flink's growth mirrors the growth of Kafka, the de facto standard for streaming data
[Chart: “Two Apache Projects, Born a Few Years Apart”. Monthly Unique Users, 0 to 150,000, plotted for Kafka and Flink; axis years 2016 to 2018 and 2020 to 2022]
● More than 75% of Fortune 500 companies are estimated to use Kafka
● More than 100,000 organizations use Kafka
● More than 41,000 Kafka meetup attendees
● More than 750 Kafka Improvement Proposals (KIPs)
● More than 12,000 Jiras for Apache Kafka
Hands-on Workshop Overview
Confluent Cloud for Apache Flink®
Simple, serverless stream processing
Easily build high-quality, reusable data streams with the industry's only cloud-native, serverless Flink service
● Effortlessly filter, analyze, and enrich data streams with Flink, the de facto standard for stream processing
● Get high-performance, efficient stream processing at any scale, without the complexity of managing infrastructure
● Experience Kafka and Flink as a unified platform, with fully integrated monitoring, security, and governance
Now available on all 3 clouds
In the lab, we build data products for a third-party reseller
● The lab focuses on a third-party reseller that offers products from well-known vendors such as Amazon and Walmart.
○ In the first lab, you explore the available data sources and write ad-hoc queries to aggregate data.
○ In the second lab, you take a closer look at Flink's advanced features. You create several data products using different types of joins. First, you filter out orders with invalid payment records. Then you calculate customer promotions and loyalty levels.
● The Architecture
○ The lab runs entirely in Confluent Cloud.
○ Data is fed in real time into four streaming tables: products, customers, orders, and payments.
The Hands-on Architecture
Step 1: Deduplicating Orders
Step 2: Create valid_orders table with Flink Joins
Step 3: Data Enrichment
Step 4: Promotions Calculation
Step 5: Data pipeline observability
Step 6: Loyalty Levels Calculation
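Steps 1 and 2 above can be pictured with ordinary collections. The sketch below is plain Python, not Flink SQL and not the workshop's actual statements: it drops repeated deliveries of the same order, then keeps only orders that have a valid payment. The field names (`order_id`, `customer_id`, `valid`) and both helper functions are assumptions made for illustration; the real table schemas are defined in the workshop repository.

```python
def deduplicate(events, key):
    """Keep only the first event seen for each key value, in arrival order."""
    seen = set()
    unique = []
    for event in events:
        if event[key] not in seen:
            seen.add(event[key])
            unique.append(event)
    return unique


def valid_orders(orders, payments):
    """Inner-join orders to payments on order_id, keeping orders with a valid payment."""
    paid = {p["order_id"] for p in payments if p["valid"]}
    return [o for o in orders if o["order_id"] in paid]


orders = [
    {"order_id": "o-1", "customer_id": "c-1"},
    {"order_id": "o-1", "customer_id": "c-1"},  # duplicate delivery of o-1
    {"order_id": "o-2", "customer_id": "c-2"},
    {"order_id": "o-3", "customer_id": "c-1"},  # no payment recorded
]
payments = [
    {"order_id": "o-1", "valid": True},
    {"order_id": "o-2", "valid": False},
]

unique_orders = deduplicate(orders, "order_id")
result = valid_orders(unique_orders, payments)
print([o["order_id"] for o in result])
```

A streaming engine has to do the same work incrementally and keep the "seen" set as managed state, which is why deduplication comes first: every later step then processes each order once.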
Tools we will use: Console Workspace, Shell, Monitoring
● Cloud Console Workspace
● Flink Shell
● Flink Monitoring
Operations: Autoscale, Increase without Downtime
● Autoscale within CFUs
● Increase CFUs without downtime
● Delete Pool(s)
Visually trace how streaming data is transformed and processed in Flink
Stream Lineage, a tool for visualizing and understanding data flows, is integrated with Flink. It captures and displays the lineage of messages flowing through Flink queries, so that users can:
● Trace where each data flow originated.
● Trace how Flink transforms the data flow.
● Observe where each data flow ends.
[Screenshots: Inspect a Flink query; Track how data flows to and from Flink]
Recap
Scale elastically to match changing business requirements
Automatically scale up or down to meet the demands of the most complex workloads
• Avoid underutilized infrastructure resources
• Pay only for the resources you use, with scale-to-zero pricing
[Chart: throughput capacity over time following demand, between 0 CFU and MAX CFUs. Maximize resource utilization and avoid over-provisioning]
Explore and query Kafka data using a browser-based SQL interface
SQL Workspaces provide an intuitive, flexible UI for dynamically exploring and interacting with all of your Confluent Cloud data using Flink SQL.
● Save queries so you can revisit and work on them later.
● Run multiple real-time queries at the same time in a single view.
● Inspect environment metadata from a SQL-centric point of view.
Explore and visualize query results with interactive tables
Interactive tables for Flink SQL Workspaces let you scan, analyze, and profile the output data of each query, which streamlines development and troubleshooting.
● Simplify data exploration and profiling
● Gain immediate insight into data trends and distributions
● Strengthen troubleshooting and monitoring
Use Actions to simplify deploying stream processing jobs for common use cases
● Deduplicate topic: Generate a topic containing only unique records from an input topic
● Mask fields: Generate a topic containing masked fields from an input topic
● Filter topic: Filter a topic based on a given set of conditions
● Apply a transformation: Transform a topic based on a set of provided expressions
COMING SOON
Actions provide pre-packaged, turnkey stream processing workloads that run on Flink
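To make the "Filter topic" and "Mask fields" actions concrete, here is a minimal plain-Python sketch of what such transformations do to records. It does not use or represent the Actions feature itself; the helper names, the sample records, and the masking rule (one `*` per character) are all assumptions.

```python
def mask_fields(record, fields, mask="*"):
    """Return a copy of the record with the named string fields masked out."""
    masked = {}
    for name, value in record.items():
        if name in fields and isinstance(value, str):
            masked[name] = mask * len(value)
        else:
            masked[name] = value
    return masked


def filter_topic(records, predicate):
    """Keep only the records for which the predicate returns True."""
    return [record for record in records if predicate(record)]


customers = [
    {"name": "Kim", "email": "kim@example.com", "country": "KR"},
    {"name": "Lee", "email": "lee@example.com", "country": "US"},
]

korean_customers = filter_topic(customers, lambda c: c["country"] == "KR")
masked = [mask_fields(c, {"email"}) for c in korean_customers]
print(masked)
```

Both helpers return new records instead of editing the input, which mirrors the slide's wording: an action generates a new topic and leaves the input topic unchanged.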
Extend Flink SQL with user-defined functions
User-defined functions (UDFs) let you implement complex logic that Flink SQL does not support natively.
● Tailor processing to specific use cases
● Reuse across multiple applications
● Work in your preferred programming language
[Diagram: a SQL query passes UDF arguments to a Java UDF, which returns the UDF result]
EARLY ACCESS
NOTE: Early Access is open to a limited number of candidates. Only Java and scalar functions are supported initially. Python support planned for 2H ’24.
Broaden access to serverless Flink with Table API support
The Table API (Open Preview) provides programmatic control in Java or Python, with rich operations and transformations that integrate smoothly into existing codebases.
● Build streaming applications using familiar constructs.
● Take advantage of an imperative programming style.
● Simplify development, testing, and maintenance through a structured, strongly typed design.
● Track status of Table API statements
● Use full capabilities of modern IDEs
Advanced SQL streaming operators
Time Windows
● Time-based windows
● Event-density windows
● Event-based windows: every single event can trigger a new window
Pattern Matching
● Complex Event Processing
● See sample
Streaming Joins
● Stream-to-stream joins
● Temporal joins
● Lookup joins
● Versioned joins
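One common kind of time-based window is the tumbling window: fixed-size, non-overlapping intervals, with each event assigned to exactly one interval by its timestamp. The sketch below shows that assignment in plain Python. It is an illustration of the idea, not Flink's implementation; the event layout (timestamp in seconds, customer key) and the function name are assumptions.

```python
from collections import defaultdict


def tumbling_window_counts(events, size_seconds):
    """Count events per (window_start, key) for fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = timestamp - (timestamp % size_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


# (event time in seconds, customer id)
events = [(0, "c-1"), (30, "c-1"), (59, "c-2"), (60, "c-1"), (125, "c-2")]

counts = tumbling_window_counts(events, 60)
for (window_start, key), n in sorted(counts.items()):
    print(f"[{window_start}, {window_start + 60}) {key}: {n}")
```

The event at second 60 falls into the second window, not the first, because each window includes its start and excludes its end.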
Flink is fully integrated with Confluent Cloud
Fully integrated out of the box
● Connected via Confluent Connector
● Environments are Catalogs
● Kafka Clusters are Databases
● Topics are Tables
● RBAC for managing Flink resources
○ Keep in mind: A statement’s access level is determined entirely by the permissions that you attach to the statement
● Schema Registry, Data Portal, Lineage, Consumer/Producer Monitoring, Metrics API…
● Cluster and Pool need to be in the same region and same CSP
● Available across the whole Confluent organization, including all environments and clusters
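Because environments act as catalogs, Kafka clusters as databases, and topics as tables, a topic can be addressed by a three-part name of the form catalog.database.table. The small Python helper below builds such a name with backtick-quoted identifiers, which is how Flink SQL quotes names containing characters such as hyphens. The helper and the environment, cluster, and topic names are made up for illustration.

```python
def qualified_table_name(environment, cluster, topic):
    """Build a backtick-quoted catalog.database.table identifier."""
    return ".".join(f"`{part}`" for part in (environment, cluster, topic))


# Hypothetical names; substitute the ones from your own environment.
name = qualified_table_name("workshop-env", "workshop-cluster", "orders")
print(f"SELECT * FROM {name};")
```

The sketch assumes the names themselves contain no backticks; a real helper would need to escape them.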
Flink private networking for Dedicated clusters on AWS
With private networking support for Flink, Confluent users can:
● Strengthen data security and privacy
● Simplify secure network configuration
● Enable secure, flexible stream processing across clusters and environments
[Diagram: two access paths, PUBLIC (Internet) and PLATT (Private Link, AWS), each shown with Env A and Env B. Every environment contains a Private Cluster (Dedicated, Enterprise) and a Public Cluster (Dedicated, Standard, Basic)]
Restrictions noted on the diagram:
● No access to private clusters
● No cross-env access
● No egress to public clusters
Key differences between Flink, KStreams, and ksqlDB
Attribute: Description
  CP Flink and CC Flink: Stream processing framework developed independently of Apache Kafka
  Kafka Streams: Embeddable client library for Java applications that is part of the Apache Kafka project
  ksqlDB: Stream processing framework that exposes Kafka Streams functionality through SQL
Attribute: Processing modes
  CP Flink and CC Flink: Unified stream and batch processing. Supports reads from multiple Kafka clusters
  Kafka Streams: Stream processing only. Supports reads from a single Kafka cluster
  ksqlDB: Stream processing only. Supports reads from a single Kafka cluster
Attribute: Pricing
  CP Flink and CC Flink: Restore state after failure from most recent incremental snapshot
  Kafka Streams: Restore state after failure by replaying all messages
  ksqlDB: Restore state after failure by replaying all messages
Attribute: CFLT deployment model
  CP Flink: Self-managed offering with Confluent Platform
  CC Flink: Fully managed. No cluster deployment, scales to zero
  Kafka Streams: Self-managed. Embeddable client library with no cluster
  ksqlDB: Fully managed and self-managed. Separate cluster deployment
Attribute: Language flexibility
  CP Flink: Full support of all Flink APIs (SQL, Table API, DataStream, ProcessFunction)
  CC Flink: ANSI-compliant SQL. Java UDFs (EA). Table API (Open Preview)
  Kafka Streams: Java (more flexible than SQL, but more complex)
  ksqlDB: SQL syntax inspired by ANSI SQL
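The comparison contrasts two ways of restoring state after a failure: from the most recent snapshot, or by replaying all messages. The plain-Python sketch below illustrates only that contrast, using a running total per customer as the state. It is a deliberately simplified model and does not show how Flink, Kafka Streams, or ksqlDB implement recovery; the event layout and all function names are assumptions.

```python
def apply_event(state, event):
    """Fold one (key, amount) event into a running-total state."""
    key, amount = event
    state[key] = state.get(key, 0) + amount


def restore_by_replay(log):
    """Rebuild state by re-processing every event from the start of the log."""
    state = {}
    for event in log:
        apply_event(state, event)
    return state, len(log)


def restore_from_snapshot(snapshot, snapshot_offset, log):
    """Start from a saved copy of the state and re-process only later events."""
    state = dict(snapshot)
    tail = log[snapshot_offset:]
    for event in tail:
        apply_event(state, event)
    return state, len(tail)


log = [("c-1", 10), ("c-2", 5), ("c-1", 7), ("c-2", 1), ("c-1", 3)]
snapshot = {"c-1": 17, "c-2": 5}  # state as it was after the first 3 events

replayed_state, replayed_count = restore_by_replay(log)
snapshot_state, snapshot_count = restore_from_snapshot(snapshot, 3, log)
print(replayed_state == snapshot_state, replayed_count, snapshot_count)
```

Both paths reach the same state; the difference is how many events must be re-processed, which grows with the length of the log for replay but only with the time since the last snapshot otherwise.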
We recommend Confluent Cloud for Apache Flink for all new cloud workloads
confluent.io/get-started

Stream Processing Handson Workshop - Flink SQL Hands-on Workshop (Korean)

  • 1. Welcome to the Flink SQL Hands-on Workshop
  • 2. Hands-on Workshop: Stream Processing Made Easy with Flink SQL on Confluent Cloud June 17, Seoul
  • 3. HyunSoo Kim Senior Solutions Engineer Confluent Korea Junhee Shin Solutions Engineer Confluent Korea Todayโ€™s Hosts and Speakers Jupil Hwang Senior Solutions Engineer Confluent Korea
  • 4. 13:30 14:00 14:30 15:30 15:40 16:40 17:00 Check-in, Setup (On-site) Intro: What is โ€œShift Leftโ€? Hands-on Part 1: Getting Started with Flink Break Hands-on Part 2: Advanced Features of Flink Tea Time Close the Door 4 Agenda - Workshop
  • 5. ๋งŒ์•ฝ Confluent Cloud ๊ณ„์ •์ด ์—†๋‹ค๋ฉด, Confluent Cloud ๊ณ„์ •์€ ์—ฌ๊ธฐ์—์„œ ์ƒ์„ฑํ•˜์„ธ์š”: https://www.confluent.io/get-started/ Remember? Prerequisites?
  • 6. AWS/Azure/GCP ์—์„œ ๊ตฌ๋™ํ•˜๋Š” Confluent Cloud ๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. โ— Schema Registry๊ฐ€ ํ™œ์„ฑํ™”๋œ ํ™˜๊ฒฝ(Environment)์—์„œ โ— ์—ฌ๋Ÿฌ ๊ฐœ์˜ Kafka Topic๋“ค์ด ์กด์žฌํ•˜๊ณ  โ— Flink๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ฐ€์ƒ์˜ Data๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ์œ„ํ•ด, Terraform ์Šคํฌ๋ฆฝํŠธ๊ฐ€ ์ œ๊ณต๋˜๋ฉฐ ์‹ค์Šต์— ํ•„์š”ํ•œ ๊ธฐ๋ณธ ์…‹ํŒ…์€ Terraform์œผ๋กœ ์ž๋™ ์ƒ์„ฑ๋ฉ๋‹ˆ๋‹ค. Remember? Prerequisites?(๊ณ„์†)
  • 7. Workshop์€ ์—ฌ๊ธฐ์—์„œ ์‹œ์ž‘ํ•˜์„ธ์š”: โ€œrepo_urlโ€ https://github.com/confluentinc/confluent-cloud-flink-workshop https://buly.kr/GksQXW8 ์›Œํฌ์ˆ์ด ๋๋‚œ ํ›„์—๋Š” Cluster, Flink Pool ๋“ฑ Confluent Cloud ๋ฆฌ์†Œ์Šค๋ฅผ ์ •๋ฆฌํ•˜๋Š” ๊ฒƒ์„ ์žŠ์ง€ ๋งˆ์„ธ์š”. (Prerequisites ๋งค๋‰ด์–ผ์— ์‚ญ์ œํ•˜๋Š” ๋ช…๋ น์–ด๊ฐ€ ์ œ๊ณต๋ฉ๋‹ˆ๋‹ค!) Remember? Prerequisites?(๊ณ„์†)
  • 9. Cloud resource management API Keys Name ๋ฐ Description ์ž…๋ ฅ ํ›„ ์ƒ์„ฑ ์ƒ์„ฑ ํ›„ ๋‹ค์šด๋กœ๋“œ ๋ฐ›์œผ์„ธ์š”.
  • 10. Cloud resource management API Keys & Terraform ์˜ˆ) terraform.tfvars ์˜ˆ์‹œ
  • 13. 13 โ€ฆ.ํ•˜์ง€๋งŒ ๋ฐ์ดํ„ฐ ์ŠคํŠธ๋ฆฌ๋ฐ ํ”Œ๋žซํผ์ด ์—†์œผ๋ฉด ์ž˜๋ชป๋œ ๋ฐ์ดํ„ฐ๊ฐ€ ์กฐ์ง ์ „์ฒด์— ํผ์ ธ ๋‚˜๊ฐ‘๋‹ˆ๋‹ค ๋งˆ์น˜ ํ˜ธ์ˆ˜ ์œ„์— ์žˆ๋Š” ์ง‘์— ์ง„ํ™ ๋ฐœ์ž๊ตญ์„ ๋‚จ๊ธฐ๋Š” ๊ฒƒ๊ณผ ๋งˆ์ฐฌ๊ฐ€์ง€์ฃ ! Data Warehouse Data Lake โ€œLakehouseโ€ Scalable and high performance for queries and historical analyses Scalable and flexible for storing unstructured data Combines the advantages of DWH and DL
  • 14. ์˜ค๋Š˜๋‚ ์˜ ๋ฐ์ดํ„ฐ ํŒŒ์ดํ”„๋ผ์ธ ์ ‘๊ทผ ๋ฐฉ์‹์€ ๋ฐ์ดํ„ฐ ๋ฌธ์ œ์˜ ๊ทผ๋ณธ ์›์ธ์ž…๋‹ˆ๋‹ค Domain 1 Database Domain 2 Database Domain 3 Database Data Lake Lake House Data Mart Data Warehouse ML/AI Reports & Dashboards Domain 4 Database OPERATIONAL SYSTEMS ETL/ELT PIPELINES ANALYTICAL SYSTEMS
  • 15. DATA WAREHOUSE / DATA LAKE ML/AI Dashboards OPERATIONAL DATA Poor decision making with stale data 5 / 30 / 60 min batch ingestion Poor lineage and governance and increasing pipeline sprawl Cascading data pollution and failures Time Batch 1 Process Batch 2 Process Batch 3 Process Batch 4 Process Time Batch 1 Process Batch 2 Process Batch 3 Process Batch 4 Process Time Batch 1 Process Batch 2 Process Batch 3 Process Batch 4 Process Time Batch 1 Process Batch 2 Process Batch 3 Process Batch 4 Process Complex remodelling and reprocessing = $$$ โ€˜JUST-ENOUGHโ€™ CLEANSED DATA READY-TO-USE BUSINESS DATA RAW DATA DUMPS ANIMATED SLIDE Reports ELT ํŒŒ์ดํ”„๋ผ์ธ์€ ์ทจ์•ฝํ•˜๊ณ  ๋А๋ฆฌ๋ฉฐ ๋น„ํšจ์œจ์ ์ž…๋‹ˆ๋‹ค
  • 16. Domain 1 Database Domain 2 Database Domain 3 Database Data Lake Lake House Data Mart Data Warehouse ML/AI Reports & Dashboards Domain 4 Database OPERATIONAL SYSTEMS ETL/ELT PIPELINES ANALYTICAL SYSTEMS REVERSE ETL More batch tools are bolted on to reverse the flow of data โ€“ from data warehouses and data lakes back to operational systems and apps โ€“ for โ€œreal-timeโ€ use cases ์ตœ์‹  ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์—์„œ๋Š” ๋ฐ์ดํ„ฐ๊ฐ€ โ€˜Upstream'๋กœ ํ๋ฅด๋„๋ก ํ•ด์•ผ ํ•˜๋Š” ๊ฒฝ์šฐ๋„ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค
  • 17. Batch Data Pipeline์˜ ๋ฌธ์ œ ๋ถˆ๋Ÿ‰ ๋ฐ์ดํ„ฐ ๋ฐ ์ˆ˜๋™ ๊ณ ์žฅ ์ˆ˜๋ฆฌ ๋ถ€์‹คํ•œ ๊ฑฐ๋ฒ„๋„Œ์Šค ๋ฐ ๋ฐ์ดํ„ฐ ์ ‘๊ทผ์„ฑ ์ค‘์•™ ์ง‘์ค‘์‹ ๊ณ ์ • ์ค‘๋ณต๋˜๊ณ  ๋น„์šฉ์ด ๋งŽ์ด ๋“œ๋Š” ์ฒ˜๋ฆฌ ์˜ค๋ž˜๋˜๊ณ  ์‹ ๋ขฐํ•  ์ˆ˜ ์—†๋Š” ๋ฐ์ดํ„ฐ Batch ์ˆ˜์ง‘ ๋ฐ ๋Œ€์ƒ์ง€์—์„œ์˜ ์ค‘๋ณต ์ฒ˜๋ฆฌ๋กœ ์ธํ•ด ๋ฐ์ดํ„ฐ ์ถฉ์‹ค๋„ ๋ฐ ๊ฑฐ๋ฒ„๋„Œ์Šค ๋ฌธ์ œ๊ฐ€ ์žˆ๋Š” ๊ฑฐ๋Œ€ํ•œ Point-to-point ์—ฐ๊ฒฐ ํ˜ผ๋ž€ Operational Databases ELT ETL Raw Cleansed Business- ready Raw Cleansed Data Warehouse / Data Lake rETL rETL ML/AI Reports & Dashboards
  • 18. ๋ฐ์ดํ„ฐ ์ŠคํŠธ๋ฆฌ๋ฐ์„ ํ†ตํ•ด ์‹ค์‹œ๊ฐ„ ๋ฐ ์•ˆ์ •์ ์ธ ๋ฐ์ดํ„ฐ ํŒŒ์ดํ”„๋ผ์ธ์„ ๊ตฌ์ถ•ํ•˜์„ธ์š” Operational Databases Business- ready Data Warehouse / Data Lake ๋ฐ์ดํ„ฐ๋ฅผ ํ•œ ๋ฒˆ ๊ตฌ์ถ•ํ•˜๊ณ  ์‹ ๋ขฐํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋งŒ๋“ค์–ด ์–ด๋””์„œ๋“  ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ ์ƒ์„ฑ ํ›„ ๋ฐ€๋ฆฌ์ดˆ ์ด๋‚ด์— ์†Œ์Šค์—์„œ ์ฒ˜๋ฆฌ ๋ฐ ๊ฑฐ๋ฒ„๋„Œ์Šค๋ฅผ ์ „ํ™˜ํ•˜์„ธ์š”. PROCESS GOVERN STREAM Universal Data Products Operational Databases, SaaS Apps, Custom Apps, AI Systemsโ€ฆ Cleansed Microservices ML/AI Reports & Dashboards Cleansed ๋ฐ์ดํ„ฐ ์ •๋ฆฌ ๋ฐ ์œ ์ง€ ๊ด€๋ฆฌ ๊ฐ์†Œ ๋”์šฑ ๊ฐ•ํ™”๋œ ๋ฐ์ดํ„ฐ ์‹ ๋ขฐ ๋ฐ ์ž์œจ์„ฑ ํšจ์œจ์ ์ธ ์ฒ˜๋ฆฌ ๋ฐ ๋‚ฎ์€ ์ง€์ถœ ์œ ๋น„์ฟผํ„ฐ์Šค ๋ฐ์ดํ„ฐ ํ๋ฆ„ ์‹ค์‹œ๊ฐ„ ๋ฐ์ดํ„ฐ ์›จ์–ดํ•˜์šฐ์ง• CONNECT CONNECT CONNECT
  • 19. Confluent ๋ฐ์ดํ„ฐ ์ŠคํŠธ๋ฆฌ๋ฐ ํ”Œ๋žซํผ์˜ ์žฅ์  Streaming Continuously capture and share real-time data everywhere - to your data warehouse, data lake and operational systems and apps Schema Management Reduce faulty data downstream by enforcing quality checks and controls in the pipeline with data contracts Flink Continuously process real-time data, the moment itโ€™s created, for well- curated reusable data products Data Portal Enable anyone with the right access controls to effortlessly explore and use real-time data products for greater data autonomy Tableflow Simplify representing your operational data as a ready-to-use Iceberg table in just one-click Stream Lineage Understand the complex data relationships and the data journey to ensure trustworthiness Focus of todayโ€™s session
  • 20. How Shift Left Works ๋ฐ์ดํ„ฐ๋ฅผ ํ•œ ๋ฒˆ ์“ฐ๊ณ  ์ŠคํŠธ๋ฆผ์ด๋‚˜ ํ…Œ์ด๋ธ”๋กœ ์ฝ์–ด๋ณด์„ธ์š” Stream processing (Focus of todayโ€™s session) Data Stream Data Product Schema Registry Tableflow (Iceberg) Third Party Compute Engines Databases Log data & messaging systems Custom Apps & Microservices Operational Apps & Data Systems Stream (Kafka) Event-Driven Design Decoupled Architecture Connect Connect Connect Data Warehouses / Data Lakes Stream (Kafka) COMING SOON READ AS READ AS Stream Lineage Stream Catalog Data Portal Immutable Logs Enterprise Resource Planning systems Connect
  • 21. Stream processing supports a broad range of use cases tied to business value
    Category: Data Pipelines ("Shift Left")
    ● Description: Reduce DWH / DL costs by ingesting data from operational systems and apps, attaching schema, and processing it with Flink, in order to share high-quality streams to analytics systems (e.g., SNOW, DBricks) in real time
    ● Sample use cases (technical and business): Real-time search index building; ML pipelines; Data warehouse modernization; Database modernization; Data lake ingestion; Reporting and analytics
    Category: Real-time Analytics
    ● Description: Continuously analyze and update results as data streams are produced for real-time dashboarding via a RT analytics DB (e.g., Druid, Rockset, Pinot)
    ● Sample use cases (technical and business): Ad/campaign performance; Content performance; Quality monitoring of Telco networks; Large-scale graph analysis
    Category: Event Driven Applications
    ● Description: Analyze data streams over time windows to detect patterns and react to incoming events by triggering computations, state updates, or external actions (i.e., microservices)
    ● Sample use cases (technical and business): Fraud detection; Anomaly detection; Alerting/notifications; Routing; Business process monitoring; Bad experience detection
  • 22. Stream processing for Kafka
    Kafka ecosystem
    ● Kafka Streams: Client library deployed to Java Runtime. Self hosted. Input and output data are stored in a single Kafka cluster. Java. Open Source.
    ● ksqlDB: Standalone SQL engine built on top of Kafka Streams. Input and output data are stored in a single Kafka cluster. SQL. Community Source.
    Flink
    ● Flink: Framework and distributed engine for stateful computations over unbounded and bounded data streams. Java, Python, SQL. Open Source.
  • 23. Real-time services depend on stream processing
    Real-time Data (A Sale, A Shipment, A Trade, A Customer Experience); Real-time Stream Processing; Real-Time Backend Operations
  • 24. Stream processing acts as the compute layer for Kafka, powering real-time applications and pipelines
    DATA IN MOTION
    ● Application layer: Streaming applications
    ● Compute layer: Apache Flink
    ● Storage layer: Apache Kafka
    DATA AT REST
    ● Application layer: Web applications
    ● Compute layer: Traditional databases
    ● Storage layer: File systems
  • 25. Digital-native companies use Flink to disrupt markets and gain a competitive advantage
    ● UBER: Real-time pricing
    ● NETFLIX: Personalized recommendations
    ● STRIPE: Real-time fraud detection
  • 26. Many organizations choose Flink for its performance and rich feature set
    Flink is one of the top five Apache projects and has a strong developer community.
    ● Scalability and Performance: Flink can support stream processing workloads at massive scale.
    ● Fault Tolerance: Flink's fault-tolerance mechanisms ensure that it handles failures effectively and provides high availability.
    ● Language Flexibility: Flink supports Java, Python, and SQL with more than 150 built-in functions, so developers can work in the language of their choice.
    ● Unified Processing: Flink supports stream processing, batch processing, and ad-hoc analytics with a single technology.
  • 27. Flink's growth mirrors the growth of Kafka, the de facto standard for streaming data
    [Chart: "Two Apache Projects, Born a Few Years Apart"; monthly unique users for Flink and Kafka; y-axis 0 to 150,000; x-axis 2016 to 2018 and 2020 to 2022]
    ● More than 75% of Fortune 500 companies are estimated to use Kafka
    ● More than 100,000 organizations use Kafka
    ● More than 41,000 Kafka meetup attendees
    ● More than 750 Kafka Improvement Proposals (KIPs)
    ● More than 12,000 Jiras for Apache Kafka
  • 29. Confluent Cloud for Apache Flink®: simple, serverless stream processing
    Easily build high-quality, reusable data streams with the industry's only cloud-native, serverless Flink service.
    ● Effortlessly filter, analyze, and enrich data streams with Flink, the de facto standard for stream processing
    ● Achieve high-performance, efficient stream processing at any scale without the complexity of infrastructure management
    ● Experience Kafka and Flink as a unified platform with fully integrated monitoring, security, and governance
    Now available on all 3 clouds
  • 30. In the lab, we build data products for a third-party reseller
    ● The lab focuses on a third-party reseller that offers products from well-known vendors such as Amazon and Walmart.
      ○ In the first lab, we explore the available data sources and write ad-hoc queries to aggregate data.
      ○ In the second lab, we take a closer look at Flink's advanced features. We create several data products using different types of joins. First, we filter out orders with invalid payments. Then we calculate customer promotions and loyalty levels.
    ● The Architecture
      ○ The labs run entirely on Confluent Cloud.
      ○ Data arrives in real time in four streaming tables: products, customers, orders, and payments.
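    To make the "filter out orders with invalid payments" step more concrete, here is a minimal plain-Python sketch of the idea behind a time-bounded join between orders and payments. It is not the workshop's Flink SQL. The field names (`order_id`, `payment_id`, `ts`), the `max_delay_s` bound, and the rule that an order is valid when a matching payment arrives within that bound are all assumptions made for illustration.

```python
from typing import Dict, List


def valid_orders(orders: List[dict], payments: List[dict], max_delay_s: int) -> List[dict]:
    """Keep orders that have a payment with the same order_id within max_delay_s seconds.

    Assumption: this validity rule is illustrative only; the workshop defines its own.
    """
    # Index payments by order_id so each order only scans its own payments.
    by_order: Dict[str, List[dict]] = {}
    for payment in payments:
        by_order.setdefault(payment["order_id"], []).append(payment)

    matched = []
    for order in orders:
        for payment in by_order.get(order["order_id"], []):
            delay = payment["ts"] - order["ts"]
            if 0 <= delay <= max_delay_s:
                # Emit the order once, enriched with the first matching payment.
                matched.append({**order, "payment_id": payment["payment_id"]})
                break
    return matched


# Hypothetical sample data: o1 is paid in time, o2 is paid too late, o3 is never paid.
orders = [
    {"order_id": "o1", "customer_id": "c1", "ts": 100},
    {"order_id": "o2", "customer_id": "c2", "ts": 200},
    {"order_id": "o3", "customer_id": "c1", "ts": 300},
]
payments = [
    {"payment_id": "p1", "order_id": "o1", "ts": 130},
    {"payment_id": "p2", "order_id": "o2", "ts": 9000},
]
print(valid_orders(orders, payments, max_delay_s=600))
```

    In a streaming engine the same idea runs continuously and the time bound limits how much state the join must keep; the batch loop above only illustrates the matching rule.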
  • 33. Step 2: Create valid_orders table with Flink Joins
  • 34. Step 3: Data Enrichment
  • 35. Step 4: Promotions Calculation
  • 36. Step 5: Data pipeline observability
  • 37. Step 6: Loyalty Levels Calculation
  • 38. Tools we will use: Console Workspace, Shell, Monitoring
    Cloud Console Workspace; Flink Shell; Flink Monitoring
  • 39. Operations: Autoscale, Increase without Downtime
    ● Autoscale within CFUs
    ● Increase CFUs without downtime
    ● Delete Pool(s)
  • 40. Visually trace how streaming data is transformed and processed in Flink
    Stream Lineage, a tool for visualizing and understanding data flows, is integrated with Flink. It captures and displays the lineage of messages flowing through Flink queries so that users can:
    ● Trace where each data flow originated.
    ● Track how Flink transforms data flows.
    ● Observe where each data flow ends.
    Screenshots: Inspect a Flink query; Track how data flows to and from Flink
  • 41. Recap
  • 42. Maximize resource utilization and avoid over-provisioning
    Scale elastically to match changing business requirements.
    Automatically scale up or down to meet the demands of the most complex workloads:
    ● Avoid underutilized infrastructure resources
    ● Pay only for the resources you use, with pricing that scales down to zero
    [Chart: throughput over time, showing capacity against demand, from 0 CFU to MAX CFUs]
  • 43. Explore and query Kafka data using a browser-based SQL interface
    SQL Workspaces provides an intuitive, flexible UI for dynamically exploring and interacting with all of your Confluent Cloud data using Flink SQL.
    ● Save queries so you can revisit and work on them later.
    ● Run multiple real-time queries simultaneously in a single view.
    ● Inspect environment metadata from a SQL-centric perspective.
  • 44. Explore and visualize query results with interactive tables
    Interactive tables for Flink SQL Workspaces let you scan, analyze, and profile the output data of each query, which streamlines development and troubleshooting.
    ● Simplify data exploration and profiling
    ● Gain immediate insight into data trends and distributions
    ● Strengthen troubleshooting and monitoring
  • 45. Use Actions to simplify deploying stream processing jobs for common use cases
    Actions provide pre-packaged, turnkey stream processing workloads that run on Flink.
    ● Deduplicate topic: Generate a topic containing only unique records from an input topic
    ● Mask fields: Generate a topic containing masked fields from an input topic
    ● Filter topic: Filter a topic based on a given set of conditions
    ● Apply a transformation: Transform a topic based on a set of provided expressions
    COMING SOON
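    As a rough illustration of what the "Deduplicate topic" and "Mask fields" Actions do to a stream of records, the sketch below implements both ideas over plain Python dictionaries. It is not how Actions are implemented on Flink. The record fields (`order_id`, `email`), the mask string, and the first-record-wins rule are assumptions chosen for the example.

```python
from typing import Iterable, Iterator, Set


def deduplicate(records: Iterable[dict], key: str) -> Iterator[dict]:
    """Yield only the first record seen for each value of `key`."""
    seen: Set[object] = set()
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            yield record


def mask_fields(records: Iterable[dict], fields: Set[str], mask: str = "****") -> Iterator[dict]:
    """Yield copies of the records with the listed fields replaced by `mask`."""
    for record in records:
        yield {name: (mask if name in fields else value) for name, value in record.items()}


# Hypothetical input: the second record is a duplicate delivery of order o1.
events = [
    {"order_id": "o1", "email": "a@example.com"},
    {"order_id": "o1", "email": "a@example.com"},
    {"order_id": "o2", "email": "b@example.com"},
]
cleaned = list(mask_fields(deduplicate(events, key="order_id"), fields={"email"}))
print(cleaned)
```

    Because both functions are generators, they process one record at a time and can be chained, which loosely mirrors how one output topic feeds the next processing step.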
  • 46. Extend Flink SQL with user-defined functions
    User-defined functions (UDFs) let you implement complex logic that Flink SQL does not support natively.
    ● Tailor processing to specific use cases
    ● Reuse across multiple applications
    ● Work in your preferred programming language
    Diagram labels: SQL query; UDF arguments; Java UDF; UDF result
    EARLY ACCESS NOTE: Early Access is open to a limited number of candidates. Only Java and scalar functions are supported initially. Python support planned for 2H '24.
  • 47. Broaden access to serverless Flink with Table API support
    The Table API (Open Preview) offers programmatic control from Java or Python, with rich operations and transformations that integrate smoothly into existing codebases.
    ● Build streaming applications using familiar constructs.
    ● Use an imperative programming style.
    ● Simplify development, testing, and maintenance through a structured, strongly typed design.
    Screenshots: Track status of Table API statements; Use full capabilities of modern IDEs
  • 48. Advanced SQL streaming operators
    Time Windows
    ● Time-based windows
    ● Event-density windows
    ● Event-based windows: every single event can trigger a new window
    Pattern Matching
    ● Complex Event Processing
    ● See sample
    Streaming Joins
    ● Stream-to-stream joins
    ● Temporal joins
    ● Lookup joins
    ● Versioned joins
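    The time-based windows listed above can be pictured with a small plain-Python sketch of a tumbling (fixed-size, non-overlapping) window count. This only illustrates the bucketing arithmetic and is not Flink code. The `(timestamp_seconds, key)` event shape and the 60-second window size are assumptions for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], size_s: int
) -> Dict[Tuple[int, str], int]:
    """Count events per (window_start, key) for fixed-size, non-overlapping windows."""
    counts: Counter = Counter()
    for ts, key in events:
        # Each timestamp belongs to exactly one window: [window_start, window_start + size_s).
        window_start = ts - (ts % size_s)
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical events: (event time in seconds, product id)
events = [(0, "a"), (30, "a"), (59, "b"), (61, "a")]
print(tumbling_window_counts(events, size_s=60))
```

    A real streaming engine additionally has to decide when a window is complete (for example with watermarks) and how to treat late events; the sketch sidesteps both by working on a finite list.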
  • 49. Flink is fully integrated with Confluent Cloud
    Fully integrated out of the box
    ● Connected via Confluent Connector
    ● Environments are Catalogs
    ● Kafka Clusters are Databases
    ● Topics are Tables
    ● RBAC for managing Flink resources
      ○ Keep in mind: A statement's access level is determined entirely by the permissions that you attach to the statement
    ● Schema Registry, Data Portal, Lineage, Consumer/Producer Monitoring, Metric API…
    ● Cluster and Pool need to be in the same region and same CSP
    ● All over the Confluent Organisation including all environments and clusters
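    The mapping above (environment to catalog, Kafka cluster to database, topic to table) implies that a topic can be addressed by a three-part, catalog.database.table style name in SQL. The helper below is a small sketch of that naming idea. The function name and the sample environment, cluster, and topic names are hypothetical, and rejecting names that contain a backtick is a simplification made for this example.

```python
def qualified_table_name(environment: str, cluster: str, topic: str) -> str:
    """Build a backtick-quoted catalog.database.table name from Confluent Cloud names."""
    parts = (environment, cluster, topic)
    for part in parts:
        # Simplification for this sketch: refuse empty names and names containing a backtick.
        if not part or "`" in part:
            raise ValueError(f"unsupported identifier: {part!r}")
    return ".".join(f"`{part}`" for part in parts)


# Hypothetical names for illustration only.
table = qualified_table_name("workshop-env", "workshop-cluster", "orders")
print(f"SELECT * FROM {table};")
```

    Quoting each part keeps names with hyphens usable as identifiers, which is common for environment and cluster display names.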
  • 50. Flink private networking for Dedicated clusters on AWS
    With private networking support for Flink, Confluent users can:
    ● Strengthen data security and privacy
    ● Simplify secure network configuration
    ● Enable secure, flexible stream processing across clusters and environments
    Diagram labels: Env A; Env B; PUBLIC; PLATT; Internet; Private Link (AWS); Private Cluster (Dedicated, Enterprise); Public Cluster (Dedicated, Standard, Basic)
    ● No access to private clusters
    ● No cross-env access
    ● No egress to public clusters
  • 51. Key differences between Flink, KStreams, and ksqlDB
    Columns: CP Flink | CC Flink | Kafka Streams | ksqlDB
    Description
    ● Flink (CP and CC): Stream processing framework developed independently of Apache Kafka
    ● Kafka Streams: Embeddable client library for Java applications that is part of the Apache Kafka project
    ● ksqlDB: Stream processing framework that exposes Kafka Streams functionality through SQL
    Processing modes
    ● Flink (CP and CC): Unified stream and batch processing; supports reads from multiple Kafka clusters
    ● Kafka Streams: Stream processing only; supports reads from a single Kafka cluster
    ● ksqlDB: Stream processing only; supports reads from a single Kafka cluster
    Pricing
    ● Flink (CP and CC): Restore state after failure from most recent incremental snapshot
    ● Kafka Streams: Restore state after failure by replaying all messages
    ● ksqlDB: Restore state after failure by replaying all messages
    CFLT deployment model
    ● CP Flink: Self-managed offering with Confluent Platform
    ● CC Flink: Fully managed; no cluster deployment, scales to zero
    ● Kafka Streams: Self-managed; embeddable client library with no cluster
    ● ksqlDB: Fully managed and self-managed; separate cluster deployment
    Language flexibility
    ● CP Flink: Full support of all Flink APIs (SQL, Table API, DataStream, ProcessFunction)
    ● CC Flink: ANSI-compliant SQL; Java UDFs (EA); Table API (Open Preview)
    ● Kafka Streams: Java (more flexible than SQL, but more complex)
    ● ksqlDB: SQL syntax inspired by ANSI SQL
    We recommend Confluent Cloud for Apache Flink for all new cloud workloads