SlideShare a Scribd company logo
Kafka Multi-Tenancy - 160
Billion Daily Messages on One
Shared Cluster at LINE
Yuto Kawamura - LINE Corporation
Speaker introduction
Yuto Kawamura
Senior Software Engineer at LINE
Leading a team for providing a
company-wide Kafka platform
Apache Kafka Contributor
Speaker at Kafka Summit SF
2017 1
1
https://kafka-summit.org/sessions/single-data-hub-
services-feed-100-billion-messages-per-day/
LINE
Messaging service
164 million active users in
countries with top market share
like Japan, Taiwan, Thailand and
Indonesia.2
And many other services:
- News
- Bitbox/Bitmax -
Cryptocurrency trading
- LINE Pay - Digital payment
2
As of June 2018.
Kafka platform at LINE
Two main usages:
— "Data Hub" for distributing data to other services
— e.g: Users relationship update event from
messaging service
— As a task queue for buffering and processing business
logic asynchronously
Kafka platform at LINE
Single cluster is shared by many independent services
for:
- Concept of Data Hub
- Efficiency of management/operation
Messaging, AD, News, Blockchain and etc... all of their
data stored and distributed on single Kafka cluster.
From department-wide to company-wide platform
It was just for messaging service. Now everyone uses it.
Broker installation
CPU: Intel(R) Xeon(R) 2.20GHz x 20 cores (HT) * 2
Memory: 256GiB
- more memory, more caching (page cache)
- newly written data can survive only 20 minutes ...
Network: 10Gbps
Disk: HDD x 12 RAID 1+0
- saves maintenance costs
Kafka version: 0.10.2.1 ~ 0.11.1.2
Requirements doing multitenancy
Cluster can protect itself against abusing workloads
- Accidental workload doesn't propagates to other
users.
We can track on which client is sending requests
- Find source of strange requests.
Certain level of isolation among client workloads
- Slow response for one client doesn't appears to
another client.
Protect cluster against abusing workload - Request
Quota
It is more important to manage number of requests over
incoming/outgoing byte rate.
Kafka is amazingly durable for large data if they are well-batched.
=> Producers which configures linger.ms=0 with large number of
servers probably leads large amount of requests
Starting from 0.11.0.0, by KIP-124 we can configure request rate
quota 3
3
https://cwiki.apache.org/confluence/display/KAFKA/KIP-124+-+Request+rate+quotas
Protect cluster against abusing
workload - Request Quota
Basic idea is to apply default
quota for preventing single
abusing client destabilize the
cluster as a least protection.
*Not for controlling resource
quantity for each client.
Track on requests from clients - Metrics
— kafka.server:type=Request,user=([-.w]+),client-
id=([-.w]+):request-time
— Percentage of time spent in broker network and I/O
threads to process requests from each client
group.
— Useful to see how much of broker resource is being
consumed by each client.
Track on requests from clients -
Slowlog
Log requests which took longer
than certain threshold to
process.
- Kafka has "request logging"
but it leads too many of lines
- Inspired by HBase's
Thresholds can be changed
dynamically through JMX console
for each request type.
Isolation among client workloads
Let's give a look through the actual troubleshooting.
Detection
Detection: 50x ~ 100x slower
response time in 99th %ile
Produce response time.
Normal: ~20ms
Observed: 50ms ~ 200ms
Finding #1: Disk read
coincidence
Coincidence disk read of certain
amount.
Finding #2: Network threads got
busier
Network threads' utilization was
very high.
Metrics:
kafka.network:type=SocketServer,n
ame=NetworkProcessorAvgIdlePercen
t
Request handling in Kafka broker
Two thread layers:
Network Threads: Reads/Writes request/response from/to client sockets.
Request Handler Threads: Processes requests and produces response object.
Request handling - Read Request
Request handling - Process
Request handling - Write Response
Network thread runs event loop
— Multiplex and processes assigned client sockets sequentially.
— It never blocks awaiting IO completion.
=> So it makes sense to set num.network.threads <= CPU_CORES
When Network threads gets busy...
It means either one of:
1. Really busy doing lots of work. Many requests/
responses to read/write
2. Blocked by some operations (which should not
happen in event loop in general)
Response handling of normal
requests
When response is in queue, all
data to be transferred are in
memory.
Exceptional handling for Fetch
response
When response is in queue, topic
data segments are not in
userspace memory.
=> Copy to client socket directly
inside the kernel using
sendfile(2) system call.
What if target data doesn't exists in page cache?
Target data in page cache:
=> Just a memory copy. Very fast: ~ 100us
Target data is NOT in page cache:
=> Needs to load data from disk into page cache first.
Can be slow: ~ 50ms (or even slower)
Suspecting blocking in sendfile(2)
Inspected duration of sendfile system calls issued by broker process using
SystemTap (dynamic tracing tool to probe events in kernel. see my previous talk 4
)
$ stap -e ‘(script counting sendfile(2) duration histogram)’
# value (us)
value |---------------------------------------- count
0 | 0
1 | 71
2 |@@@ 6171
16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 29472
32 |@@@ 3418
2048 | 0
...
8192 | 3
4
https://www.confluent.io/kafka-summit-sf17/One-Day-One-Data-Hub-100-Billion-Messages-
Kafka-at-LINE
Problem hypothesis
Fetch request reading old data causes blocking sendfile(2) in event loop and applying latency for
responses needs to be processed in the same network thread.
Problem hypothesis
Super harmful because of:
It can be triggered either by:
- Consumers attempting to fetch old data
- Replica fetch by follower brokers for restoring replica
of old logs
=> Both are very common scenario
Breaks performance isolation among independent
clients.
Solution candidates
A: Separate network threads among clients
=> Possible, but a lot of changes required
=> Not essential because network threads should be
completely computation intensive
B: Balance connections among network threads
=> Possible, but again a lot of changes
=> Still for first moment other connections will get
affected
Solution candidates
C: Make sure that data are ready on memory before the
response passed to the network thread
=> Event loop never blocks
Choice: Warmup page cache
before the network thread
Move blocking part to request
handler threads (= single queue
and pool of threads)
=> Free thread can take arbitrary
task (request) while some
threads are blocked.
Choice: Warmup page cache
before the network thread
When Network thread calls
sendfile(2) for transferring log
data, it's always in page cache.
Warming up page cache with minimal overhead
Easiest way: Do synchronous read(2) on target data
=> Large overhead by copying memory from kernel to
userland
Why is Kafka using sendfile(2) for transferring topic data?
=> To avoid expensive large memory copy
How can we achieve it keeping this property?
Trick #1 Zero copy synchronous
page load
Call sendfile(2) for target data
with dest /dev/null.
The /dev/null driver does not
actually copy data to anywhere.
Why it has almost no overhead?
Linux kernel internally uses splice to implement sendfile(2).
splice implementation of /dev/null returns w/o iterating target data.
# ./drivers/char/mem.c
static const struct file_operations null_fops = {
...
.splice_write = splice_write_null,
};
static int pipe_to_null(...)
{
return sd->len;
}
static ssize_t splice_write_null(...)
{
return splice_from_pipe(pipe, out, ppos, len, flags, pipe_to_null);
}
Implementing page load
// FileRecords.java
private static final java.nio.file.Path DEVNULL_PATH =
new File("/dev/null").toPath();
public void prepareForRead() throws IOException {
long size = Math.min(channel.size(), end) - start;
try (FileChannel devnullChannel = FileChannel.open(
DEVNULL_PATH, StandardOpenOption.WRITE)) {
// Calls sendfile(2) internally
channel.transferTo(start, size, devnullChannel);
}
}
Trick #2 Skip the "hot" last log
segment
Another concern: additional
syscalls * Fetch req count?
- Warming up is necessary only
for older data.
- Exclude the last log segment
from the warmup target.
Trick #2 Skip the "hot" last log segment
# Log.scala#read
@@ -585,6 +586,17 @@ class Log(@volatile var dir: File,
if(fetchInfo == null) {
entry = segments.higherEntry(entry.getKey)
} else {
+ // For last entries we assume that it is hot enough to still have all data in page cache.
+ // Most of fetch requests are fetching from the tail of the log, so this optimization
+ // should save call of sendfile significantly.
+ if (!isLastEntry && fetchInfo.records.isInstanceOf[FileRecords]) {
+ try {
+ info("Prepare Read for " + fetchInfo.records.asInstanceOf[FileRecords].file().getPath)
+ fetchInfo.records.asInstanceOf[FileRecords].prepareForRead()
+ } catch {
+ case e: Throwable => warn("failed to prepare cache for read", e)
+ }
+ }
return fetchInfo
}
It works
No response time degradation in irrelevant requests while there are coincidence of Fetch request
triggers disk read.
Patch upstream?
Concern: The patch heavily assumes underlying kernel
implementation.
Still:
- Effect is tremendous.
- Fixes very common performance degradation scenario.
Discuss at KAFKA-7504
Conclusion
— Talked requirements for multi tenancy clusters and
solutions
— Quota, Metrics, Slowlog ... and hacky patch.
— After fixing some issues our hosting policy is working well
and efficient, keeping:
— concept of single "Data Hub" and
— operational cost not proportional to the number of
users/usages.
— Kafka is well designed and implemented to contain many,
independent and different workloads.
End of presentation.
Questions?
Ad

More Related Content

What's hot (20)

How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for Performance
Brendan Gregg
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
Benjamin Leonhardi
 
Memory access tracing [poug17]
Memory access tracing [poug17]Memory access tracing [poug17]
Memory access tracing [poug17]
Mahmoud Hatem
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017
Gwen (Chen) Shapira
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Databricks
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking Walkthrough
Thomas Graf
 
10分で分かるLinuxブロックレイヤ
10分で分かるLinuxブロックレイヤ10分で分かるLinuxブロックレイヤ
10分で分かるLinuxブロックレイヤ
Takashi Hoshino
 
Develop Your Own Operating Systems using Cheap ARM Boards
Develop Your Own Operating Systems using Cheap ARM BoardsDevelop Your Own Operating Systems using Cheap ARM Boards
Develop Your Own Operating Systems using Cheap ARM Boards
National Cheng Kung University
 
Rethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming SystemsRethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming Systems
Yingjun Wu
 
InnoDB Locking Explained with Stick Figures
InnoDB Locking Explained with Stick FiguresInnoDB Locking Explained with Stick Figures
InnoDB Locking Explained with Stick Figures
Karwin Software Solutions LLC
 
PostgreSQLのfull_page_writesについて(第24回PostgreSQLアンカンファレンス@オンライン 発表資料)
PostgreSQLのfull_page_writesについて(第24回PostgreSQLアンカンファレンス@オンライン 発表資料)PostgreSQLのfull_page_writesについて(第24回PostgreSQLアンカンファレンス@オンライン 発表資料)
PostgreSQLのfull_page_writesについて(第24回PostgreSQLアンカンファレンス@オンライン 発表資料)
NTT DATA Technology & Innovation
 
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
Shigeru Hanada
 
Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Stores
confluent
 
SQL Performance Improvements at a Glance in Apache Spark 3.0
SQL Performance Improvements at a Glance in Apache Spark 3.0SQL Performance Improvements at a Glance in Apache Spark 3.0
SQL Performance Improvements at a Glance in Apache Spark 3.0
Databricks
 
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on TezAchieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
DataWorks Summit/Hadoop Summit
 
ORC Deep Dive 2020
ORC Deep Dive 2020ORC Deep Dive 2020
ORC Deep Dive 2020
Owen O'Malley
 
Everything you always wanted to know about Redis but were afraid to ask
Everything you always wanted to know about Redis but were afraid to askEverything you always wanted to know about Redis but were afraid to ask
Everything you always wanted to know about Redis but were afraid to ask
Carlos Abalde
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
Brendan Gregg
 
Apache Spark
Apache SparkApache Spark
Apache Spark
SugumarSarDurai
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
Kostas Tzoumas
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for Performance
Brendan Gregg
 
Memory access tracing [poug17]
Memory access tracing [poug17]Memory access tracing [poug17]
Memory access tracing [poug17]
Mahmoud Hatem
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017
Gwen (Chen) Shapira
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Databricks
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking Walkthrough
Thomas Graf
 
10分で分かるLinuxブロックレイヤ
10分で分かるLinuxブロックレイヤ10分で分かるLinuxブロックレイヤ
10分で分かるLinuxブロックレイヤ
Takashi Hoshino
 
Develop Your Own Operating Systems using Cheap ARM Boards
Develop Your Own Operating Systems using Cheap ARM BoardsDevelop Your Own Operating Systems using Cheap ARM Boards
Develop Your Own Operating Systems using Cheap ARM Boards
National Cheng Kung University
 
Rethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming SystemsRethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming Systems
Yingjun Wu
 
PostgreSQLのfull_page_writesについて(第24回PostgreSQLアンカンファレンス@オンライン 発表資料)
PostgreSQLのfull_page_writesについて(第24回PostgreSQLアンカンファレンス@オンライン 発表資料)PostgreSQLのfull_page_writesについて(第24回PostgreSQLアンカンファレンス@オンライン 発表資料)
PostgreSQLのfull_page_writesについて(第24回PostgreSQLアンカンファレンス@オンライン 発表資料)
NTT DATA Technology & Innovation
 
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
Shigeru Hanada
 
Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Stores
confluent
 
SQL Performance Improvements at a Glance in Apache Spark 3.0
SQL Performance Improvements at a Glance in Apache Spark 3.0SQL Performance Improvements at a Glance in Apache Spark 3.0
SQL Performance Improvements at a Glance in Apache Spark 3.0
Databricks
 
Everything you always wanted to know about Redis but were afraid to ask
Everything you always wanted to know about Redis but were afraid to askEverything you always wanted to know about Redis but were afraid to ask
Everything you always wanted to know about Redis but were afraid to ask
Carlos Abalde
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
Brendan Gregg
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
Kostas Tzoumas
 

Similar to Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE (20)

Multitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEMultitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINE
kawamuray
 
Openstack meetup lyon_2017-09-28
Openstack meetup lyon_2017-09-28Openstack meetup lyon_2017-09-28
Openstack meetup lyon_2017-09-28
Xavier Lucas
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
DataWorks Summit/Hadoop Summit
 
Kafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internalsKafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internals
Ayyappadas Ravindran (Appu)
 
User-space Network Processing
User-space Network ProcessingUser-space Network Processing
User-space Network Processing
Ryousei Takano
 
Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applications
Ding Li
 
Kafka Deep Dive
Kafka Deep DiveKafka Deep Dive
Kafka Deep Dive
Knoldus Inc.
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
Joe Stein
 
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messagesMulti-Tenancy Kafka cluster for LINE services with 250 billion daily messages
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages
LINE Corporation
 
Clug 2011 March web server optimisation
Clug 2011 March  web server optimisationClug 2011 March  web server optimisation
Clug 2011 March web server optimisation
grooverdan
 
Kafka zero to hero
Kafka zero to heroKafka zero to hero
Kafka zero to hero
Avi Levi
 
Apache Kafka - From zero to hero
Apache Kafka - From zero to heroApache Kafka - From zero to hero
Apache Kafka - From zero to hero
Apache Kafka TLV
 
Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Production
confluent
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
Joe Stein
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
Joe Stein
 
Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
Movile Internet Movel SA: A Change of Seasons: A big move to Apache CassandraMovile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
DataStax Academy
 
Cassandra Summit 2015 - A Change of Seasons
Cassandra Summit 2015 - A Change of SeasonsCassandra Summit 2015 - A Change of Seasons
Cassandra Summit 2015 - A Change of Seasons
Eiti Kimura
 
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Ontico
 
Scaling HDFS at Xiaomi
Scaling HDFS at XiaomiScaling HDFS at Xiaomi
Scaling HDFS at Xiaomi
DataWorks Summit
 
Scaling HDFS at Xiaomi
Scaling HDFS at XiaomiScaling HDFS at Xiaomi
Scaling HDFS at Xiaomi
DataWorks Summit
 
Multitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEMultitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINE
kawamuray
 
Openstack meetup lyon_2017-09-28
Openstack meetup lyon_2017-09-28Openstack meetup lyon_2017-09-28
Openstack meetup lyon_2017-09-28
Xavier Lucas
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
DataWorks Summit/Hadoop Summit
 
User-space Network Processing
User-space Network ProcessingUser-space Network Processing
User-space Network Processing
Ryousei Takano
 
Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applications
Ding Li
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
Joe Stein
 
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messagesMulti-Tenancy Kafka cluster for LINE services with 250 billion daily messages
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages
LINE Corporation
 
Clug 2011 March web server optimisation
Clug 2011 March  web server optimisationClug 2011 March  web server optimisation
Clug 2011 March web server optimisation
grooverdan
 
Kafka zero to hero
Kafka zero to heroKafka zero to hero
Kafka zero to hero
Avi Levi
 
Apache Kafka - From zero to hero
Apache Kafka - From zero to heroApache Kafka - From zero to hero
Apache Kafka - From zero to hero
Apache Kafka TLV
 
Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Production
confluent
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
Joe Stein
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
Joe Stein
 
Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
Movile Internet Movel SA: A Change of Seasons: A big move to Apache CassandraMovile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
DataStax Academy
 
Cassandra Summit 2015 - A Change of Seasons
Cassandra Summit 2015 - A Change of SeasonsCassandra Summit 2015 - A Change of Seasons
Cassandra Summit 2015 - A Change of Seasons
Eiti Kimura
 
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Ontico
 
Ad

More from confluent (20)

Webinar Think Right - Shift Left - 19-03-2025.pptx
Webinar Think Right - Shift Left - 19-03-2025.pptxWebinar Think Right - Shift Left - 19-03-2025.pptx
Webinar Think Right - Shift Left - 19-03-2025.pptx
confluent
 
Migration, backup and restore made easy using Kannika
Migration, backup and restore made easy using KannikaMigration, backup and restore made easy using Kannika
Migration, backup and restore made easy using Kannika
confluent
 
Five Things You Need to Know About Data Streaming in 2025
Five Things You Need to Know About Data Streaming in 2025Five Things You Need to Know About Data Streaming in 2025
Five Things You Need to Know About Data Streaming in 2025
confluent
 
Data in Motion Tour Seoul 2024 - Keynote
Data in Motion Tour Seoul 2024 - KeynoteData in Motion Tour Seoul 2024 - Keynote
Data in Motion Tour Seoul 2024 - Keynote
confluent
 
Data in Motion Tour Seoul 2024 - Roadmap Demo
Data in Motion Tour Seoul 2024  - Roadmap DemoData in Motion Tour Seoul 2024  - Roadmap Demo
Data in Motion Tour Seoul 2024 - Roadmap Demo
confluent
 
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...
confluent
 
Confluent per il settore FSI: Accelerare l'Innovazione con il Data Streaming...
Confluent per il settore FSI:  Accelerare l'Innovazione con il Data Streaming...Confluent per il settore FSI:  Accelerare l'Innovazione con il Data Streaming...
Confluent per il settore FSI: Accelerare l'Innovazione con il Data Streaming...
confluent
 
Data in Motion Tour 2024 Riyadh, Saudi Arabia
Data in Motion Tour 2024 Riyadh, Saudi ArabiaData in Motion Tour 2024 Riyadh, Saudi Arabia
Data in Motion Tour 2024 Riyadh, Saudi Arabia
confluent
 
Build a Real-Time Decision Support Application for Financial Market Traders w...
Build a Real-Time Decision Support Application for Financial Market Traders w...Build a Real-Time Decision Support Application for Financial Market Traders w...
Build a Real-Time Decision Support Application for Financial Market Traders w...
confluent
 
Strumenti e Strategie di Stream Governance con Confluent Platform
Strumenti e Strategie di Stream Governance con Confluent PlatformStrumenti e Strategie di Stream Governance con Confluent Platform
Strumenti e Strategie di Stream Governance con Confluent Platform
confluent
 
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not WeeksCompose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks
confluent
 
Building Real-Time Gen AI Applications with SingleStore and Confluent
Building Real-Time Gen AI Applications with SingleStore and ConfluentBuilding Real-Time Gen AI Applications with SingleStore and Confluent
Building Real-Time Gen AI Applications with SingleStore and Confluent
confluent
 
Unlocking value with event-driven architecture by Confluent
Unlocking value with event-driven architecture by ConfluentUnlocking value with event-driven architecture by Confluent
Unlocking value with event-driven architecture by Confluent
confluent
 
Il Data Streaming per un’AI real-time di nuova generazione
Il Data Streaming per un’AI real-time di nuova generazioneIl Data Streaming per un’AI real-time di nuova generazione
Il Data Streaming per un’AI real-time di nuova generazione
confluent
 
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
confluent
 
Break data silos with real-time connectivity using Confluent Cloud Connectors
Break data silos with real-time connectivity using Confluent Cloud ConnectorsBreak data silos with real-time connectivity using Confluent Cloud Connectors
Break data silos with real-time connectivity using Confluent Cloud Connectors
confluent
 
Building API data products on top of your real-time data infrastructure
Building API data products on top of your real-time data infrastructureBuilding API data products on top of your real-time data infrastructure
Building API data products on top of your real-time data infrastructure
confluent
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
confluent
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
confluent
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
confluent
 
Webinar Think Right - Shift Left - 19-03-2025.pptx
Webinar Think Right - Shift Left - 19-03-2025.pptxWebinar Think Right - Shift Left - 19-03-2025.pptx
Webinar Think Right - Shift Left - 19-03-2025.pptx
confluent
 
Migration, backup and restore made easy using Kannika
Migration, backup and restore made easy using KannikaMigration, backup and restore made easy using Kannika
Migration, backup and restore made easy using Kannika
confluent
 
Five Things You Need to Know About Data Streaming in 2025
Five Things You Need to Know About Data Streaming in 2025Five Things You Need to Know About Data Streaming in 2025
Five Things You Need to Know About Data Streaming in 2025
confluent
 
Data in Motion Tour Seoul 2024 - Keynote
Data in Motion Tour Seoul 2024 - KeynoteData in Motion Tour Seoul 2024 - Keynote
Data in Motion Tour Seoul 2024 - Keynote
confluent
 
Data in Motion Tour Seoul 2024 - Roadmap Demo
Data in Motion Tour Seoul 2024  - Roadmap DemoData in Motion Tour Seoul 2024  - Roadmap Demo
Data in Motion Tour Seoul 2024 - Roadmap Demo
confluent
 
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...
confluent
 
Confluent per il settore FSI: Accelerare l'Innovazione con il Data Streaming...
Confluent per il settore FSI:  Accelerare l'Innovazione con il Data Streaming...Confluent per il settore FSI:  Accelerare l'Innovazione con il Data Streaming...
Confluent per il settore FSI: Accelerare l'Innovazione con il Data Streaming...
confluent
 
Data in Motion Tour 2024 Riyadh, Saudi Arabia
Data in Motion Tour 2024 Riyadh, Saudi ArabiaData in Motion Tour 2024 Riyadh, Saudi Arabia
Data in Motion Tour 2024 Riyadh, Saudi Arabia
confluent
 
Build a Real-Time Decision Support Application for Financial Market Traders w...
Build a Real-Time Decision Support Application for Financial Market Traders w...Build a Real-Time Decision Support Application for Financial Market Traders w...
Build a Real-Time Decision Support Application for Financial Market Traders w...
confluent
 
Strumenti e Strategie di Stream Governance con Confluent Platform
Strumenti e Strategie di Stream Governance con Confluent PlatformStrumenti e Strategie di Stream Governance con Confluent Platform
Strumenti e Strategie di Stream Governance con Confluent Platform
confluent
 
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not WeeksCompose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks
confluent
 
Building Real-Time Gen AI Applications with SingleStore and Confluent
Building Real-Time Gen AI Applications with SingleStore and ConfluentBuilding Real-Time Gen AI Applications with SingleStore and Confluent
Building Real-Time Gen AI Applications with SingleStore and Confluent
confluent
 
Unlocking value with event-driven architecture by Confluent
Unlocking value with event-driven architecture by ConfluentUnlocking value with event-driven architecture by Confluent
Unlocking value with event-driven architecture by Confluent
confluent
 
Il Data Streaming per un’AI real-time di nuova generazione
Il Data Streaming per un’AI real-time di nuova generazioneIl Data Streaming per un’AI real-time di nuova generazione
Il Data Streaming per un’AI real-time di nuova generazione
confluent
 
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
confluent
 
Break data silos with real-time connectivity using Confluent Cloud Connectors
Break data silos with real-time connectivity using Confluent Cloud ConnectorsBreak data silos with real-time connectivity using Confluent Cloud Connectors
Break data silos with real-time connectivity using Confluent Cloud Connectors
confluent
 
Building API data products on top of your real-time data infrastructure
Building API data products on top of your real-time data infrastructureBuilding API data products on top of your real-time data infrastructure
Building API data products on top of your real-time data infrastructure
confluent
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
confluent
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
confluent
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
confluent
 
Ad

Recently uploaded (20)

Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 

Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE

  • 1. Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE Yuto Kawamura - LINE Corporation
  • 2. Speaker introduction Yuto Kawamura Senior Software Engineer at LINE Leading a team for providing a company-wide Kafka platform Apache Kafka Contributor Speaker at Kafka Summit SF 2017 1 1 https://kafka-summit.org/sessions/single-data-hub- services-feed-100-billion-messages-per-day/
  • 3. LINE Messaging service 164 million active users in countries with top market share like Japan, Taiwan, Thailand and Indonesia.2 And many other services: - News - Bitbox/Bitmax - Cryptocurrency trading - LINE Pay - Digital payment 2 As of June 2018.
  • 4. Kafka platform at LINE Two main usages: — "Data Hub" for distributing data to other services — e.g: Users relationship update event from messaging service — As a task queue for buffering and processing business logic asynchronously
  • 5. Kafka platform at LINE Single cluster is shared by many independent services for: - Concept of Data Hub - Efficiency of management/operation Messaging, AD, News, Blockchain and etc... all of their data stored and distributed on single Kafka cluster.
  • 6. From department-wide to company-wide platform It was just for messaging service. Now everyone uses it.
  • 7. Broker installation CPU: Intel(R) Xeon(R) 2.20GHz x 20 cores (HT) * 2 Memory: 256GiB - more memory, more caching (page cache) - newly written data can survive only 20 minutes ... Network: 10Gbps Disk: HDD x 12 RAID 1+0 - saves maintenance costs Kafka version: 0.10.2.1 ~ 0.11.1.2
  • 8. Requirements doing multitenancy Cluster can protect itself against abusing workloads - Accidental workload doesn't propagates to other users. We can track on which client is sending requests - Find source of strange requests. Certain level of isolation among client workloads - Slow response for one client doesn't appears to another client.
  • 9. Protect cluster against abusing workload - Request Quota It is more important to manage number of requests over incoming/outgoing byte rate. Kafka is amazingly durable for large data if they are well-batched. => Producers which configures linger.ms=0 with large number of servers probably leads large amount of requests Starting from 0.11.0.0, by KIP-124 we can configure request rate quota 3 3 https://cwiki.apache.org/confluence/display/KAFKA/KIP-124+-+Request+rate+quotas
  • 10. Protect cluster against abusing workload - Request Quota Basic idea is to apply default quota for preventing single abusing client destabilize the cluster as a least protection. *Not for controlling resource quantity for each client.
  • 11. Track on requests from clients - Metrics — kafka.server:type=Request,user=([-.w]+),client- id=([-.w]+):request-time — Percentage of time spent in broker network and I/O threads to process requests from each client group. — Useful to see how much of broker resource is being consumed by each client.
  • 12. Track on requests from clients - Slowlog Log requests which took longer than certain threshold to process. - Kafka has "request logging" but it leads too many of lines - Inspired by HBase's Thresholds can be changed dynamically through JMX console for each request type.
  • 13. Isolation among client workloads Let's give a look through the actual troubleshooting.
  • 14. Detection Detection: 50x ~ 100x slower response time in 99th %ile Produce response time. Normal: ~20ms Observed: 50ms ~ 200ms
  • 15. Finding #1: Disk read coincidence Coincidence disk read of certain amount.
  • 16. Finding #2: Network threads got busier Network threads' utilization was very high. Metrics: kafka.network:type=SocketServer,n ame=NetworkProcessorAvgIdlePercen t
  • 17. Request handling in Kafka broker Two thread layers: Network Threads: Reads/Writes request/response from/to client sockets. Request Handler Threads: Processes requests and produces response object.
  • 18. Request handling - Read Request
  • 20. Request handling - Write Response
  • 21. Network thread runs event loop — Multiplex and processes assigned client sockets sequentially. — It never blocks awaiting IO completion. => So it makes sense to set num.network.threads <= CPU_CORES
  • 22. When Network threads gets busy... It means either one of: 1. Really busy doing lots of work. Many requests/ responses to read/write 2. Blocked by some operations (which should not happen in event loop in general)
  • 23. Response handling of normal requests When response is in queue, all data to be transferred are in memory.
  • 24. Exceptional handling for Fetch response When response is in queue, topic data segments are not in userspace memory. => Copy to client socket directly inside the kernel using sendfile(2) system call.
  • 25. What if target data doesn't exists in page cache? Target data in page cache: => Just a memory copy. Very fast: ~ 100us Target data is NOT in page cache: => Needs to load data from disk into page cache first. Can be slow: ~ 50ms (or even slower)
  • 26. Suspecting blocking in sendfile(2) Inspected duration of sendfile system calls issued by broker process using SystemTap (dynamic tracing tool to probe events in kernel. see my previous talk 4 ) $ stap -e ‘(script counting sendfile(2) duration histogram)’ # value (us) value |---------------------------------------- count 0 | 0 1 | 71 2 |@@@ 6171 16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 29472 32 |@@@ 3418 2048 | 0 ... 8192 | 3 4 https://www.confluent.io/kafka-summit-sf17/One-Day-One-Data-Hub-100-Billion-Messages- Kafka-at-LINE
  • 27. Problem hypothesis Fetch request reading old data causes blocking sendfile(2) in event loop and applying latency for responses needs to be processed in the same network thread.
  • 28. Problem hypothesis Super harmful because of: It can be triggered either by: - Consumers attempting to fetch old data - Replica fetch by follower brokers for restoring replica of old logs => Both are very common scenario Breaks performance isolation among independent clients.
  • 29. Solution candidates A: Separate network threads among clients => Possible, but a lot of changes required => Not essential because network threads should be completely computation intensive B: Balance connections among network threads => Possible, but again a lot of changes => Still for first moment other connections will get affected
  • 30. Solution candidates C: Make sure that data are ready on memory before the response passed to the network thread => Event loop never blocks
  • 31. Choice: Warmup page cache before the network thread Move blocking part to request handler threads (= single queue and pool of threads) => Free thread can take arbitrary task (request) while some threads are blocked.
  • 32. Choice: Warmup page cache before the network thread When Network thread calls sendfile(2) for transferring log data, it's always in page cache.
  • 33. Warming up page cache with minimal overhead Easiest way: Do synchronous read(2) on target data => Large overhead by copying memory from kernel to userland Why is Kafka using sendfile(2) for transferring topic data? => To avoid expensive large memory copy How can we achieve it keeping this property?
  • 34. Trick #1 Zero copy synchronous page load Call sendfile(2) for target data with dest /dev/null. The /dev/null driver does not actually copy data to anywhere.
  • 35. Why it has almost no overhead? Linux kernel internally uses splice to implement sendfile(2). splice implementation of /dev/null returns w/o iterating target data. # ./drivers/char/mem.c static const struct file_operations null_fops = { ... .splice_write = splice_write_null, }; static int pipe_to_null(...) { return sd->len; } static ssize_t splice_write_null(...) { return splice_from_pipe(pipe, out, ppos, len, flags, pipe_to_null); }
  • 36. Implementing page load // FileRecords.java private static final java.nio.file.Path DEVNULL_PATH = new File("/dev/null").toPath(); public void prepareForRead() throws IOException { long size = Math.min(channel.size(), end) - start; try (FileChannel devnullChannel = FileChannel.open( DEVNULL_PATH, StandardOpenOption.WRITE)) { // Calls sendfile(2) internally channel.transferTo(start, size, devnullChannel); } }
  • 37. Trick #2 Skip the "hot" last log segment Another concern: additional syscalls * Fetch req count? - Warming up is necessary only for older data. - Exclude the last log segment from the warmup target.
  • 38. Trick #2 Skip the "hot" last log segment # Log.scala#read @@ -585,6 +586,17 @@ class Log(@volatile var dir: File, if(fetchInfo == null) { entry = segments.higherEntry(entry.getKey) } else { + // For last entries we assume that it is hot enough to still have all data in page cache. + // Most of fetch requests are fetching from the tail of the log, so this optimization + // should save call of sendfile significantly. + if (!isLastEntry && fetchInfo.records.isInstanceOf[FileRecords]) { + try { + info("Prepare Read for " + fetchInfo.records.asInstanceOf[FileRecords].file().getPath) + fetchInfo.records.asInstanceOf[FileRecords].prepareForRead() + } catch { + case e: Throwable => warn("failed to prepare cache for read", e) + } + } return fetchInfo }
  • 39. It works No response time degradation in irrelevant requests while there are coincidence of Fetch request triggers disk read.
  • 40. Patch upstream? Concern: The patch heavily assumes underlying kernel implementation. Still: - Effect is tremendous. - Fixes very common performance degradation scenario. Discuss at KAFKA-7504
  • 41. Conclusion — Talked requirements for multi tenancy clusters and solutions — Quota, Metrics, Slowlog ... and hacky patch. — After fixing some issues our hosting policy is working well and efficient, keeping: — concept of single "Data Hub" and — operational cost not proportional to the number of users/usages. — Kafka is well designed and implemented to contain many, independent and different workloads.