SlideShare a Scribd company logo
Building a real-
time streaming
platform using
Kafka Connect +
Kafka Streams
Jeremy Custenborder, Systems Engineer, Confluent
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
• Everything in the company is a real-time stream
• > 1.2 trillion messages written per day
• > 3.4 trillion messages read per day
• ~ 1 PB of stream data
• Thousands of engineers
• Tens of thousands of producer processes
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Confluent   building a real-time streaming platform using kafka streams and kafka connect-20min-version
Resources
• Confluent
• Company website: http://www.confluent.io
• Blog: http://www.confluent.io/blog
• Free Ebook “Making Sense of Stream Processing”
http://www.confluent.io/making-sense-of-stream-processing-ebook
• Apache Kafka
• http://kafka.apache.org
• Kafka Connect
• http://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-
low-latency-data-pipelines
• Kafka Streams
• http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-
made-simple
Thanks!
Jeremy Custenborder | jeremy@confluent.io |
Download Kafka
and Confluent Platform
www.confluent.io/download
Training!
http://www.confluent.io/training
Discount Code: BELLEVUE10
Operations Training in Seattle December 10th.

More Related Content

What's hot (20)

Data integration with Apache Kafka
Data integration with Apache KafkaData integration with Apache Kafka
Data integration with Apache Kafka
confluent
 
Introduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterIntroduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matter
confluent
 
Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016
Gwen (Chen) Shapira
 
Kafka connect
Kafka connectKafka connect
Kafka connect
Andrew Stevenson
 
Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)
Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)
Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)
Keigo Suda
 
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
HostedbyConfluent
 
The Many Faces of Apache Kafka: Leveraging real-time data at scale
The Many Faces of Apache Kafka: Leveraging real-time data at scaleThe Many Faces of Apache Kafka: Leveraging real-time data at scale
The Many Faces of Apache Kafka: Leveraging real-time data at scale
Neha Narkhede
 
Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...
Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...
Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...
HostedbyConfluent
 
Monitoring Apache Kafka with Confluent Control Center
Monitoring Apache Kafka with Confluent Control Center   Monitoring Apache Kafka with Confluent Control Center
Monitoring Apache Kafka with Confluent Control Center
confluent
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
Michael Noll
 
Deploying Kafka on DC/OS
Deploying Kafka on DC/OSDeploying Kafka on DC/OS
Deploying Kafka on DC/OS
Kaufman Ng
 
Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka Connect
Kaufman Ng
 
Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017
Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017
Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017
Michael Noll
 
Introduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - MadridIntroduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - Madrid
Paolo Castagna
 
Apache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platformApache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platform
confluent
 
Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structures
confluent
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
confluent
 
KSQL Intro
KSQL IntroKSQL Intro
KSQL Intro
confluent
 
PostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CapturePostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data Capture
Jeff Klukas
 
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Michael Noll
 
Data integration with Apache Kafka
Data integration with Apache KafkaData integration with Apache Kafka
Data integration with Apache Kafka
confluent
 
Introduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterIntroduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matter
confluent
 
Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016
Gwen (Chen) Shapira
 
Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)
Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)
Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)
Keigo Suda
 
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
HostedbyConfluent
 
The Many Faces of Apache Kafka: Leveraging real-time data at scale
The Many Faces of Apache Kafka: Leveraging real-time data at scaleThe Many Faces of Apache Kafka: Leveraging real-time data at scale
The Many Faces of Apache Kafka: Leveraging real-time data at scale
Neha Narkhede
 
Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...
Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...
Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...
HostedbyConfluent
 
Monitoring Apache Kafka with Confluent Control Center
Monitoring Apache Kafka with Confluent Control Center   Monitoring Apache Kafka with Confluent Control Center
Monitoring Apache Kafka with Confluent Control Center
confluent
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
Michael Noll
 
Deploying Kafka on DC/OS
Deploying Kafka on DC/OSDeploying Kafka on DC/OS
Deploying Kafka on DC/OS
Kaufman Ng
 
Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka Connect
Kaufman Ng
 
Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017
Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017
Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017
Michael Noll
 
Introduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - MadridIntroduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - Madrid
Paolo Castagna
 
Apache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platformApache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platform
confluent
 
Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structures
confluent
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
confluent
 
PostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CapturePostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data Capture
Jeff Klukas
 
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Michael Noll
 

Similar to Confluent building a real-time streaming platform using kafka streams and kafka connect-20min-version (20)

Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streams
jimriecken
 
Apache Kafka at LinkedIn - How LinkedIn Customizes Kafka to Work at the Trill...
Apache Kafka at LinkedIn - How LinkedIn Customizes Kafka to Work at the Trill...Apache Kafka at LinkedIn - How LinkedIn Customizes Kafka to Work at the Trill...
Apache Kafka at LinkedIn - How LinkedIn Customizes Kafka to Work at the Trill...
Jonghyun Lee
 
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
Data Con LA
 
Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Keystone - ApacheCon 2016
Keystone - ApacheCon 2016
Peter Bakas
 
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis
 
Kafka 탄생과 생태계
Kafka 탄생과 생태계Kafka 탄생과 생태계
Kafka 탄생과 생태계
Gee Yeol Nahm
 
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 20190-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
confluent
 
Sas 2015 event_driven
Sas 2015 event_drivenSas 2015 event_driven
Sas 2015 event_driven
Sascha Möllering
 
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
LINE Corporation
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Michael Noll
 
introductiontoapachekafka-201102140206.pdf
introductiontoapachekafka-201102140206.pdfintroductiontoapachekafka-201102140206.pdf
introductiontoapachekafka-201102140206.pdf
TarekHamdi8
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
AIMDek Technologies
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
Apache Kafka TLV
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache Kafka
Joe Stein
 
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
Monal Daxini
 
Web Analytics using Kafka - August talk w/ Women Who Code
Web Analytics using Kafka - August talk w/ Women Who CodeWeb Analytics using Kafka - August talk w/ Women Who Code
Web Analytics using Kafka - August talk w/ Women Who Code
Purnima Kamath
 
Data streaming-systems
Data streaming-systemsData streaming-systems
Data streaming-systems
imcpune
 
NoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePackNoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePack
Sadayuki Furuhashi
 
NoSQL afternoon in Japan kumofs & MessagePack
NoSQL afternoon in Japan kumofs & MessagePackNoSQL afternoon in Japan kumofs & MessagePack
NoSQL afternoon in Japan kumofs & MessagePack
Sadayuki Furuhashi
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streams
jimriecken
 
Apache Kafka at LinkedIn - How LinkedIn Customizes Kafka to Work at the Trill...
Apache Kafka at LinkedIn - How LinkedIn Customizes Kafka to Work at the Trill...Apache Kafka at LinkedIn - How LinkedIn Customizes Kafka to Work at the Trill...
Apache Kafka at LinkedIn - How LinkedIn Customizes Kafka to Work at the Trill...
Jonghyun Lee
 
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
Data Con LA
 
Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Keystone - ApacheCon 2016
Keystone - ApacheCon 2016
Peter Bakas
 
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis
 
Kafka 탄생과 생태계
Kafka 탄생과 생태계Kafka 탄생과 생태계
Kafka 탄생과 생태계
Gee Yeol Nahm
 
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 20190-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
confluent
 
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
LINE Corporation
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Michael Noll
 
introductiontoapachekafka-201102140206.pdf
introductiontoapachekafka-201102140206.pdfintroductiontoapachekafka-201102140206.pdf
introductiontoapachekafka-201102140206.pdf
TarekHamdi8
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
Apache Kafka TLV
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache Kafka
Joe Stein
 
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
Monal Daxini
 
Web Analytics using Kafka - August talk w/ Women Who Code
Web Analytics using Kafka - August talk w/ Women Who CodeWeb Analytics using Kafka - August talk w/ Women Who Code
Web Analytics using Kafka - August talk w/ Women Who Code
Purnima Kamath
 
Data streaming-systems
Data streaming-systemsData streaming-systems
Data streaming-systems
imcpune
 
NoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePackNoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePack
Sadayuki Furuhashi
 
NoSQL afternoon in Japan kumofs & MessagePack
NoSQL afternoon in Japan kumofs & MessagePackNoSQL afternoon in Japan kumofs & MessagePack
NoSQL afternoon in Japan kumofs & MessagePack
Sadayuki Furuhashi
 
Ad

Recently uploaded (20)

Topic 26 Security Testing Considerations.pptx
Topic 26 Security Testing Considerations.pptxTopic 26 Security Testing Considerations.pptx
Topic 26 Security Testing Considerations.pptx
marutnand8
 
How to purchase, license and subscribe to Microsoft Azure_PDF.pdf
How to purchase, license and subscribe to Microsoft Azure_PDF.pdfHow to purchase, license and subscribe to Microsoft Azure_PDF.pdf
How to purchase, license and subscribe to Microsoft Azure_PDF.pdf
victordsane
 
Software Engineering Process, Notation & Tools Introduction - Part 3
Software Engineering Process, Notation & Tools Introduction - Part 3Software Engineering Process, Notation & Tools Introduction - Part 3
Software Engineering Process, Notation & Tools Introduction - Part 3
Gaurav Sharma
 
The Future of Open Source Reporting Best Alternatives to Jaspersoft.pdf
The Future of Open Source Reporting Best Alternatives to Jaspersoft.pdfThe Future of Open Source Reporting Best Alternatives to Jaspersoft.pdf
The Future of Open Source Reporting Best Alternatives to Jaspersoft.pdf
Varsha Nayak
 
The rise of e-commerce has redefined how retailers operate—and reconciliation...
The rise of e-commerce has redefined how retailers operate—and reconciliation...The rise of e-commerce has redefined how retailers operate—and reconciliation...
The rise of e-commerce has redefined how retailers operate—and reconciliation...
Prachi Desai
 
From Chaos to Clarity - Designing (AI-Ready) APIs with APIOps Cycles
From Chaos to Clarity - Designing (AI-Ready) APIs with APIOps CyclesFrom Chaos to Clarity - Designing (AI-Ready) APIs with APIOps Cycles
From Chaos to Clarity - Designing (AI-Ready) APIs with APIOps Cycles
Marjukka Niinioja
 
Generative Artificial Intelligence and its Applications
Generative Artificial Intelligence and its ApplicationsGenerative Artificial Intelligence and its Applications
Generative Artificial Intelligence and its Applications
SandeepKS52
 
Top 11 Fleet Management Software Providers in 2025 (2).pdf
Top 11 Fleet Management Software Providers in 2025 (2).pdfTop 11 Fleet Management Software Providers in 2025 (2).pdf
Top 11 Fleet Management Software Providers in 2025 (2).pdf
Trackobit
 
Eliminate the complexities of Event-Driven Architecture with Domain-Driven De...
Eliminate the complexities of Event-Driven Architecture with Domain-Driven De...Eliminate the complexities of Event-Driven Architecture with Domain-Driven De...
Eliminate the complexities of Event-Driven Architecture with Domain-Driven De...
SheenBrisals
 
How Insurance Policy Administration Streamlines Policy Lifecycle for Agile Op...
How Insurance Policy Administration Streamlines Policy Lifecycle for Agile Op...How Insurance Policy Administration Streamlines Policy Lifecycle for Agile Op...
How Insurance Policy Administration Streamlines Policy Lifecycle for Agile Op...
Insurance Tech Services
 
Essentials of Resource Planning in a Downturn
Essentials of Resource Planning in a DownturnEssentials of Resource Planning in a Downturn
Essentials of Resource Planning in a Downturn
OnePlan Solutions
 
COBOL Programming with VSCode - IBM Certificate
COBOL Programming with VSCode - IBM CertificateCOBOL Programming with VSCode - IBM Certificate
COBOL Programming with VSCode - IBM Certificate
VICTOR MAESTRE RAMIREZ
 
Bonk coin airdrop_ Everything You Need to Know.pdf
Bonk coin airdrop_ Everything You Need to Know.pdfBonk coin airdrop_ Everything You Need to Know.pdf
Bonk coin airdrop_ Everything You Need to Know.pdf
Herond Labs
 
Agentic Techniques in Retrieval-Augmented Generation with Azure AI Search
Agentic Techniques in Retrieval-Augmented Generation with Azure AI SearchAgentic Techniques in Retrieval-Augmented Generation with Azure AI Search
Agentic Techniques in Retrieval-Augmented Generation with Azure AI Search
Maxim Salnikov
 
Top 5 Task Management Software to Boost Productivity in 2025
Top 5 Task Management Software to Boost Productivity in 2025Top 5 Task Management Software to Boost Productivity in 2025
Top 5 Task Management Software to Boost Productivity in 2025
Orangescrum
 
AI and Deep Learning with NVIDIA Technologies
AI and Deep Learning with NVIDIA TechnologiesAI and Deep Learning with NVIDIA Technologies
AI and Deep Learning with NVIDIA Technologies
SandeepKS52
 
Integrating Survey123 and R&H Data Using FME
Integrating Survey123 and R&H Data Using FMEIntegrating Survey123 and R&H Data Using FME
Integrating Survey123 and R&H Data Using FME
Safe Software
 
Build enterprise-ready applications using skills you already have!
Build enterprise-ready applications using skills you already have!Build enterprise-ready applications using skills you already have!
Build enterprise-ready applications using skills you already have!
PhilMeredith3
 
Software Engineering Process, Notation & Tools Introduction - Part 4
Software Engineering Process, Notation & Tools Introduction - Part 4Software Engineering Process, Notation & Tools Introduction - Part 4
Software Engineering Process, Notation & Tools Introduction - Part 4
Gaurav Sharma
 
Marketo & Dynamics can be Most Excellent to Each Other – The Sequel
Marketo & Dynamics can be Most Excellent to Each Other – The SequelMarketo & Dynamics can be Most Excellent to Each Other – The Sequel
Marketo & Dynamics can be Most Excellent to Each Other – The Sequel
BradBedford3
 
Topic 26 Security Testing Considerations.pptx
Topic 26 Security Testing Considerations.pptxTopic 26 Security Testing Considerations.pptx
Topic 26 Security Testing Considerations.pptx
marutnand8
 
How to purchase, license and subscribe to Microsoft Azure_PDF.pdf
How to purchase, license and subscribe to Microsoft Azure_PDF.pdfHow to purchase, license and subscribe to Microsoft Azure_PDF.pdf
How to purchase, license and subscribe to Microsoft Azure_PDF.pdf
victordsane
 
Software Engineering Process, Notation & Tools Introduction - Part 3
Software Engineering Process, Notation & Tools Introduction - Part 3Software Engineering Process, Notation & Tools Introduction - Part 3
Software Engineering Process, Notation & Tools Introduction - Part 3
Gaurav Sharma
 
The Future of Open Source Reporting Best Alternatives to Jaspersoft.pdf
The Future of Open Source Reporting Best Alternatives to Jaspersoft.pdfThe Future of Open Source Reporting Best Alternatives to Jaspersoft.pdf
The Future of Open Source Reporting Best Alternatives to Jaspersoft.pdf
Varsha Nayak
 
The rise of e-commerce has redefined how retailers operate—and reconciliation...
The rise of e-commerce has redefined how retailers operate—and reconciliation...The rise of e-commerce has redefined how retailers operate—and reconciliation...
The rise of e-commerce has redefined how retailers operate—and reconciliation...
Prachi Desai
 
From Chaos to Clarity - Designing (AI-Ready) APIs with APIOps Cycles
From Chaos to Clarity - Designing (AI-Ready) APIs with APIOps CyclesFrom Chaos to Clarity - Designing (AI-Ready) APIs with APIOps Cycles
From Chaos to Clarity - Designing (AI-Ready) APIs with APIOps Cycles
Marjukka Niinioja
 
Generative Artificial Intelligence and its Applications
Generative Artificial Intelligence and its ApplicationsGenerative Artificial Intelligence and its Applications
Generative Artificial Intelligence and its Applications
SandeepKS52
 
Top 11 Fleet Management Software Providers in 2025 (2).pdf
Top 11 Fleet Management Software Providers in 2025 (2).pdfTop 11 Fleet Management Software Providers in 2025 (2).pdf
Top 11 Fleet Management Software Providers in 2025 (2).pdf
Trackobit
 
Eliminate the complexities of Event-Driven Architecture with Domain-Driven De...
Eliminate the complexities of Event-Driven Architecture with Domain-Driven De...Eliminate the complexities of Event-Driven Architecture with Domain-Driven De...
Eliminate the complexities of Event-Driven Architecture with Domain-Driven De...
SheenBrisals
 
How Insurance Policy Administration Streamlines Policy Lifecycle for Agile Op...
How Insurance Policy Administration Streamlines Policy Lifecycle for Agile Op...How Insurance Policy Administration Streamlines Policy Lifecycle for Agile Op...
How Insurance Policy Administration Streamlines Policy Lifecycle for Agile Op...
Insurance Tech Services
 
Essentials of Resource Planning in a Downturn
Essentials of Resource Planning in a DownturnEssentials of Resource Planning in a Downturn
Essentials of Resource Planning in a Downturn
OnePlan Solutions
 
COBOL Programming with VSCode - IBM Certificate
COBOL Programming with VSCode - IBM CertificateCOBOL Programming with VSCode - IBM Certificate
COBOL Programming with VSCode - IBM Certificate
VICTOR MAESTRE RAMIREZ
 
Bonk coin airdrop_ Everything You Need to Know.pdf
Bonk coin airdrop_ Everything You Need to Know.pdfBonk coin airdrop_ Everything You Need to Know.pdf
Bonk coin airdrop_ Everything You Need to Know.pdf
Herond Labs
 
Agentic Techniques in Retrieval-Augmented Generation with Azure AI Search
Agentic Techniques in Retrieval-Augmented Generation with Azure AI SearchAgentic Techniques in Retrieval-Augmented Generation with Azure AI Search
Agentic Techniques in Retrieval-Augmented Generation with Azure AI Search
Maxim Salnikov
 
Top 5 Task Management Software to Boost Productivity in 2025
Top 5 Task Management Software to Boost Productivity in 2025Top 5 Task Management Software to Boost Productivity in 2025
Top 5 Task Management Software to Boost Productivity in 2025
Orangescrum
 
AI and Deep Learning with NVIDIA Technologies
AI and Deep Learning with NVIDIA TechnologiesAI and Deep Learning with NVIDIA Technologies
AI and Deep Learning with NVIDIA Technologies
SandeepKS52
 
Integrating Survey123 and R&H Data Using FME
Integrating Survey123 and R&H Data Using FMEIntegrating Survey123 and R&H Data Using FME
Integrating Survey123 and R&H Data Using FME
Safe Software
 
Build enterprise-ready applications using skills you already have!
Build enterprise-ready applications using skills you already have!Build enterprise-ready applications using skills you already have!
Build enterprise-ready applications using skills you already have!
PhilMeredith3
 
Software Engineering Process, Notation & Tools Introduction - Part 4
Software Engineering Process, Notation & Tools Introduction - Part 4Software Engineering Process, Notation & Tools Introduction - Part 4
Software Engineering Process, Notation & Tools Introduction - Part 4
Gaurav Sharma
 
Marketo & Dynamics can be Most Excellent to Each Other – The Sequel
Marketo & Dynamics can be Most Excellent to Each Other – The SequelMarketo & Dynamics can be Most Excellent to Each Other – The Sequel
Marketo & Dynamics can be Most Excellent to Each Other – The Sequel
BradBedford3
 
Ad

Confluent building a real-time streaming platform using kafka streams and kafka connect-20min-version

  • 1. Building a real- time streaming platform using Kafka Connect + Kafka Streams Jeremy Custenborder, Systems Engineer, Confluent
  • 26. • Everything in the company is a real-time stream • > 1.2 trillion messages written per day • > 3.4 trillion messages read per day • ~ 1 PB of stream data • Thousands of engineers • Tens of thousands of producer processes
  • 55. Resources • Confluent • Company website: http://www.confluent.io • Blog: http://www.confluent.io/blog • Free Ebook “Making Sense of Stream Processing” http://www.confluent.io/making-sense-of-stream-processing-ebook • Apache Kafka • http://kafka.apache.org • Kafka Connect • http://www.confluent.io/blog/announcing-kafka-connect-building-large-scale- low-latency-data-pipelines • Kafka Streams • http://www.confluent.io/blog/introducing-kafka-streams-stream-processing- made-simple
  • 56. Thanks! Jeremy Custenborder | [email protected] | Download Kafka and Confluent Platform www.confluent.io/download

Editor's Notes

  • #2: Hi, I’m Neha Narkhede… There is a big paradigm shift happening around the world where companies are moving rapidly towards leveraging data in real-time and fundamentally moving away from batch-oriented computing. But how do you do that? Well that is what today’s talk is about. I’m going to summarize 6 years of work in 15 mins, so let’s get started.
  • #4: Unordered, unbounded and large-scale datasets are increasingly common in day-to-day business. Stream data means different things for different businesses. For retail, it might mean streams of orders and shipments, for finance, it might mean streams of stock ticker data while for web companies, it might mean streams of user activity data. Stream data is everywhere. At the same time, there is a huge push towards getting faster results: doing instant credit card fraud detection, doing instant credit card payment processing vs only 5 times a day, being able to detect and alert on a problem that causes retail sales to dip in seconds vs a day later (you can only imagine what that would do to retail companies over black Friday)
  • #5: So the takeaway is that businesses operate in real-time not batch, if you go to a store to buy something, you don’t wait there for several hours to get it. So data processing required to make key business decisions and to operate a business effectively should also happen in real-time. Here are some examples to support that claim…
  • #6: Event = something that happened. Different for different businesses.
  • #8: Log files are also event streams. For instance, every line in a log file is an event that in this case tells you how the service is being used.
  • #9: There is an inherent duality in tables and streams; Traditional databases are all about tables full of state but are not designed to respond to streams of events that modify those tables.
  • #10: Tables have rows that store the latest value for a unique key. But…no notion of time
  • #11: If you look at how a table gets constructed over time, you will notice that…
  • #12: The operations are actually a stream of events where the event is just the operation that modifies the table. Every database does this internally and it is called a changelog
  • #13: So events are everywhere, what next? We need to fundamentally move to event-centric thinking. For a retail website, there are possibly various avenues that generate the “product view” event. A standard thing to do is to ensure that all product view data ends up in Hadoop so you can run analytics on user interest to power various business functions from marketing to product positioning and so on.
  • #15: Reality about 100x more complex. In some corner, you are using some messaging system for app-to-app communication. You might have a custom way of loading data from various databases into Hadoop. But then more destinations appear over time and now you have to feed the same data to a search system, various caches etc. This is a common reality and a simplified version. 300 services ~100 databases Multi-datacenter Trolling: load into Oracle, search, etc
  • #16: The core insight is that a data pipeline is also an event stream.
  • #17: What you need instead of that scary picture is a central streaming platform at the heart of a datacenter. A central nervous system that collects data from various sources and feeds all other systems and apps that need to consume and process data in real-time. Why does this make sense?
  • #18: Why is a streaming platform needed? Because data sources and destinations add up over time. Initially you might have just the web app that produces the product view event and maybe you’ve only thought about analyzing it in Hadoop.
  • #19: But over time, the mobile app shows up that also produces the same data and several more applications as destinations for search, recommendations, security etc. Event centric thinking involves building a forward-compatible architecture. You will never be able to foresee what future apps might show up that will need the same data. So capture it in a central, scalable streaming platform that asynchronously feeds downstream systems.
  • #20: So how do you build such a streaming platform?
  • #21: That journey starts with Apache Kafka.
  • #22: At a high-level, Kafka is a pub-sub messaging system that has producers that capture events. Events are sent to and stored locally on a central cluster of brokers. And consumers subscribe to topics or named categories of data. End-to-end, producers to consumer data flow is real-time.
  • #23: Magic of Kafka is in the implementation. It is not just a pub-sub messaging system, it is a modern distributed platform… How so?
  • #27: All that means, you can throw lots of data at Kafka and have it be made available throughout the company within milliseconds. At LinkedIn and several other companies, Kafka is deployed at a large scale…
  • #28: In the last 5 years since it was open-sourced, it has been widely adopted by 1000s of companies worldwide.
  • #29: So Kafka is the foundation of the central streaming platform.
  • #31: Infrastructure is really only as useful as the data it has. The next step moving to a streaming platform based data architecture is solving the ETL problem.
  • #32: 0.9
  • #33: REST Apis for management
  • #36: Core: Data pipeline Venture bet: Stream processing
  • #37: Most people think they know…
  • #38: Doesn’t mean you drop everything on the floor if anything slows down Streaming algorithms—online space Can compute median
  • #39: About how inputs are translated into outputs (very fundamental)
  • #40: HTTP/REST All databases Run all the time Each request totally independent—No real ordering Can fail individual requests if you want Very simple! About the future!
  • #41: “Ed, the MapReduce job never finishes if you watch it like that” Job kicks off at a certain time Cron! Processes all the input, produces all the input Data is usually static Hadoop! DWH, JCL Archaic but powerful. Can do analytics! Compex algorithms! Also can be really efficient! Inherently high latency
  • #42: Generalizes request/response and batch. Program takes some inputs and produces some outputs Could be all inputs Could be one at a time Runs continuously forever!
  • #43: For some time, stream processing was thought of as a faster map-reduce layer useful for faster analytics, requiring deployment of a central cluster much like Hadoop. But in my experience, I’ve learnt that the most compelling applications that do stream processing look much more like an event-driven microservice and less like a Hive query or Spark job.
  • #44: Companies == streams What a retail store do Streams Retail - Sales - Shipments and logistics - Pricing - Re-ordering - Analytics - Fraud and theft
  • #50: Let’s dive into the real-time analytics and apps area
  • #53: Only one thing you can do if you think the world needs to change, you live in Silicon Valley—quit your job and do it. Mission: Build a Streaming Platform Product: Confluent Platform
  • #57: Thank you slide. Add to the end of your presentation.
  • #58: Thank you slide. Add to the end of your presentation.