Real Time Data Analytics @ Uber
Ankur Bansal
Apache Big Data Europe
November 14, 2016
About Me
● Sr. Software Engineer, Streaming Team @ Uber
○ Streaming team supports the platform for real time data analytics: Kafka, Samza, Flink, Pinot, and plenty more
○ Focused on scaling Kafka at Uber’s pace
● Staff Software Engineer @ eBay
○ Built & scaled eBay’s cloud using OpenStack
● Apache Kylin: Committer, Emeritus PMC
Agenda
● Real time Use Cases
● Kafka Infrastructure Deep Dive
● Our own Development:
○ Rest Proxy & Clients
○ Local Agent
○ uReplicator (Mirrormaker)
○ Chaperone (Auditing)
● Operations/Tooling
Important Use Cases
Stream Processing
Real-time Price Surging
[Figure labels: Rider eyeballs, Open car information, Kafka, Surge multipliers]
Real-time Machine Learning - UberEats ETD
● Fraud detection
● Share my ETA
And many more ...
Apache Kafka is Uber’s Lifeline
Kafka ecosystem @ Uber
Kafka cluster stats
● 100s of billions of messages/day
● 100s of TB/day
● Multiple data centers
Kafka Infrastructure Deep Dive
Requirements
● Scale to 100s of billions of messages/day → 1 trillion/day
● High throughput (scale: 100s of TB → PB)
● Low latency for most use cases (<5 ms)
● Reliability: 99.99% (#msgs available / #msgs produced)
● Multi-Language Support
● Tens of thousands of simultaneous clients.
● Reliable data replication across DC
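The reliability requirement above is a ratio, so it can be turned into a concrete loss budget. The following is a minimal sketch of that arithmetic; the function names are illustrative and not taken from the talk.

```python
from fractions import Fraction

# 99.99% reliability target, as stated on the slide.
TARGET = Fraction(9999, 10000)


def reliability(available: int, produced: int) -> Fraction:
    """Messages available downstream divided by messages produced."""
    if produced <= 0:
        raise ValueError("produced must be positive")
    return Fraction(available, produced)


def loss_budget(produced: int, target: Fraction = TARGET) -> int:
    """Largest number of messages that may go missing while still meeting target."""
    return int(produced * (1 - target))


# At one trillion messages per day, 99.99% still allows 100 million lost messages.
print(loss_budget(10**12))  # 100000000
```

Exact fractions are used instead of floats so that a count sitting right on the 99.99% boundary compares correctly.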
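Kafka Pipeline
[Figure: pipeline diagram; recoverable labels: Local Agent, uReplicator]
Kafka Pipeline: Data Flow
[Figure: the same pipeline diagram with data-flow steps numbered 1 to 8]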
Kafka Clusters
● Use case based clusters
○ Data (async, reliable)
○ Logging (High throughput)
○ Time Sensitive (Low Latency, e.g. Surge, Push notifications)
○ High Value Data (At-least once, Sync e.g. Payments)
● Secondary cluster as fallback
● Aggregate clusters for all data topics.
Kafka Rest Proxy
Why Kafka Rest Proxy ?
● Simplified Client API
● Multi-lang support (Java, NodeJs, Python, Golang)
● Decouple client from Kafka broker
○ Thin clients = operational ease
○ Fewer connections to Kafka brokers
○ Easier future Kafka upgrades
● Enhanced Reliability
○ Primary & Secondary Kafka Clusters
Kafka Rest Proxy: Internals
● Based on Confluent’s open sourced Rest Proxy
● Performance enhancements
○ Simple http servlets on jetty instead of Jersey
○ Optimized for binary payloads.
○ Performance increase from 7K* to 45-50K QPS/box
● Caching of topic metadata.
● Reliability improvements*
○ Support for Fallback cluster
○ Support for multiple Producers (SLA based segregation)
● Plan to contribute back to community
*Based on benchmarking & analysis done in June 2015
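The fallback-cluster support listed above can be pictured as a try-primary-then-secondary produce path. This is only a sketch under assumptions: `FakeCluster`, `ClusterUnavailable`, and `produce_with_fallback` are hypothetical names, and the in-memory cluster stands in for a real Kafka producer.

```python
class ClusterUnavailable(Exception):
    """Raised by a (hypothetical) cluster handle when a send cannot complete."""


class FakeCluster:
    """In-memory stand-in for a Kafka cluster; not a real client."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.log = []

    def send(self, topic, payload):
        if not self.up:
            raise ClusterUnavailable(self.name)
        self.log.append((topic, payload))


def produce_with_fallback(primary, secondary, topic, payload):
    """Try the primary cluster first; fall back to the secondary on failure.

    Returns the name of the cluster that accepted the message.
    """
    try:
        primary.send(topic, payload)
        return primary.name
    except ClusterUnavailable:
        secondary.send(topic, payload)
        return secondary.name
```

If the secondary is also down, the exception propagates to the caller, which is where a client-side retry or local spool would take over.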
Rest Proxy: performance (1 box)
[Figure: end-to-end latency (ms) vs. message rate (K/second) at a single node]
Kafka Clients
Client Libraries
● Support for multiple clusters.
● High Throughput
○ Non-blocking, async, batching
○ <1ms produce latency for clients
○ Handles Throttling/BackOff signals from Rest Proxy
● Topic Discovery
○ Discovers the Kafka cluster a topic belongs to
○ Able to multiplex to different kafka clusters
● Integration with Local Agent for critical data
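Batching plus handling of throttle signals, as listed above, can be sketched as a small buffer that flushes in batches and backs off exponentially when the proxy pushes back. Everything here is an assumption for illustration: the `Throttled` exception, the class name, and the retry parameters are hypothetical, and the flush is synchronous for simplicity where a real client would be non-blocking.

```python
import time


class Throttled(Exception):
    """Hypothetical signal that the proxy asked the client to slow down."""


class BatchingClient:
    def __init__(self, send_batch, batch_size=3, max_retries=5,
                 base_delay=0.01, sleep=time.sleep):
        self.send_batch = send_batch    # callable that ships one batch
        self.batch_size = batch_size
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.sleep = sleep              # injectable so tests need not wait
        self.buffer = []

    def produce(self, message):
        """Buffer a message; flush once a full batch has accumulated."""
        self.buffer.append(message)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send the buffered batch, doubling the delay after each throttle."""
        if not self.buffer:
            return
        batch, self.buffer = self.buffer, []
        delay = self.base_delay
        for _ in range(self.max_retries):
            try:
                self.send_batch(batch)
                return
            except Throttled:
                self.sleep(delay)
                delay *= 2
        raise RuntimeError("gave up after repeated throttling")
```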
Client Libraries
What if there is a network glitch or outage?
Local Agent
● Local spooling in case of downstream outage/backpressure
● Backfills at a controlled rate to avoid hammering infrastructure that is recovering from an outage
● Implementation:
○ Reuses code from rest-proxy and kafka’s log module.
○ Appends all topics to same file for high throughput.
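The spool-then-backfill behaviour described above can be sketched with a single in-memory queue shared by all topics and a per-tick cap on how much is replayed. This is a simplified illustration, not the agent's actual design: the class, the `backfill_rate` parameter, and the use of `ConnectionError` as the outage signal are assumptions, and a real agent would spool to disk.

```python
from collections import deque


class LocalAgent:
    def __init__(self, downstream, backfill_rate):
        self.downstream = downstream        # callable(topic, payload)
        self.backfill_rate = backfill_rate  # max messages replayed per tick
        self.spool = deque()                # one spool shared by all topics

    def produce(self, topic, payload):
        """Forward directly when healthy; spool on outage or existing backlog."""
        if self.spool:
            # A backlog exists, so queue behind it to preserve ordering.
            self.spool.append((topic, payload))
            return
        try:
            self.downstream(topic, payload)
        except ConnectionError:
            self.spool.append((topic, payload))

    def backfill_tick(self):
        """Replay at most backfill_rate spooled messages; return how many were sent."""
        sent = 0
        while self.spool and sent < self.backfill_rate:
            topic, payload = self.spool[0]
            try:
                self.downstream(topic, payload)
            except ConnectionError:
                break  # downstream still unhealthy; try again next tick
            self.spool.popleft()
            sent += 1
        return sent
```

Calling `backfill_tick` on a fixed timer bounds the replay rate, so a recovering cluster sees a steady trickle rather than the whole backlog at once.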
Local Agent Architecture
Local Agent in Action
uReplicator
Multi-DC data flow
Mirrormaker : existing problems
● New Topic added
● New partitions added
● Mirrormaker bounced
● New mirrormaker added
uReplicator: In-house solution
[Figure: ZooKeeper and a Helix MM Controller coordinating three MM workers (worker1 to worker3); each worker runs a Helix Agent and threads 1..N that handle topic-partitions]
uReplicator
● Running in production for 1+ year
● Open sourced: https://github.com/uber/uReplicator
● Blog: https://eng.uber.com/ureplicator/
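The uReplicator diagram shows a central controller handing topic-partitions to workers. One way to picture why that helps with the Mirrormaker problems listed earlier is an assignment that only places the new or orphaned partitions and leaves everything else where it is. The sketch below assumes a least-loaded placement policy; that policy and the function names are illustrative assumptions, not the project's actual algorithm.

```python
def assign(partition, assignment):
    """Place one topic-partition on the least-loaded worker.

    Existing assignments are left untouched, so adding a topic or a
    partition does not disturb partitions that are already replicating.
    """
    worker = min(assignment, key=lambda w: (len(assignment[w]), w))
    assignment[worker].append(partition)
    return worker


def remove_worker(worker, assignment):
    """Reassign only the partitions owned by a departed worker."""
    orphans = assignment.pop(worker)
    return {p: assign(p, assignment) for p in orphans}


assignment = {"worker1": ["a-0", "a-1"], "worker2": ["b-0"], "worker3": []}
assign("c-0", assignment)             # lands on worker3, the emptiest worker
remove_worker("worker2", assignment)  # only b-0 moves
```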
Chaperone - E2E Auditing
Chaperone Architecture
Chaperone : Track counts
Chaperone : Track Latency
Chaperone
● Running in production for 1+ year
● Planning to open source in ~2 Weeks
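Count tracking for end-to-end auditing can be sketched as bucketing messages into time windows at each pipeline stage and comparing the per-window counts. The 10-minute window and the function names below are assumptions made for illustration; the slides do not specify how Chaperone buckets messages.

```python
from collections import Counter

WINDOW_SECONDS = 600  # assumed window size, for illustration only


def window_of(timestamp: int) -> int:
    """Start of the window that a Unix timestamp (seconds) falls into."""
    return timestamp - timestamp % WINDOW_SECONDS


def count_by_window(timestamps):
    """Per-window message counts as observed at one pipeline stage."""
    return Counter(window_of(t) for t in timestamps)


def audit(upstream, downstream):
    """Return {window: missing_count} for windows where downstream saw fewer."""
    return {
        w: n - downstream.get(w, 0)
        for w, n in upstream.items()
        if downstream.get(w, 0) < n
    }
```

Bucketing by the timestamp carried in the message, rather than by arrival time, keeps the counts comparable across stages even when messages are delayed.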
At-least Once Kafka
Why do we need it?
● Most of infrastructure tuned for high throughput
○ Batching at each stage
○ Ack before produce (ack’ed != committed)
● Single node failure in any stage leads to data loss
● Need a reliable pipeline for High Value Data e.g. Payments
How did we achieve it?
● Brokers:
○ min.insync.replicas=2, can only tolerate one node failure
○ unclean.leader.election.enable=false, need to wait until the old leader comes back
● Rest Proxy:
○ Partition Failover
● Improved Operations:
○ Replication throttling, to reduce impact of node bootstrap
○ Prevent catching-up nodes from joining the ISR
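The effect of min.insync.replicas=2 can be illustrated with a toy model: when the producer requests acknowledgement from all in-sync replicas, a write is rejected once the in-sync set shrinks below the minimum. The sketch assumes a replication factor of 3 and acks=all on the producer side, neither of which is stated on the slide; the function and exception names are hypothetical.

```python
class NotEnoughReplicas(Exception):
    """Toy equivalent of the broker rejecting a write for lack of in-sync replicas."""


def append_acks_all(record, isr_logs, min_insync_replicas=2):
    """Append to every in-sync replica, or reject if too few are in sync.

    isr_logs maps broker id -> list acting as that replica's log.
    """
    if len(isr_logs) < min_insync_replicas:
        raise NotEnoughReplicas(
            f"{len(isr_logs)} in-sync replica(s), need {min_insync_replicas}"
        )
    for log in isr_logs.values():
        log.append(record)


isr = {1: [], 2: [], 3: []}
append_acks_all("m1", isr)  # 3 in sync: accepted
del isr[3]
append_acks_all("m2", isr)  # one broker down, 2 in sync: still accepted
del isr[2]
# A further append would now raise NotEnoughReplicas: only one replica is left.
```

With three replicas and a minimum of two, exactly one broker can fail before writes stop, which matches the "one node failure" note on the slide.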
Operations/Tooling
Partition Rebalancing
Partition Rebalancing
● Calculates partition imbalance and inter-broker dependency.
● Generates & executes a rebalance plan.
● Rebalance plans are incremental and can be stopped and resumed.
● Currently on-demand; to be automated in the future.
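The rebalancing steps above can be sketched as: measure imbalance, generate a list of single-partition moves, and apply that list a few moves at a time. This is a deliberately simplified illustration that counts partitions per broker and ignores replica placement and inter-broker dependency; the function names and the greedy policy are assumptions, not the tool's actual logic.

```python
def imbalance(load):
    """Difference between the most and least loaded broker (partition count)."""
    counts = [len(parts) for parts in load.values()]
    return max(counts) - min(counts)


def plan(load):
    """Greedy list of (partition, source, destination) moves; input is not modified."""
    load = {broker: list(parts) for broker, parts in load.items()}
    moves = []
    while imbalance(load) > 1:
        src = max(load, key=lambda b: (len(load[b]), b))
        dst = min(load, key=lambda b: (len(load[b]), b))
        partition = load[src].pop()
        load[dst].append(partition)
        moves.append((partition, src, dst))
    return moves


def apply_moves(load, moves, start=0, limit=None):
    """Apply moves[start:start+limit] in place; return the index to resume from."""
    end = len(moves) if limit is None else min(len(moves), start + limit)
    for partition, src, dst in moves[start:end]:
        load[src].remove(partition)
        load[dst].append(partition)
    return end
```

Because the plan is a plain ordered list, execution can stop after any move and resume later from the returned index.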
XFS vs EXT4
Summary: Scale
● Kafka Brokers:
○ Multiple Clusters per DC
○ Use case based tuning
● Rest Proxy to reduce connections and better batching
● Rest Proxy & Clients
○ Batch everywhere, Async produce
○ Replace Jersey with Jetty
● XFS
Summary: Reliability
● Local Agent
● Secondary Clusters
● Multi Producer support in Rest Proxy
● uReplicator
● Auditing via Chaperone
Future Work
● Open source contribution
○ Chaperone
○ Toolkit
● Data Lineage
● Active Active Kafka
● Chargeback
● Exactly once mirroring via uReplicator
Questions ?
ankur@uber.com
Extra Slides
Kafka Durability (acks=1)
[Figure sequence over six slides; numbers are the message IDs shown in each broker's log]
● Slide 1: Broker 1 holds 100 to 103; Brokers 2 and 3 hold 100 and 101. Labels: Leader, Committed, Producer, Acked.
● Slide 2: Same logs, with an added "Failed" label.
● Slide 3: Same logs. Labels: Leader, Committed, Producer.
● Slide 4: Broker 1 holds 100 to 103; Broker 2 holds 100, 101, 104, 105, 106; Broker 3 holds 100, 101, 104, 105. Added label: Old HW.
● Slide 5: Same logs, with X marks at the Old HW label.
● Slide 6: Brokers 1 and 2 hold 100, 101, 104, 105, 106; Broker 3 shows 100, 101, 105, 106. Messages 102 and 103 no longer appear on any broker. Label: data loss!!
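The slide sequence above can be replayed as a small simulation. It assumes the producer had been acked through message 103 by the leader alone (acks=1) while the followers had only replicated up to 101, and it uses the slide's message IDs as labels rather than real Kafka offsets. The function name is illustrative.

```python
def simulate_acks1_failover():
    """Replay the leader-failure scenario and report which acked messages vanish."""
    old_leader = [100, 101, 102, 103]   # leader log before the failure
    follower = [100, 101]               # what the followers had replicated
    high_watermark = 101                # last message present on all replicas
    acked = list(old_leader)            # acks=1: acked once the leader wrote it

    # The leader fails; a follower takes over and accepts new messages.
    new_leader = follower + [104, 105, 106]

    # The old leader returns, truncates to its old high watermark,
    # and then copies everything newer from the new leader.
    rejoined = [m for m in old_leader if m <= high_watermark]
    rejoined += [m for m in new_leader if m > high_watermark]

    lost = sorted(set(acked) - set(new_leader))
    return rejoined, lost


rejoined, lost = simulate_acks1_failover()
print(rejoined)  # [100, 101, 104, 105, 106]
print(lost)      # [102, 103]
```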
Distributed Messaging system
* Supported in Kafka 0.8+
● High throughput
● Low latency
● Scalable
● Centralized
● Real-time
What is Kafka?
● Distributed
● Partitioned
● Replicated
● Commit Log
[Figure, built up over four slides: Brokers 1 to 3 coordinated by ZooKeeper; Partitions 0 to 2 are spread across the brokers, with each partition stored on two brokers; each partition is a log with offsets 0, 1, 2, 3]
Kafka Concepts
WillDavies22
 
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
charlesdick1345
 
Smart Storage Solutions.pptx for production engineering
Smart Storage Solutions.pptx for production engineeringSmart Storage Solutions.pptx for production engineering
Smart Storage Solutions.pptx for production engineering
rushikeshnavghare94
 
Compiler Design_Lexical Analysis phase.pptx
Compiler Design_Lexical Analysis phase.pptxCompiler Design_Lexical Analysis phase.pptx
Compiler Design_Lexical Analysis phase.pptx
RushaliDeshmukh2
 
Degree_of_Automation.pdf for Instrumentation and industrial specialist
Degree_of_Automation.pdf for  Instrumentation  and industrial specialistDegree_of_Automation.pdf for  Instrumentation  and industrial specialist
Degree_of_Automation.pdf for Instrumentation and industrial specialist
shreyabhosale19
 
Data Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptxData Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptx
RushaliDeshmukh2
 
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdfMAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
ssuser562df4
 
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptxExplainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
MahaveerVPandit
 
π0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalizationπ0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalization
NABLAS株式会社
 
Metal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistryMetal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistry
mee23nu
 
International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)
samueljackson3773
 
Value Stream Mapping Worskshops for Intelligent Continuous Security
Value Stream Mapping Worskshops for Intelligent Continuous SecurityValue Stream Mapping Worskshops for Intelligent Continuous Security
Value Stream Mapping Worskshops for Intelligent Continuous Security
Marc Hornbeek
 
AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)
Vəhid Gəruslu
 
Reagent dosing (Bredel) presentation.pptx
Reagent dosing (Bredel) presentation.pptxReagent dosing (Bredel) presentation.pptx
Reagent dosing (Bredel) presentation.pptx
AlejandroOdio
 
IntroSlides-April-BuildWithAI-VertexAI.pdf
IntroSlides-April-BuildWithAI-VertexAI.pdfIntroSlides-April-BuildWithAI-VertexAI.pdf
IntroSlides-April-BuildWithAI-VertexAI.pdf
Luiz Carneiro
 
theory-slides-for react for beginners.pptx
theory-slides-for react for beginners.pptxtheory-slides-for react for beginners.pptx
theory-slides-for react for beginners.pptx
sanchezvanessa7896
 
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G..."Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
Infopitaara
 
DSP and MV the Color image processing.ppt
DSP and MV the  Color image processing.pptDSP and MV the  Color image processing.ppt
DSP and MV the Color image processing.ppt
HafizAhamed8
 
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITYADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ijscai
 
Avnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights FlyerAvnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights Flyer
WillDavies22
 
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
charlesdick1345
 
Smart Storage Solutions.pptx for production engineering
Smart Storage Solutions.pptx for production engineeringSmart Storage Solutions.pptx for production engineering
Smart Storage Solutions.pptx for production engineering
rushikeshnavghare94
 
Compiler Design_Lexical Analysis phase.pptx
Compiler Design_Lexical Analysis phase.pptxCompiler Design_Lexical Analysis phase.pptx
Compiler Design_Lexical Analysis phase.pptx
RushaliDeshmukh2
 
Degree_of_Automation.pdf for Instrumentation and industrial specialist
Degree_of_Automation.pdf for  Instrumentation  and industrial specialistDegree_of_Automation.pdf for  Instrumentation  and industrial specialist
Degree_of_Automation.pdf for Instrumentation and industrial specialist
shreyabhosale19
 
Data Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptxData Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptx
RushaliDeshmukh2
 
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdfMAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
ssuser562df4
 
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptxExplainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
MahaveerVPandit
 
π0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalizationπ0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalization
NABLAS株式会社
 
Metal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistryMetal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistry
mee23nu
 

Uber Real Time Data Analytics

  • 1. Real Time Data Analytics @ Uber. Ankur Bansal, Apache Big Data Europe, November 14, 2016
  • 2. About Me ● Sr. Software Engineer, Streaming Team @ Uber ○ Streaming team supports the platform for real-time data analytics: Kafka, Samza, Flink, Pinot, and plenty more ○ Focused on scaling Kafka at Uber’s pace ● Staff Software Engineer @ eBay ○ Build & scale eBay’s cloud using OpenStack ● Apache Kylin: Committer, Emeritus PMC
  • 3. Agenda ● Real time Use Cases ● Kafka Infrastructure Deep Dive ● Our own Development: ○ Rest Proxy & Clients ○ Local Agent ○ uReplicator (Mirrormaker) ○ Chaperone (Auditing) ● Operations/Tooling
  • 6. Real-time Machine Learning - UberEats ETD
  • 8. ● Fraud detection ● Share my ETA And many more ...
  • 9. Apache Kafka is Uber’s Lifeline
  • 11. 100s of billion 100s TB Messages/day bytes/day Kafka cluster stats Multiple data centers
  • 13. Requirements ● Scale to 100s Billions/day → 1 Trillion/day ● High Throughput (Scale: 100s TB → PB) ● Low Latency for most use cases (<5 ms) ● Reliability - 99.99% (#Msgs Available / #Msgs Produced) ● Multi-Language Support ● Tens of thousands of simultaneous clients ● Reliable data replication across DC
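
The reliability requirement above can be read as a daily loss budget. The following is a rough worked example only: the daily volume is an assumed round figure (the slides say "100s of billion"), and `loss_budget` is a hypothetical helper, not something from the talk.

```python
from fractions import Fraction

def loss_budget(produced: int, reliability: Fraction) -> int:
    """Largest number of messages that may be unavailable while still
    meeting reliability = messages available / messages produced."""
    return int(produced * (1 - reliability))

TARGET = Fraction(9999, 10000)           # 99.99%, from the requirements slide
ASSUMED_DAILY_VOLUME = 100_000_000_000   # assumption: a round 100 billion/day

budget = loss_budget(ASSUMED_DAILY_VOLUME, TARGET)
print(budget)  # 10000000
```

Under these assumptions, 99.99% still permits about ten million unavailable messages per day, which is one way to see why the later slides add auditing and a separate high-value pipeline.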
  • 15. Kafka Pipeline: Data Flow [figure: pipeline diagram with steps numbered 1 to 8]
  • 17. Kafka Clusters ● Use case based clusters ○ Data (async, reliable) ○ Logging (High throughput) ○ Time Sensitive (Low Latency e.g. Surge, Push notifications) ○ High Value Data (At-least once, Sync e.g. Payments) ● Secondary cluster as fallback ● Aggregate clusters for all data topics.
  • 18. Kafka Clusters ● Scale to 100s Billions/day → 1 Trillion/day ● High Throughput (Scale: 100s TB → PB) ● Low Latency for most use cases (<5 ms) ● Reliability - 99.99% (#Msgs Available / #Msgs Produced) ● Multi-Language Support ● Tens of thousands of simultaneous clients ● Reliable data replication across DC
  • 20. Why Kafka Rest Proxy? ● Simplified Client API ● Multi-language support (Java, NodeJs, Python, Golang) ● Decouple client from Kafka broker ○ Thin clients = operational ease ○ Fewer connections to Kafka brokers ○ Easier future Kafka upgrades ● Enhanced Reliability ○ Primary & Secondary Kafka Clusters
  • 21. Kafka Rest Proxy: Internals
  • 22. Kafka Rest Proxy: Internals
  • 23. Kafka Rest Proxy: Internals ● Based on Confluent’s open-sourced Rest Proxy ● Performance enhancements ○ Simple HTTP servlets on Jetty instead of Jersey ○ Optimized for binary payloads ○ Performance increase from 7K* to 45-50K QPS/box ● Caching of topic metadata ● Reliability improvements* ○ Support for Fallback cluster ○ Support for multiple Producers (SLA based segregation) ● Plan to contribute back to community *Based on benchmarking & analysis done in June 2015
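
The "Support for Fallback cluster" item can be pictured with a minimal sketch. This is not the Rest Proxy's actual code: `produce_with_fallback`, `ClusterUnavailable`, and the two stand-in send functions are hypothetical names used only to illustrate trying the primary cluster first and the secondary on failure.

```python
from typing import Callable, List, Tuple

class ClusterUnavailable(Exception):
    """Raised by a send function when its cluster cannot accept a message."""

SendFn = Callable[[str, bytes], None]

def produce_with_fallback(topic: str, payload: bytes,
                          primary: SendFn, secondary: SendFn) -> str:
    """Try the primary cluster; on failure, send to the secondary.

    Returns the name of the cluster that accepted the message.
    """
    try:
        primary(topic, payload)
        return "primary"
    except ClusterUnavailable:
        secondary(topic, payload)
        return "secondary"

# Stand-ins for real cluster producers (hypothetical).
received: List[Tuple[str, bytes]] = []

def healthy_cluster(topic: str, payload: bytes) -> None:
    received.append((topic, payload))

def failed_cluster(topic: str, payload: bytes) -> None:
    raise ClusterUnavailable(topic)

print(produce_with_fallback("trips", b"event", failed_cluster, healthy_cluster))
# secondary
```

If the secondary also fails, the exception propagates to the caller, which is where a local spool such as the Local Agent described later could take over.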
  • 24. Rest Proxy: performance (1 box) [figure: end-to-end latency (ms) versus message rate (K/second) at a single node]
  • 25. Kafka Clusters + Rest Proxy ● Scale to 100s Billions/day → 1 Trillion/day ● High Throughput (Scale: 100s TB → PB) ● Low Latency for most use cases (<5 ms) ● Reliability - 99.99% (#Msgs Available / #Msgs Produced) ● Multi-Language Support ● Tens of thousands of simultaneous clients ● Reliable data replication across DC
  • 27. Client Libraries ● Support for multiple clusters ● High Throughput ○ Non-blocking, async, batching ○ <1 ms produce latency for clients ○ Handles Throttling/BackOff signals from Rest Proxy ● Topic Discovery ○ Discovers the Kafka cluster a topic belongs to ○ Able to multiplex to different Kafka clusters ● Integration with Local Agent for critical data
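
A minimal sketch of the "non-blocking, async, batching" and back-off behaviour listed above. It is an illustration under assumptions, not Uber's client: `BatchingProducer` and the `send_batch` callback (returning `False` to signal throttling) are hypothetical, and the doubling back-off policy is an assumed choice.

```python
from collections import deque
from typing import Callable, Deque, List

class BatchingProducer:
    def __init__(self, send_batch: Callable[[List[bytes]], bool],
                 batch_size: int = 3,
                 base_backoff_s: float = 0.1,
                 max_backoff_s: float = 5.0) -> None:
        self._send_batch = send_batch      # returns False when throttled
        self._batch_size = batch_size
        self._base = base_backoff_s
        self._max = max_backoff_s
        self._queue: Deque[bytes] = deque()
        self.backoff_s = 0.0               # suggested wait before next flush

    def produce(self, msg: bytes) -> None:
        """Enqueue only; the caller never waits on the network."""
        self._queue.append(msg)

    def flush_once(self) -> int:
        """Try to send one batch; return how many messages were accepted."""
        if not self._queue:
            return 0
        size = min(self._batch_size, len(self._queue))
        batch = [self._queue[i] for i in range(size)]
        if self._send_batch(batch):
            for _ in range(size):
                self._queue.popleft()
            self.backoff_s = 0.0
            return size
        # Throttled: keep the batch queued and double the wait, up to a cap.
        self.backoff_s = min(self._max, max(self._base, self.backoff_s * 2))
        return 0

    def pending(self) -> int:
        return len(self._queue)
```

In a real client, `flush_once` would run on a background thread or event loop that sleeps for `backoff_s` between attempts; the sketch leaves scheduling out so the batching and back-off logic stays visible.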
  • 28. Client Libraries [figure] What if there is a network glitch / outage?
  • 30. Kafka Clusters + Rest Proxy + Clients ● Scale to 100s Billions/day → 1 Trillion/day ● High Throughput (Scale: 100s TB → PB) ● Low Latency for most use cases (<5 ms) ● Reliability - 99.99% (#Msgs Available / #Msgs Produced) ● Multi-Language Support ● Tens of thousands of simultaneous clients ● Reliable data replication across DC
  • 32. Local Agent ● Local spooling in case of downstream outage/backpressure ● Backfills at a controlled rate to avoid hammering infrastructure that is recovering from an outage ● Implementation: ○ Reuses code from rest-proxy and Kafka’s log module ○ Appends all topics to the same file for high throughput
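
The spool-then-backfill idea can be sketched as follows. This is a simplified in-memory illustration, not the Local Agent itself: `LocalSpool` is a hypothetical name, a `deque` stands in for the single append-only file, and the per-tick cap is an assumed way of expressing "controlled rate".

```python
from collections import deque
from typing import Callable, Deque, Tuple

class LocalSpool:
    def __init__(self, max_backfill_per_tick: int) -> None:
        # One shared queue for all topics, mirroring "all topics, same file".
        self._spool: Deque[Tuple[str, bytes]] = deque()
        self._rate = max_backfill_per_tick

    def append(self, topic: str, payload: bytes) -> None:
        """Spool a message locally while downstream is unavailable."""
        self._spool.append((topic, payload))

    def backfill_tick(self, send: Callable[[str, bytes], bool]) -> int:
        """Forward at most `max_backfill_per_tick` messages, oldest first.

        Stops early if `send` reports failure, leaving the rest spooled.
        """
        sent = 0
        while self._spool and sent < self._rate:
            topic, payload = self._spool[0]
            if not send(topic, payload):
                break
            self._spool.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._spool)
```

Calling `backfill_tick` once per fixed interval bounds the replay rate, so a recovering downstream sees a steady trickle instead of the whole backlog at once.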
  • 34. Local Agent in Action [figure]
  • 35. Kafka Clusters + Rest Proxy + Clients + Local Agent ● Scale to 100s Billions/day → 1 Trillion/day ● High Throughput (Scale: 100s TB → PB) ● Low Latency for most use cases (<5 ms) ● Reliability - 99.99% (#Msgs Available / #Msgs Produced) ● Multi-Language Support ● Tens of thousands of simultaneous clients ● Reliable data replication across DC
  • 38. Mirrormaker: existing problems ● New Topic added ● New partitions added ● Mirrormaker bounced ● New mirrormaker added
  • 39. uReplicator: In-house solution [figure components: Zookeeper; Helix MM Controller; MM worker1, MM worker2, MM worker3, each with a Helix Agent and Thread 1 … Thread N handling topic-partitions]
  • 40. uReplicator [figure: same components as the previous slide]
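
One way to picture a controller that assigns topic-partitions to workers without the full reshuffles listed under "existing problems" is the sketch below. It is illustrative only and does not reproduce uReplicator's or Helix's real algorithm: the `Controller` class and its least-loaded policy are assumptions.

```python
from typing import Dict, Iterable, Set, Tuple

TopicPartition = Tuple[str, int]

class Controller:
    def __init__(self, workers: Iterable[str]) -> None:
        self.assignment: Dict[str, Set[TopicPartition]] = {
            w: set() for w in workers
        }

    def _least_loaded(self) -> str:
        # Ties break on worker name so the result is deterministic.
        return min(self.assignment,
                   key=lambda w: (len(self.assignment[w]), w))

    def add_partition(self, tp: TopicPartition) -> str:
        """Place a new topic-partition without moving existing ones."""
        worker = self._least_loaded()
        self.assignment[worker].add(tp)
        return worker

    def remove_worker(self, worker: str) -> None:
        """Reassign only the partitions owned by the departed worker."""
        orphaned = self.assignment.pop(worker)
        for tp in sorted(orphaned):
            self.add_partition(tp)
```

The point of the sketch is that a new topic, a new partition, or a lost worker touches only the affected partitions; every other worker keeps what it already had.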
  • 41. Kafka Clusters + Rest Proxy + Clients + Local Agent ● Scale to 100s Billions/day → 1 Trillion/day ● High Throughput (Scale: 100s TB → PB) ● Low Latency for most use cases (<5 ms) ● Reliability - 99.99% (#Msgs Available / #Msgs Produced) ● Multi-Language Support ● Tens of thousands of simultaneous clients ● Reliable data replication across DC
  • 42. uReplicator ● Running in production for 1+ year ● Open sourced: https://github.com/uber/uReplicator ● Blog: https://eng.uber.com/ureplicator/
  • 43. Chaperone - E2E Auditing
  • 45. Chaperone: Track counts [figure]
  • 46. Chaperone: Track Latency [figure]
  • 47. Chaperone ● Running in production for 1+ year ● Planning to open source in ~2 Weeks
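
Count-based auditing of the kind named on the "Track counts" slide can be sketched as per-window message counts compared between two pipeline stages. The slides do not describe Chaperone's internals, so everything here is an assumption for illustration: the 10-minute window, bucketing on an event timestamp, and the names `window_counts` and `missing`.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

WINDOW_S = 600  # assumption: 10-minute tumbling windows

Key = Tuple[str, int]  # (topic, window start in seconds)

def window_counts(events: Iterable[Tuple[str, int]]) -> Counter:
    """Count messages per (topic, window), bucketing on event time."""
    counts: Counter = Counter()
    for topic, ts in events:
        counts[(topic, ts - ts % WINDOW_S)] += 1
    return counts

def missing(upstream: Counter, downstream: Counter) -> Dict[Key, int]:
    """Per-window shortfall of the downstream stage versus upstream."""
    return {
        key: n - downstream.get(key, 0)
        for key, n in upstream.items()
        if n > downstream.get(key, 0)
    }

produced = [("trips", 0), ("trips", 10), ("trips", 700)]
arrived = [("trips", 0), ("trips", 700)]
print(missing(window_counts(produced), window_counts(arrived)))
# {('trips', 0): 1}
```

Bucketing on the event's own timestamp, not on arrival time, is what lets two stages be compared window by window even when messages arrive late.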
  • 49. Why do we need it? [figure: pipeline diagram with steps numbered 1 to 8] ● Most of the infrastructure is tuned for high throughput ○ Batching at each stage ○ Ack before produce (ack’ed != committed) ● Single node failure in any stage leads to data loss ● Need a reliable pipeline for High Value Data e.g. Payments
  • 50. How did we achieve it? ● Brokers: ○ min.insync.replicas=2, can only tolerate one node failure ○ unclean.leader.election.enable=false, need to wait until the old leader comes back ● Rest Proxy: ○ Partition Failover ● Improved Operations: ○ Replication throttling, to reduce impact of node bootstrap ○ Prevent catching-up nodes from joining the ISR
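
The "can only tolerate one node failure" remark follows from simple arithmetic once a replication factor is fixed. The sketch below assumes a replication factor of 3 (the slides draw three brokers but do not state it) and producers using `acks=all`, which is the setting under which Kafka enforces `min.insync.replicas`; both helper functions are hypothetical.

```python
def can_accept_write(isr_size: int, min_insync_replicas: int) -> bool:
    """With acks=all, the leader rejects writes when the in-sync
    replica set is smaller than min.insync.replicas."""
    return isr_size >= min_insync_replicas

def tolerated_failures(replication_factor: int,
                       min_insync_replicas: int) -> int:
    """Replica failures a partition can absorb while still accepting writes."""
    return replication_factor - min_insync_replicas

ASSUMED_REPLICATION_FACTOR = 3  # assumption, not stated on the slide
MIN_ISR = 2                     # from the slide

print(tolerated_failures(ASSUMED_REPLICATION_FACTOR, MIN_ISR))  # 1
print(can_accept_write(isr_size=2, min_insync_replicas=MIN_ISR))  # True
print(can_accept_write(isr_size=1, min_insync_replicas=MIN_ISR))  # False
```

The trade-off is availability for durability: after a second failure the partition refuses writes instead of acknowledging data that only one replica holds.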
  • 53. Partition Rebalancing ● Calculates partition imbalance and inter-broker dependency. ● Generates & Executes Rebalance Plan. ● Rebalance plans are incremental, can be stopped and resumed. ● Currently on-demand, Automated in the future.
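
A toy version of "calculate imbalance, generate an incremental plan" is sketched below. It balances partition counts only and ignores replica placement and the inter-broker dependency the slide mentions; `imbalance`, `plan_moves`, and the greedy policy are assumptions, not Uber's tool.

```python
from typing import Dict, List, Tuple

Move = Tuple[str, str, str]  # (partition, source broker, target broker)

def imbalance(assignment: Dict[str, List[str]]) -> int:
    """Difference in partition count between fullest and emptiest broker."""
    sizes = [len(ps) for ps in assignment.values()]
    return max(sizes) - min(sizes)

def plan_moves(assignment: Dict[str, List[str]]) -> List[Move]:
    """Greedy plan: move one partition at a time from the fullest broker
    to the emptiest until counts differ by at most one.

    Each move is an independent step, so executing the plan can be
    stopped after any move and resumed later.
    """
    state = {b: list(ps) for b, ps in assignment.items()}
    moves: List[Move] = []
    while True:
        src = max(state, key=lambda b: (len(state[b]), b))
        dst = min(state, key=lambda b: (len(state[b]), b))
        if len(state[src]) - len(state[dst]) <= 1:
            return moves
        partition = state[src].pop()
        state[dst].append(partition)
        moves.append((partition, src, dst))

cluster = {"b1": ["p0", "p1", "p2", "p3"], "b2": ["p4"], "b3": ["p5"]}
print(imbalance(cluster))   # 3
print(plan_moves(cluster))  # [('p3', 'b1', 'b2'), ('p2', 'b1', 'b3')]
```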
  • 55. Summary: Scale ● Kafka Brokers: ○ Multiple Clusters per DC ○ Use case based tuning ● Rest Proxy to reduce connections and better batching ● Rest Proxy & Clients ○ Batch everywhere, Async produce ○ Replace Jersey with Jetty ● XFS
  • 56. Summary: Reliability ● Local Agent ● Secondary Clusters ● Multi Producer support in Rest Proxy ● uReplicator ● Auditing via Chaperone
  • 57. Future Work ● Open source contribution ○ Chaperone ○ Toolkit ● Data Lineage ● Active Active Kafka ● Chargeback ● Exactly once mirroring via uReplicator
  • 60. Kafka Durability (acks=1) [figure: replica logs] Broker 1: 100 101 102 103 ● Broker 2: 100 101 ● Broker 3: 100 101 ● Labels: Leader, Committed, Producer, Acked
  • 61. Kafka Durability (acks=1) [figure: replica logs] Broker 1: 100 101 102 103 ● Broker 2: 100 101 ● Broker 3: 100 101 ● Labels: Leader, Committed, Producer, Failed, Acked
  • 62. Kafka Durability (acks=1) [figure: replica logs] Broker 1: 100 101 102 103 ● Broker 2: 100 101 ● Broker 3: 100 101 ● Labels: Leader, Committed, Producer
  • 63. Kafka Durability (acks=1) [figure: replica logs] Broker 1: 100 101 102 103 ● Broker 2: 100 101 104 105 106 ● Broker 3: 100 101 104 105 ● Labels: Leader, Committed, Producer, Old HW
  • 64. Kafka Durability (acks=1) [figure: replica logs] Broker 1: 100 101 102 103 ● Broker 2: 100 101 104 105 106 ● Broker 3: 100 101 104 105 ● Labels: Leader, Committed, Producer, Old HW, two X marks
  • 65. Kafka Durability (acks=1) [figure: replica logs] Broker 1: 100 101 104 105 106 ● Broker 2: 100 101 104 105 106 ● Broker 3: 100 101 105 106 ● Labels: Leader, Committed, Producer ● data loss!!
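
The acks=1 sequence in slides 60 to 65 can be replayed as a small simulation using the message numbers from the slides. It is a sketch of one reading of the figures: Broker 1 is taken to be the original leader and Broker 2 its successor, which the transcript does not state outright.

```python
# Slide 60: the leader has appended 100-103; followers have only 100-101.
old_leader_log = [100, 101, 102, 103]          # Broker 1 (assumed leader)
follower_logs = {2: [100, 101], 3: [100, 101]}

# acks=1: the producer is acked once the leader alone has appended.
acked = set(old_leader_log)

# A message is committed only when every in-sync replica has it.
committed = acked & set(follower_logs[2]) & set(follower_logs[3])

# Slides 61-63: the leader fails; Broker 2 takes over from its own log
# and accepts new writes.
new_leader_log = follower_logs[2] + [104, 105, 106]

# Slides 64-65: the old leader returns, truncates to the old high
# watermark, and copies the new leader's log.
recovered_old_leader_log = list(new_leader_log)

lost = sorted(acked - set(new_leader_log))
print(sorted(committed))  # [100, 101]
print(lost)               # [102, 103]
```

Messages 102 and 103 were acknowledged to the producer but never committed, so they vanish after failover; this is the "ack’ed != committed" gap that motivates the stricter broker settings for high-value data.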
  • 66. Distributed Messaging system * Supported in Kafka 0.8+ ● High throughput ● Low latency ● Scalable ● Centralized ● Real-time
  • 67. What is Kafka? ● Distributed ● Partitioned ● Replicated ● Commit Log Broker 1 Broker 2 Broker 3 ZooKeeper
  • 68. What is Kafka? ● Distributed ● Partitioned ● Replicated ● Commit Log Broker 1 Partition 0 Broker 2 Partition 1 Broker 3 Partition 2 ZooKeeper
  • 69. What is Kafka? ● Distributed ● Partitioned ● Replicated ● Commit Log Broker 1 Partition 0 Partition 2 Broker 2 Partition 1 Partition 0 Broker 3 Partition 2 Partition 1 ZooKeeper
  • 70. What is Kafka? ● Distributed ● Partitioned ● Replicated ● Commit Log Broker 1 Partition 0 0 1 2 3 Partition 2 0 1 2 3 Broker 2 Partition 1 0 1 2 3 Partition 0 0 1 2 3 Broker 3 Partition 2 0 1 2 3 Partition 1 0 1 2 3 ZooKeeper
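
The broker layout drawn in slides 68 to 70 matches a simple round-robin replica placement, sketched below. This is a simplification for illustration: `place_replicas` is a hypothetical helper, and Kafka's real assignment logic also randomizes the starting broker and can take racks into account.

```python
from typing import Dict, List

def place_replicas(num_partitions: int, brokers: List[str],
                   replication_factor: int) -> Dict[int, List[str]]:
    """Replica r of partition p goes to brokers[(p + r) % len(brokers)].

    The first broker in each list is treated as the preferred leader.
    """
    n = len(brokers)
    return {
        p: [brokers[(p + r) % n] for r in range(replication_factor)]
        for p in range(num_partitions)
    }

layout = place_replicas(3, ["Broker 1", "Broker 2", "Broker 3"], 2)
for partition, replicas in layout.items():
    print(partition, replicas)
# 0 ['Broker 1', 'Broker 2']
# 1 ['Broker 2', 'Broker 3']
# 2 ['Broker 3', 'Broker 1']
```

Read per broker, this gives Broker 1 partitions 0 and 2, Broker 2 partitions 1 and 0, and Broker 3 partitions 2 and 1, the same arrangement as the slide.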