Siphon - Kafka as DataBus in Microsoft
Nitin Kumar (nitin.kumar@Microsoft.com)
Dev Manager, Microsoft
https://www.linkedin.com/in/nikuma
Agenda
• Scale: Kafka at Microsoft (Bing, Ads, Office)
• Use Case: NRT (near-real-time) customer-facing reports
• Kafka based Streaming Solution
• Collector
• Consumer RESTful APIs
• Monitoring: Canary/Audit Trail
• Production Experience
• Key Takeaways
Wednesday, March 16, 2016
Scale: Kafka at Microsoft (Ads, Bing, Office)
Kafka Brokers: 1000+ across 5 datacenters
Operating System: Windows Server 2012 R2
Hardware Spec: 12 cores, 32 GB RAM, 4 x 2 TB HDD (JBOD), 10 GB network
Incoming Events: 1 million per sec (90 billion per day, 100 TB per day)
Outgoing Events: 5 million per sec (1 trillion per day, 500 TB per day)
Kafka Topics/Partitions: 50+/5000+
Kafka Version: 0.8.1.1 (3-way replication)
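As a rough sanity check on the scale figures, the per-second rates can be converted to daily totals and an average event size. This is a back-of-the-envelope sketch, not something from the deck: it assumes a constant rate over a full day and decimal units (1 TB = 10^12 bytes), and the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_day(events_per_sec):
    """Daily event count, assuming the per-second rate is sustained all day."""
    return events_per_sec * SECONDS_PER_DAY


def avg_event_bytes(tb_per_day, daily_events):
    """Average event size in bytes, assuming decimal terabytes (10**12 bytes)."""
    return tb_per_day * 10**12 / daily_events


incoming = events_per_day(1_000_000)  # about 86 billion events per day
outgoing = events_per_day(5_000_000)  # about 432 billion events per day

# Using the slide's own daily totals: 100 TB over 90 billion incoming events.
incoming_size = avg_event_bytes(100, 90e9)  # roughly 1.1 KB per event

print(f"incoming: {incoming:.3g} events/day")
print(f"outgoing: {outgoing:.3g} events/day")
print(f"average incoming event: {incoming_size:.0f} bytes")
```

A sustained 1 million events per second works out to about 86 billion per day, in line with the slide's rounded 90 billion. A sustained 5 million per second works out to about 432 billion per day, so the slide's 1 trillion outgoing figure presumably rests on a different basis (for example peak rather than average rate) than this simple calculation assumes.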
Problem
[Diagram: existing reporting pipeline. Labels: Serving System ({Q}, {R}); Log Collection; Sorting / Partitioning; Online Fraud Detection; Feature Extraction (300 GB/h, 200+ Features); ML Classification; Aggregation; Keyword; Advertiser; Stats; 25 TB; Reporting DB. Stage latencies: 1.5 hours, 2.5 hours.]
What is the click-through rate of my ad that launched at 5 pm?
Goals / Design Considerations
• Reduce latency from 4 hours to 15 minutes
• 99.8% log completeness guarantee
• Checkpointing and failure recovery
• Exactly-once semantics
• Highly available and scalable, with rolling upgrades
• Reuse existing C# libraries
Siphon DataBus
Solution
[Diagram: streaming pipeline. Labels: Serving System ({Q}, {R}); Kafka; Audit; StreamScope; Online Fraud Detection; Feature Extraction (100 MBPS, 200+ Features); ML Classification; Aggregation; Keyword; Advertiser; Stats; 25 TB; Reporting DB. Latencies: 1-2 sec (minimize latency), < 15 minutes.]
• Kafka as a distributed queue
• StreamScope as a distributed processing system
Siphon
[Diagram: Siphon architecture. Three datacenters (Asia DC, Europe DC, US DC), each with Zookeeper, Kafka, and a Canary. Other labels: Device Proxy Services; Services Data Push; Services Data Pull (Agent); Collector; Consumer API (Push/Pull); Streaming; Batch; Audit Trail. Legend: Open Source, Microsoft Internal.]
Collector – Data Ingestion (Producers)
• HTTP(S) server
• RESTful API with SSL support
• Abstraction from Kafka internals (partitions, Kafka version)
• Throttling, QPS monitoring
• PII scrubbing
• Load balancing/failover
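The collector responsibilities listed above can be illustrated with a small sketch. The deck does not show the Collector's implementation, so everything here is an assumption made for illustration: the class and function names, the token-bucket throttle, the email-only PII rule, and the CRC32-based partition choice (a stand-in, not Kafka's own partitioner). The `send` callback takes the place of a real Kafka producer call.

```python
import re
import time
import zlib

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrub_pii(payload):
    """Illustrative PII scrubbing: mask anything that looks like an email address."""
    return EMAIL_RE.sub("<redacted>", payload)


class TokenBucket:
    """Token-bucket throttle: `rate` requests per second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Collector:
    """Hypothetical collector front end that hides partitioning from callers."""

    def __init__(self, num_partitions, bucket, send):
        self.num_partitions = num_partitions
        self.bucket = bucket
        self.send = send  # stand-in for a Kafka producer call

    def ingest(self, topic, key, payload):
        """Return an HTTP-style status: 429 when throttled, 200 when accepted."""
        if not self.bucket.allow():
            return 429
        # Callers never pick a partition; the collector derives it from the key.
        partition = zlib.crc32(key.encode("utf-8")) % self.num_partitions
        self.send(topic, partition, scrub_pii(payload))
        return 200


sent = []
fake_time = [0.0]  # controllable clock so the example is deterministic
collector = Collector(
    num_partitions=12,  # P0..P11, as in the slide's diagram
    bucket=TokenBucket(rate=1, capacity=2, clock=lambda: fake_time[0]),
    send=lambda topic, partition, payload: sent.append((topic, partition, payload)),
)
statuses = [
    collector.ingest("clicks", "advertiser-42", "user=a@b.com clicked")
    for _ in range(3)
]
print(statuses)    # [200, 200, 429]
print(sent[0][2])  # user=<redacted> clicked
```

Because producers only see a topic and a status code, the partition count or Kafka version behind the collector can change without touching client code, which is the abstraction the slide describes.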
[Diagram: Device Proxy Services, Services Data Push, and Services Data Pull (Agent) feed a LoadBalancer in front of three Collectors, which write to four Kafka Brokers holding partitions P0-P11. Legend: Open Source, Microsoft Internal.]
Consumer API (Push/Pull)
• RESTful pull API – simple consumer
• Config-driven subscriptions for preconfigured sinks (HDFS, Cosmos, ELK)
[Diagram: Config (ZK), Executor, Kafka .NET Library, Kafka, High Level Consumer.]
Supported destinations:
• Cosmos
• Elasticsearch
• Kafka
• HDFS
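A config-driven subscription model like the one on this slide might look roughly as follows. This is a sketch under stated assumptions: the `Subscription` type, the sink registry, and the config shape are hypothetical, and the sinks are in-memory lists rather than real HDFS, Cosmos, or Elasticsearch writers. The deck indicates only that the configuration lives in ZooKeeper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Subscription:
    """One config entry: which topic to read and which preconfigured sink to feed."""
    topic: str
    sink: str


# Hypothetical sink registry. Real sinks would write to external systems;
# here each sink simply appends to an in-memory list.
delivered: Dict[str, List[str]] = {"hdfs": [], "elasticsearch": []}
SINKS: Dict[str, Callable[[str], None]] = {
    name: store.append for name, store in delivered.items()
}


def load_subscriptions(config: List[dict]) -> List[Subscription]:
    """Parse raw config entries and reject any that name an unknown sink."""
    subs = [Subscription(**entry) for entry in config]
    unknown = [s.sink for s in subs if s.sink not in SINKS]
    if unknown:
        raise ValueError(f"unknown sinks: {unknown}")
    return subs


def dispatch(subs: List[Subscription], topic: str, message: str) -> int:
    """Push one message to every sink subscribed to its topic; return the fan-out."""
    targets = [s for s in subs if s.topic == topic]
    for s in targets:
        SINKS[s.sink](message)
    return len(targets)


config = [
    {"topic": "clicks", "sink": "hdfs"},
    {"topic": "clicks", "sink": "elasticsearch"},
    {"topic": "impressions", "sink": "hdfs"},
]
subs = load_subscriptions(config)
print(dispatch(subs, "clicks", "click-1"))  # 2
print(delivered)
```

With this shape, adding a destination for a topic is a config change rather than a code change, which is the appeal of the push model.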
Monitoring using Canary, Audit Trail
[Diagram: Device Proxy Services, Services Data Push, and Services Data Pull (Agent) feed a LoadBalancer in front of three Collectors, which write to four Kafka Brokers holding partitions P0-P11. Annotations: Synthetic message; Audit Trail.]
Production Experience
• System in production for 15 months
• End-to-end advertiser report latency of 12+ minutes.
• Other use cases from Office, Bing.
• Integration with other streaming systems – Storm, Spark.
• Monitoring using ELK
Key Takeaways
• Scale out with Kafka (50K -> 1M -> multi-million Events Per sec)
• Ability to build tunable Auditing/Monitoring
• Producer/Consumer RESTful APIs provide a clean abstraction
• Config driven Pub/Sub system
“We are Hiring.”
Thank You
Nitin Kumar (nitin.kumar@Microsoft.com)
https://www.linkedin.com/in/nikuma


Seattle kafka meetup nov 2015 published siphon


Editor's Notes

  • #11: The Client Agent generates a unique BatchId for every batch and appends it to the Extended Header. The Client Agent sends a “produced” audit message (BatchId, DateTime, Number of Records) to the audit system. Each consumer, upon receiving the batch, deserializes the header to extract the BatchId and sends a “consumed” audit signal to the audit system. The audit system compares produced and consumed audits every 5 minutes and raises alerts.
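The produced-versus-consumed comparison described in this note can be sketched as below. It is an illustration only: the `AuditSystem` class, its method names, the in-memory storage, and the use of a UUID as the BatchId are assumptions, and the 5-minute scheduling and alerting are left out. The completeness ratio is one plausible way to relate the audit data to the deck's 99.8% log completeness goal, not a formula given in the slides.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ProducedAudit:
    """The "produced" audit message: BatchId, DateTime, Number of Records."""
    batch_id: str
    produced_at: datetime
    record_count: int


class AuditSystem:
    """Hypothetical in-memory audit store keyed by BatchId."""

    def __init__(self):
        self.produced = {}
        # consumer name -> set of BatchIds that consumer has acknowledged
        self.consumed = defaultdict(set)

    def record_produced(self, audit):
        self.produced[audit.batch_id] = audit

    def record_consumed(self, consumer, batch_id):
        self.consumed[consumer].add(batch_id)

    def missing_batches(self, consumer):
        """BatchIds that were produced but not yet acknowledged by `consumer`."""
        return sorted(set(self.produced) - self.consumed[consumer])

    def completeness(self, consumer):
        """Fraction of produced records whose batch the consumer acknowledged."""
        total = sum(a.record_count for a in self.produced.values())
        seen = sum(
            a.record_count
            for batch_id, a in self.produced.items()
            if batch_id in self.consumed[consumer]
        )
        return seen / total if total else 1.0


def new_batch_id():
    """One way to get a unique BatchId per batch (assumed; the deck names no scheme)."""
    return uuid.uuid4().hex


audit = AuditSystem()
batches = [
    ProducedAudit(new_batch_id(), datetime(2015, 11, 1, 17, 0), n)
    for n in (500, 300, 200)
]
for b in batches:
    audit.record_produced(b)

# The consumer acknowledges only the first two batches.
audit.record_consumed("reporting", batches[0].batch_id)
audit.record_consumed("reporting", batches[1].batch_id)

print(audit.completeness("reporting"))  # 0.8
print(audit.missing_batches("reporting") == [batches[2].batch_id])  # True
```

A periodic job could call `missing_batches` for each consumer and alert when batches older than some threshold are still unacknowledged, or when `completeness` falls below the target.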