Thursday, April 14, 2016
Siphon – Near Real Time Databus Using Kafka
Eric Boyd – CVP Engineering – Microsoft
Nitin Kumar – Principal Eng Manager - Microsoft
Linux is a cancer
Ads Oslo Schedule
Ads Oslo Feature List
Bing Ads Execution
• Shipped once every 6 months
• Averaged 3 marketplace experiments per month
• Big bets on marketplace features that didn’t work.
• Focused teams on 6 tracks with independent metrics.
• Pushed teams to ship as quickly as they could, focusing only on moving their metric.
• Built/borrowed infrastructure to enable much more rapid experimentation.
• Over 3 years got to a rate of >1000 experiments a month
Profitability!!
Eric joins MSFT
What drove the turnaround?
• Focus on small teams, each driving a clear metric.
• Pushing each team to experiment and iterate as fast as possible. Data alone determines what gets shipped.
• Iterated on key metrics until we found the ones with the most impact.
• Commitment that we would get 1.5-2% better each month, and ship a package of experimentally tested improvements each month.
Relationship with Open Source
• From “Linux is a cancer…”
• To contributing to open source
• Storm with C# - SCP.NET (http://www.nuget.org/packages/Microsoft.SCP.Net.SDK/)
• Spark with C# - Mobius (https://github.com/Microsoft/Mobius)
• Kafka with C# - C# Client for Kafka (https://github.com/Microsoft/Kafkanet)
• BOND (https://github.com/Microsoft/bond)
• Across MSFT
• C#
• VSCode
• Hyper-V drivers for Linux
• https://github.com/Microsoft/ with 18 pages of repositories!
Microsoft Big Data History
• Massive batch oriented systems
• Hundreds of thousands of machines
• Exabytes of storage
• SQL-like language with C# extensions
Moving to streaming
[Diagram components: Devices, Services, Data Bus (scalable pub/sub for NRT data streams), Streaming Processing, Batch Processing, Interactive analytics, Applications]
Vision
• A Databus for all Near Real Time (NRT) data in an organization.
• Quick and easy publication, discovery and subscription of NRT datasets.
• Compatibility with various stream processing systems like Storm, Spark, Splunk.
Siphon Adoption
15 months since launch
Excel Word Outlook
Windows 10
Usage
Bing Ads Campaign perf
Bing Live site telemetry
Cortana
Office 365
[Chart: Siphon Data Volume (Ingress and Egress). Y-axis: Throughput (in GBps), 0 to 80. Series: Volume published (GBps), Volume subscribed (GBps), Total Volume (GBps).]
[Chart: Siphon Events per second (Ingress and Egress). Y-axis: Throughput (events per sec, millions), 0 to 18. Series: EPS In, EPS Out, Total EPS.]
• 1.3 million events per second ingress at peak
• ~1 trillion events per day processed at peak
• 3.5 petabytes processed per day
• 100 thousand unique devices and machines
• 1,300 production Kafka brokers
Scale: Kafka at Microsoft (Ads, Bing, Office)
Kafka Brokers: 1300+ across 5 datacenters
Operating System: Windows Server 2012 R2
Hardware Spec: 12 cores, 32 GB RAM, 4x2 TB HDD (JBOD), 10 GB network
Incoming Events: 1.3 million per sec (112 billion per day, 500 TB per day)
Outgoing Events: 5 million per sec (~1 trillion per day, 3.5 PB per day)
Kafka Topics/Partitions: 50+/5000+
Kafka Version: 0.8.1.1 (3-way replication)
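The ingress figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes the 1.3 million events per second is a sustained rate and that 1 TB means 10^12 bytes, neither of which the slide states.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def events_per_day(events_per_second: float) -> float:
    """Convert a sustained per-second event rate into a per-day count."""
    return events_per_second * SECONDS_PER_DAY


def average_event_bytes(bytes_per_day: float, events_in_day: float) -> float:
    """Average event size implied by a daily byte volume and a daily event count."""
    return bytes_per_day / events_in_day


# Figures quoted on the slide; decimal terabytes (1 TB = 10**12 bytes) are an assumption.
ingress_events = events_per_day(1.3e6)
ingress_avg_bytes = average_event_bytes(500e12, ingress_events)

print(f"Ingress: {ingress_events / 1e9:.1f} billion events per day")
print(f"Implied average event size: {ingress_avg_bytes / 1e3:.1f} KB")
```

Under these assumptions the ingress rate works out to roughly 112 billion events per day, in line with the slide, with an average event of a few kilobytes. The same conversion applied to 5 million events per second gives about 432 billion per day, so the ~1 trillion per day egress figure presumably reflects peak rather than sustained throughput.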
Siphon Architecture
[Diagram components: three datacenters (Asia DC, Europe DC, US DC), each with Zookeeper, Canary, and Kafka; producers (Device Proxy Services, Services Data Push, Services Data Pull (Agent)); Collector and Agent; Consumer API (Push/Pull); Streaming; Batch; Audit Trail. Legend: Open Source, Microsoft Internal, Siphon.]
Multiple sources and schemas
[Diagram: CSV, XML, and JSON payloads carried inside the Siphon Bond schema.]
Siphon Bond Schema
• PartA (Main Header): MessageId, AuditId, TimeStamp
• PartB (Extended Header): Key-Value[]
• PartC (Payload)
Bond (https://github.com/Microsoft/bond)
• Cross platform framework for working with schematized data.
• Cross language (de)serialization.
• Similar to Protobuf, Thrift and Avro.
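The three-part envelope can be pictured with a small data model. This is a plain-Python sketch, not Bond and not Siphon's actual schema: the field names in PartA come from the slide, while the Python types, the class names, and the `wrap` helper are assumptions made for illustration.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MainHeader:
    """PartA: fields named on the slide; the Python types are assumptions."""
    message_id: str
    audit_id: str
    timestamp: float


@dataclass
class SiphonEnvelope:
    part_a: MainHeader                                    # PartA: main header
    part_b: Dict[str, str] = field(default_factory=dict)  # PartB: extended header key-values
    part_c: bytes = b""                                   # PartC: opaque payload (CSV, XML, JSON)


def wrap(payload: bytes, audit_id: str, **extended: str) -> SiphonEnvelope:
    """Wrap a raw payload without parsing it (hypothetical helper)."""
    header = MainHeader(message_id=uuid.uuid4().hex, audit_id=audit_id, timestamp=time.time())
    return SiphonEnvelope(part_a=header, part_b=dict(extended), part_c=payload)


envelope = wrap(b'{"clicks": 3}', audit_id="audit-001", content_type="JSON")
print(envelope.part_a.audit_id, envelope.part_b, len(envelope.part_c))
```

Keeping the payload as opaque bytes is one way a single envelope could carry sources with different formats, since only the headers need a shared schema.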
Collector – Data Ingestion (Producer)
• HTTP(S) server
• RESTful API with SSL support.
• Abstraction from Kafka internals (partition, Kafka version)
• Throttling, QPS monitoring
• PII scrubbing
• Load balancing/failover to multiple DCs
• Supported on both Windows and Linux servers.
[Diagram components: Device Proxy Services, Services Data Push, Services Data Pull (Agent), LoadBalancer, Collectors, Kafka Brokers (4 brokers, partitions P0 to P11). Legend: Open Source, Microsoft Internal, Siphon.]
URL: http://localhost/produce/<version>?topic=<topic>
Method: POST
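A client call against this endpoint might be assembled as follows. This is a minimal sketch using only the Python standard library: the version string "v1", the topic "clicks", and the Content-Type header are placeholder assumptions, since the slide gives only the URL shape and the HTTP method. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_produce_request(base_url: str, version: str, topic: str, payload: bytes) -> Request:
    """Build (but do not send) a POST to <base_url>/produce/<version>?topic=<topic>."""
    query = urlencode({"topic": topic})
    url = f"{base_url.rstrip('/')}/produce/{version}?{query}"
    # The Content-Type value is an assumption; the slide does not specify one.
    headers = {"Content-Type": "application/octet-stream"}
    return Request(url, data=payload, headers=headers, method="POST")


# "v1" and "clicks" are hypothetical placeholder values.
request = build_produce_request("http://localhost", "v1", "clicks", b"example-event")
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it to a running Collector. Note that the caller names only a topic and never a partition or broker, which matches the abstraction described in the bullets above.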
Pull & Push Consumers
[Diagram components: Virtual Network A, Virtual Network B, Kafka Brokers (4 brokers, partitions P0 to P11), Collectors, REST API, HLC (pull).]
Pull
• RESTful API with SSL support
• Works for out-of-network consumers
• Supports metadata and data operations
• Implements Simple Consumer APIs
• Spark streaming receiver for Kafka REST
Push
• Configurable push to destinations like HDFS, Cosmos, Kafka.
• Utilizes KafkaNet, the .NET High Level Consumer (https://github.com/Microsoft/Kafkanet)
Monitoring using Canary
[Diagram components: Device Proxy Services, Services Data Push, Services Data Pull (Agent), LoadBalancer, Collectors, Kafka Brokers (4 brokers, partitions P0 to P11), High Level Consumer, Audit Trail, and a synthetic message injected by the Canary.]
Canary - https://github.com/Microsoft/Availability-Monitor-for-Kafka
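The idea of probing a pipeline with synthetic messages can be sketched in a few lines. This is not the code of Availability-Monitor-for-Kafka: `InMemoryPipeline` is a hypothetical stand-in for the Collector, Kafka, and consumer path, and the probe simply measures whether a tagged message comes out the other end and how long that took.

```python
import time
import uuid
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Optional


@dataclass
class SyntheticMessage:
    probe_id: str
    sent_at: float


class InMemoryPipeline:
    """Stand-in for the Collector -> Kafka -> consumer path (hypothetical)."""

    def __init__(self, drop: bool = False) -> None:
        self._queue: Deque[SyntheticMessage] = deque()
        self._drop = drop  # when True, simulate an unavailable pipeline

    def produce(self, message: SyntheticMessage) -> None:
        if not self._drop:
            self._queue.append(message)

    def consume(self) -> Optional[SyntheticMessage]:
        return self._queue.popleft() if self._queue else None


def run_probe(pipeline: InMemoryPipeline,
              clock: Callable[[], float] = time.monotonic) -> Optional[float]:
    """Send one synthetic message; return its end-to-end latency in seconds, or None if lost."""
    probe = SyntheticMessage(probe_id=uuid.uuid4().hex, sent_at=clock())
    pipeline.produce(probe)
    received = pipeline.consume()
    if received is None or received.probe_id != probe.probe_id:
        return None
    return clock() - received.sent_at


healthy = run_probe(InMemoryPipeline())
broken = run_probe(InMemoryPipeline(drop=True))
print("healthy pipeline latency (s):", healthy)
print("broken pipeline latency:", broken)
```

Run on a schedule, the fraction of probes that return a latency gives an availability figure, and the latencies themselves give an end-to-end latency series.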
Data completeness – Audit Trail
[Diagram components: Device Proxy Services, Services Data Push, Services Data Pull (Agent), LoadBalancer, Collectors, Kafka Brokers (4 brokers, partitions P0 to P11), High Level Consumer, Audit Trail.]
• Sampled vs Full Auditing support
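One way to read "data completeness" is as a ratio of events seen at the consumer to events seen at the producer, grouped by AuditId. The slides do not describe how Siphon computes it, so the function, the counts, and the AuditId values below are all hypothetical.

```python
from typing import Dict


def completeness_by_audit_id(produced: Dict[str, int],
                             consumed: Dict[str, int]) -> Dict[str, float]:
    """Fraction of produced events seen by the consumer, per AuditId."""
    return {
        audit_id: consumed.get(audit_id, 0) / count
        for audit_id, count in produced.items()
        if count > 0
    }


# Hypothetical counts reported by a producer-side and a consumer-side audit point.
produced_counts = {"audit-001": 1000, "audit-002": 400}
consumed_counts = {"audit-001": 1000, "audit-002": 380}

report = completeness_by_audit_id(produced_counts, consumed_counts)
incomplete = {audit_id: ratio for audit_id, ratio in report.items() if ratio < 1.0}
print(report)
print("incomplete:", incomplete)
```

A ratio below 1.0 would point to loss or delay, and a ratio above 1.0 to duplicates. Sampled auditing would apply the same comparison to a subset of AuditIds rather than to all of them.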
Production Experience – Telemetry Charts
• Monitoring using ELK
• E2E Latency
• Data Completeness
• Processing Lag
• EPS breakdown by data center.
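The slides do not define Processing Lag. One common Kafka interpretation is the gap between the newest offset in each partition and the offset a consumer has committed, and the sketch below assumes that reading. The offsets are made-up values.

```python
from typing import Dict


def partition_lag(log_end_offsets: Dict[int, int],
                  committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Messages not yet processed per partition: log end offset minus committed offset."""
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in log_end_offsets.items()
    }


# Hypothetical offsets for three partitions of one topic.
end_offsets = {0: 1500, 1: 1200, 2: 900}
committed = {0: 1500, 1: 1100}

lag = partition_lag(end_offsets, committed)
print(lag, "total:", sum(lag.values()))
```

A partition with no committed offset is treated here as fully unprocessed, which is a conservative choice for alerting.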
Key Takeaways
• Scale out with Kafka (50K -> 1M -> multi-million Events Per sec)
• Ability to build tunable Auditing/Monitoring
• Producer/Consumer RESTful API provides a nice abstraction
• Config driven Pub/Sub system
