SmartStack, Docker and Yocalhost
How Yelp Does Service Discovery
[Demo]
● This works from (almost) any host in Yelp
● This works from Python, Java, command line etc.
● If a service supports HTTP or TCP then it can be made discoverable.
○ This includes third-party services such as MySQL and scribe
● It’s dynamic: for a given service, if new instances are added then they
will automatically become available.
Very Important Things to Note
● SmartStack (nerve and synapse) was written by Airbnb
● We’ve added some features
● The work here has been carried out by many people across Yelp
Credits
Registration
Architecture
[Diagram: a service host running service_1, service_2, service_3, hacheck, nerve and configure_nerve.py, alongside ZK]
Nerve registers each service instance in ZooKeeper:
/nerve/region:myregion
├── service_1
│   └── server_1_0000013614
├── service_2
│   └── server_1_0000000959
├── service_3
│   ├── server_1_0000002468
│   └── server_2_0000002467
[...]
ZooKeeper data
The data in a znode is all that is required to connect to the corresponding
service instance.
We’ll shortly see how this is used for discovery.
{
  "host":"10.0.0.123",
  "port":31337,
  "name":"server_1",
  "weight":10
}
ZooKeeper data
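A minimal sketch of how a client could turn that znode payload into something it can connect to. The `Registration` type and `parse_registration` helper are hypothetical names used only for illustration; the field names are taken from the example payload.

```python
import json
from typing import NamedTuple


class Registration(NamedTuple):
    """One service instance as stored in a znode (hypothetical type)."""
    host: str
    port: int
    name: str
    weight: int


def parse_registration(data: bytes) -> Registration:
    """Decode the JSON payload of a znode into a Registration."""
    fields = json.loads(data.decode("utf-8"))
    return Registration(
        host=fields["host"],
        port=int(fields["port"]),
        name=fields["name"],
        # Assumption: treat a missing weight as 1.
        weight=int(fields.get("weight", 1)),
    )


znode_data = b'{"host":"10.0.0.123","port":31337,"name":"server_1","weight":10}'
reg = parse_registration(znode_data)
# The pair below could be passed to socket.create_connection().
address = (reg.host, reg.port)
```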
hacheck
Normally hacheck just acts as a transparent proxy for our healthchecks:
$ curl -s yocalhost:6666/http/service_1/1234/status | jq .
{
  "uptime": 5693819.315988064,
  "pid": 2595160,
  "host": "server_1",
  "version": "b6309e09d71da8f1e28213d251f7c3515878caca"
}
hacheck
We can also use it to fail healthchecks before we shut down a service.
This allows us to gracefully shut down a service.
(Also provides a 1s cache to limit healthcheck rate.)
$ hadown service_1
$ curl -v yocalhost:6666/http/service_1/1234/status
Service service_1 in down state since 1443217910: billings
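The two behaviours described for hacheck (failing checks for a service marked down, and caching results for about a second) can be modelled roughly as follows. This is a toy sketch, not hacheck's implementation: the `HealthcheckProxy` class, its method names and the injectable clock are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class HealthcheckProxy:
    """Toy model of a healthcheck proxy with a 'down' switch and a short cache."""

    def __init__(self, check: Callable[[str], bool], ttl: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._check = check    # performs the real healthcheck against the service
        self._ttl = ttl        # how long a cached result stays valid, in seconds
        self._clock = clock
        self._down: Dict[str, str] = {}                  # service -> reason
        self._cache: Dict[str, Tuple[float, bool]] = {}  # service -> (checked_at, healthy)

    def mark_down(self, service: str, reason: str) -> None:
        self._down[service] = reason

    def mark_up(self, service: str) -> None:
        self._down.pop(service, None)

    def status(self, service: str) -> bool:
        # A service marked down fails immediately, without touching the backend.
        if service in self._down:
            return False
        now = self._clock()
        cached = self._cache.get(service)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        healthy = self._check(service)
        self._cache[service] = (now, healthy)
        return healthy


calls = []


def backend_check(service: str) -> bool:
    calls.append(service)
    return True


fake_now = [0.0]
proxy = HealthcheckProxy(backend_check, ttl=1.0, clock=lambda: fake_now[0])
first = proxy.status("service_1")    # hits the backend
second = proxy.status("service_1")   # served from the cache
proxy.mark_down("service_1", "billings")
third = proxy.status("service_1")    # fails while marked down
```

Marking the service down before stopping it gives the load balancer time to drain traffic away, which is the graceful shutdown the slide refers to.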
configure_nerve.py
How do we know which services to advertise? Every service host
periodically runs a script to regenerate the nerve configuration, reading
from the following sources:
● yelpsoa-configs
runs_on:
  - server_1
  - server_2
● puppet
nerve_simple::puppet_service { 'foo': }
● mesos slave API
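As a rough illustration of the first source, the sketch below selects the services whose `runs_on` list names the current host. The `services_for_host` function and the in-memory `soa_configs` dict are hypothetical; the real script reads yelpsoa-configs from disk and also consults puppet and the mesos slave API, which are not modelled here.

```python
from typing import Dict, List


def services_for_host(hostname: str, soa_configs: Dict[str, dict]) -> List[str]:
    """Return the services whose runs_on list includes this host."""
    return sorted(
        service
        for service, config in soa_configs.items()
        if hostname in config.get("runs_on", [])
    )


# Hypothetical, already-parsed service definitions.
soa_configs = {
    "service_1": {"runs_on": ["server_1", "server_2"]},
    "service_2": {"runs_on": ["server_1"]},
    "service_3": {"runs_on": ["server_2"]},
}
advertised = services_for_host("server_1", soa_configs)
```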
Discovery
Architecture
[Diagram components: client, haproxy, synapse, configure_synapse.py, nerve, ZK]
HAProxy
● By default, HAProxy binds to 0.0.0.0
● On public servers, it binds only to yocalhost
● HAProxy gives us a lot of goodies for all clients:
○ Redispatch on connection failures
○ Zero-downtime restarts (once you know how :)
○ Easy to insert connection logging
● Each host also exposes an HAProxy status page for easy introspection
configure_synapse.py
Every client host periodically runs a script to regenerate the synapse
configuration, reading service definitions from yelpsoa-configs.
For each service, it reads a smartstack.yaml file.
It restarts synapse only if the configuration has changed.
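The "regenerate, then restart only on change" pattern might look like the sketch below. It assumes the configuration is serialised as JSON; `write_if_changed`, the file name and the shape of the config dict are hypothetical and do not reflect synapse's real configuration format.

```python
import json
import os
import tempfile
from typing import Callable


def write_if_changed(path: str, config: dict, restart: Callable[[], None]) -> bool:
    """Write config to path and call restart() only when the content differs."""
    new_text = json.dumps(config, indent=2, sort_keys=True)
    try:
        with open(path, "r", encoding="utf-8") as f:
            old_text = f.read()
    except FileNotFoundError:
        old_text = None
    if new_text == old_text:
        return False
    # Write to a temporary file and rename it so readers never see a partial file.
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(new_text)
    os.replace(tmp_path, path)
    restart()
    return True


restarts = []
workdir = tempfile.mkdtemp()
config_path = os.path.join(workdir, "synapse.conf.json")  # hypothetical file name
config = {"services": {"service_1.main": {"proxy_port": 20973}}}
changed_first = write_if_changed(config_path, config, lambda: restarts.append(1))
changed_second = write_if_changed(config_path, config, lambda: restarts.append(1))
```

Running the script periodically is then cheap: an unchanged configuration causes no restart.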
smartstack.yaml
main:
  proxy_port: 20973
  mode: http
  healthcheck_uri: /status
  timeout_server_ms: 1000
Namespaces
main:
  proxy_port: 20001
  mode: http
  healthcheck_uri: /status
  timeout_server_ms: 1000
long_timeout:
  proxy_port: 20002
  mode: http
  healthcheck_uri: /status
  timeout_server_ms: 3000
Same service, different ports
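To make the effect of namespaces concrete, the sketch below renders one HAProxy `listen` section per namespace. The directives used (`bind`, `mode`, `timeout server`, `option httpchk`) are standard HAProxy configuration, but the `haproxy_stanzas` function, the `service.namespace` naming and the choice to bind to yocalhost are assumptions for illustration, not the output synapse actually generates. Backend `server` lines are omitted because they come from discovery.

```python
from typing import Dict, List

YOCALHOST = "169.254.255.254"


def haproxy_stanzas(service: str, namespaces: Dict[str, dict]) -> List[str]:
    """Render one HAProxy 'listen' section per namespace of a service."""
    stanzas = []
    for namespace, conf in sorted(namespaces.items()):
        lines = [
            f"listen {service}.{namespace}",
            f"    bind {YOCALHOST}:{conf['proxy_port']}",
            f"    mode {conf['mode']}",
            f"    timeout server {conf['timeout_server_ms']}ms",
            f"    option httpchk GET {conf['healthcheck_uri']}",
        ]
        stanzas.append("\n".join(lines))
    return stanzas


# The two namespaces from the smartstack.yaml example, already parsed.
namespaces = {
    "main": {"proxy_port": 20001, "mode": "http",
             "healthcheck_uri": "/status", "timeout_server_ms": 1000},
    "long_timeout": {"proxy_port": 20002, "mode": "http",
                     "healthcheck_uri": "/status", "timeout_server_ms": 3000},
}
stanzas = haproxy_stanzas("service_1", namespaces)
```

A client that needs the longer timeout simply connects to port 20002 instead of 20001.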
Escape hatch
Some client libraries, e.g. those for Cassandra and memcached, prefer to do
their own load balancing. For these, use synapse to dump the registration
information to disk:
$ cat /var/run/synapse/services/devops.demo.json | jq .
[
  {
    "host":"10.0.0.123",
    "port":31337,
    "name":"server_1",
    "weight":10
  }
]
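A client that balances load itself could consume that file along these lines. This is a sketch only: `load_backends` and `pick_backend` are hypothetical helpers, and weighted random selection is just one possible policy.

```python
import json
import random
from typing import List, Optional, Tuple


def load_backends(path: str) -> List[dict]:
    """Read the list of registrations that synapse wrote to disk."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def pick_backend(backends: List[dict],
                 rng: Optional[random.Random] = None) -> Tuple[str, int]:
    """Pick one (host, port), favouring instances with a higher weight."""
    rng = rng or random.Random()
    # Assumption: treat a missing weight as 1.
    weights = [b.get("weight", 1) for b in backends]
    chosen = rng.choices(backends, weights=weights, k=1)[0]
    return chosen["host"], chosen["port"]


# In production this list would come from
# load_backends("/var/run/synapse/services/devops.demo.json").
backends = [{"host": "10.0.0.123", "port": 31337, "name": "server_1", "weight": 10}]
host, port = pick_backend(backends)
```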
Docker + Yocalhost
Architecture
[Diagram: network interfaces on a Docker host running haproxy]
● Host: eth0 10.0.1.2, lo 127.0.0.1, lo:0 169.254.255.254, docker0 169.254.1.1
● docker container 1: lo 127.0.0.1, eth0 169.254.14.17
● docker container 2: lo 127.0.0.1, eth0 169.254.14.18
yocalhost
● We’d like to run only one nerve / synapse / haproxy per host
● What address should we bind haproxy to?
● 127.0.0.1 won’t work from within a container
● Instead we pick a link-local address 169.254.255.254 (yocalhost)
● This also works on servers without docker
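From a client's point of view, reaching a service then only requires the yocalhost address and the service's proxy port. The sketch below builds such a URL; `service_url` is a hypothetical helper, and port 20973 is the `proxy_port` from the earlier smartstack.yaml example.

```python
YOCALHOST = "169.254.255.254"


def service_url(proxy_port: int, path: str = "/") -> str:
    """Build the URL a client uses to reach a service through the local HAProxy."""
    if not path.startswith("/"):
        path = "/" + path
    return f"http://{YOCALHOST}:{proxy_port}{path}"


url = service_url(20973, "/status")
# The same URL is valid on the host and inside a container, so a call such as
# urllib.request.urlopen(url, timeout=1.0) needs no per-environment configuration.
```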
Locality-aware discovery
Overview
We run services both in our own datacenters and in AWS.
We logically group these environments according to latency.
Service authors get to decide how ‘widely’ their service instances are
advertised.
Everything is controlled via smartstack.yaml files.
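Latency hierarchies
[Diagram: nested latency zones, habitat within region within superregion]
● habitat: datacenters, or AZs in AWS
● region: habitats within 1ms round-trip, e.g. ‘us-west-1’
● superregion: regions within 5ms round-trip, e.g. ‘pacific north-west’
● Diagram callout: “ZooKeepers live here”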
main:
  proxy_port: 20973
  advertise: [habitat]
  discover: habitat
advertise / discover
● advertise: Nerve should register this service in the habitat directory of its local ZooKeeper
● discover: Synapse should look in the habitat directory in its local ZooKeeper
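The mapping from `advertise` and `discover` to ZooKeeper directories can be sketched as below. The path layout `/nerve/<level>:<name>/<service>` is extrapolated from the `/nerve/region:...` examples in this deck; the function names and the location names `habitat_a` and `superregion_a` are hypothetical.

```python
from typing import Dict, List


def advertise_paths(service: str, advertise: List[str],
                    location: Dict[str, str]) -> List[str]:
    """ZooKeeper directories in which nerve would register the service."""
    return [f"/nerve/{level}:{location[level]}/{service}" for level in advertise]


def discover_path(service: str, discover: str, location: Dict[str, str]) -> str:
    """ZooKeeper directory that synapse would watch for the service."""
    return f"/nerve/{discover}:{location[discover]}/{service}"


# Hypothetical location of the current host at each level of the hierarchy.
location = {
    "habitat": "habitat_a",
    "region": "us-west-1",
    "superregion": "superregion_a",
}
registered = advertise_paths("service_1", ["habitat"], location)
watched = discover_path("service_1", "habitat", location)
```

A service advertised at several levels would simply appear in several directories, and each client picks the one level it wants to discover at.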
ZooKeeper data, revisited
/nerve
├── region:us-west-1
│   └── service_1
│       └── server_1_0000013614
├── region:us-west-2
│   └── service_2
│       └── server_2_0000000959
[...]
Extra advertisements
“Wouldn’t it be useful if we could make a service running in datacenter A
available in an (arbitrary) datacenter B?”
Why?
● Makes it easier to bring up a new datacenter
● Makes it easier to add more capacity to a datacenter in an emergency
● Makes it easier to keep a datacenter going in an emergency if a service
fails
main:
  advertise: [region]
  discover: region
  extra_advertise:
    region:us-west-1: [region:us-west-2]
extra_advertise
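One way to read this configuration is that instances running in the key location are additionally registered in each listed location. The sketch below works under that assumption; `registration_paths` is a hypothetical helper, the path layout is extrapolated from the `/nerve/region:...` examples, and how registrations reach another location's ZooKeeper is not modelled.

```python
from typing import Dict, List


def registration_paths(service: str, own_location: str,
                       extra_advertise: Dict[str, List[str]]) -> List[str]:
    """Directories for an instance: its own location plus any extra ones."""
    locations = [own_location] + list(extra_advertise.get(own_location, []))
    return [f"/nerve/{loc}/{service}" for loc in locations]


extra_advertise = {"region:us-west-1": ["region:us-west-2"]}
paths = registration_paths("service_1", "region:us-west-1", extra_advertise)
```

Under this reading, clients in us-west-2 that discover at the region level would see the us-west-1 instances without any change on their side.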
Design choices
Unix 4eva
● Lots of little components, each doing one thing well
● Very simple interface for clients and services
○ If it speaks TCP or HTTP we can register it
● Easy to independently replace components
○ HAProxy -> NGINX?
● Easy to observe behavior of components
It’s OK if ZooKeeper fails
● Nerve and Synapse keep retrying
● HAProxy keeps running but with no updates
● HAProxy performs its own healthchecks against service instances
○ If a service instance becomes unavailable then it will stop receiving
traffic after a short period
● The website stays up :)
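The failure behaviour described above amounts to "keep the last known state and keep retrying". A minimal sketch of that idea follows; the `BackendList` class and the fake fetch function are hypothetical and stand in for synapse reading from ZooKeeper.

```python
from typing import Callable, List


class BackendList:
    """Keeps serving the last known backends when the registry is unreachable."""

    def __init__(self, fetch: Callable[[], List[str]]) -> None:
        self._fetch = fetch   # e.g. reads the children of a ZooKeeper directory
        self._backends: List[str] = []
        self.failures = 0

    def refresh(self) -> List[str]:
        try:
            self._backends = self._fetch()
            self.failures = 0
        except ConnectionError:
            # Registry is down: keep the old list and try again on the next refresh.
            self.failures += 1
        return list(self._backends)


# Simulate one successful read followed by a registry outage.
responses = [["10.0.0.123:31337"], ConnectionError("registry unavailable")]


def fake_fetch() -> List[str]:
    result = responses.pop(0)
    if isinstance(result, Exception):
        raise result
    return result


backends = BackendList(fake_fetch)
before_outage = backends.refresh()
during_outage = backends.refresh()
```

Stale entries are tolerable here because HAProxy's own healthchecks remove instances that have actually gone away.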
Does it scale?
● We used to have scaling issues with internal load balancers; this is not a
problem with SmartStack :)
● Hit some scaling issues at tens of thousands of ZooKeeper connections
○ Addressed this by using just a single ZooKeeper connection from
each nerve and synapse
● Used to have lots of HAProxy healthchecks hitting services
○ hacheck insulates services from this
○ We limit HAProxy restart rate
What about etcd / consul / …?
● We try to use boring components :)
● We’re already using ZooKeeper for Kafka and ElasticSearch so it’s
natural to use it for our service discovery system too.
● etcd would probably also work, and is supported by SmartStack
● Conceptually similar to consul / consul-template
What about DNS?
● What TTL are you going to use?
● Are your clients even going to honor the TTL?
● Does the DNS resolution happen inline with requests?
Conclusions
● We’ve used SmartStack to create a robust service discovery system
● It’s UNIXy: lots of separate components, each doing one thing well
● It’s flexible: locality-aware discovery
● It’s reliable: new devs at Yelp view discovery as a solved problem
● It’s useful: SmartStack is the glue that holds our SOA together
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 

How Yelp does Service Discovery

  • 1. SmartStack, Docker and Yocalhost: How Yelp Does Service Discovery
  • 3. Very Important Things to Note
       ● This works from (almost) any host in Yelp
       ● This works from Python, Java, command line etc.
       ● If a service supports HTTP or TCP then it can be made discoverable.
         ○ This includes third-party services such as MySQL and scribe
       ● It’s dynamic: for a given service, if new instances are added then they will automatically become available.
  • 4. Credits
       ● SmartStack (nerve and synapse) was written by Airbnb
       ● We’ve added some features
       ● The work here has been carried out by many people across Yelp
  • 7. ZooKeeper data
       Nerve registers service instance in ZooKeeper:
       /nerve/region:myregion
       ├── service_1
       │   └── server_1_0000013614
       ├── service_2
       │   └── server_1_0000000959
       ├── service_3
       │   ├── server_1_0000002468
       │   └── server_2_0000002467
       [...]
  • 8. ZooKeeper data
       The data in a znode is all that is required to connect to the corresponding service instance. We’ll shortly see how this is used for discovery.
       {
         "host": "10.0.0.123",
         "port": 31337,
         "name": "server_1",
         "weight": 10
       }
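To make the znode payload concrete, here is a minimal Python sketch that decodes one registration into a host and port a client could connect to. The Backend class, the parse_znode helper and the default weight of 1 are assumptions made for this illustration; they are not part of nerve or synapse.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    """One registered service instance (hypothetical helper type)."""
    host: str
    port: int
    name: str
    weight: int


def parse_znode(payload: bytes) -> Backend:
    """Decode the JSON stored in a registration znode."""
    data = json.loads(payload.decode("utf-8"))
    return Backend(
        host=data["host"],
        port=int(data["port"]),
        name=data["name"],
        # Assumption: treat a missing weight as 1.
        weight=int(data.get("weight", 1)),
    )


# Sample payload using the values from the slide.
EXAMPLE = b'{"host": "10.0.0.123", "port": 31337, "name": "server_1", "weight": 10}'
backend = parse_znode(EXAMPLE)
print(f"{backend.name} -> {backend.host}:{backend.port} (weight {backend.weight})")
```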
  • 9. hacheck
       Normally hacheck just acts as a transparent proxy for our healthchecks:
       $ curl -s yocalhost:6666/http/service_1/1234/status | jq .
       {
         "uptime": 5693819.315988064,
         "pid": 2595160,
         "host": "server_1",
         "version": "b6309e09d71da8f1e28213d251f7c3515878caca"
       }
  • 10. hacheck
        We can also use it to fail healthchecks before we shut down a service. This allows us to shut down a service gracefully. (It also provides a 1s cache to limit the healthcheck rate.)
        $ hadown service_1
        $ curl -v yocalhost:6666/http/service_1/1234/status
        Service service_1 in down state since 1443217910: billings
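The two hacheck behaviours described here, a forced "down" state and a short result cache, can be sketched in a few lines of Python. This is not hacheck's implementation: the HealthcheckProxy class, its method names, the injected clock and the 503 status used for the down state are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple

Result = Tuple[int, str]  # (HTTP status code, body)


class HealthcheckProxy:
    """Toy model of a healthcheck proxy with a down switch and a TTL cache."""

    def __init__(self, check: Callable[[str], Result], ttl: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._check = check
        self._ttl = ttl
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Result]] = {}
        self._down: Dict[str, str] = {}

    def mark_down(self, service: str, reason: str) -> None:
        """Fail every later healthcheck for this service (like 'hadown')."""
        self._down[service] = reason

    def mark_up(self, service: str) -> None:
        """Resume proxying healthchecks for this service."""
        self._down.pop(service, None)

    def status(self, service: str) -> Result:
        if service in self._down:
            # Assumption: a failing status code; the real service is not contacted.
            return 503, f"Service {service} in down state: {self._down[service]}"
        now = self._clock()
        cached = self._cache.get(service)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # served from cache, protecting the service
        result = self._check(service)
        self._cache[service] = (now, result)
        return result


# Demo with a stand-in check that always succeeds.
proxy = HealthcheckProxy(lambda service: (200, "OK"))
print(proxy.status("service_1"))
proxy.mark_down("service_1", "billings")
print(proxy.status("service_1"))
```

Marking the service down before stopping it lets load balancers drain traffic while the process is still able to finish in-flight requests.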
  • 11. configure_nerve.py
        How do we know what services to advertise? Every service host periodically runs a script to regenerate the nerve configuration, reading from the following sources:
        ● yelpsoa-configs
            runs_on:
              server_1
              server_2
        ● puppet
            nerve_simple::puppet_service {'foo'}
        ● mesos slave API
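As a rough illustration of the yelpsoa-configs source only, the sketch below picks the services a host should advertise from their runs_on lists. The dictionary layout, the services_for_host function and the sample data are assumptions made for this example; the real configure_nerve.py also reads from puppet and the mesos slave API.

```python
from typing import Dict, List


def services_for_host(soa_configs: Dict[str, dict], hostname: str) -> List[str]:
    """Return the services whose runs_on list names this host, sorted by name."""
    return sorted(
        service
        for service, config in soa_configs.items()
        if hostname in config.get("runs_on", [])
    )


# Hypothetical, already-parsed service definitions.
SOA_CONFIGS = {
    "service_1": {"runs_on": ["server_1"]},
    "service_2": {"runs_on": ["server_1"]},
    "service_3": {"runs_on": ["server_1", "server_2"]},
}

for host in ("server_1", "server_2"):
    print(host, services_for_host(SOA_CONFIGS, host))
```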
  • 14. HAProxy ● By default bind to 0.0.0.0 ● Bind only to yocalhost on public servers. ● HAProxy gives us a lot of goodies for all clients: ○ Redispatch on connection failures ○ Zero-downtime restarts (once you know how :) ○ Easy to insert connection logging ● Each host also exposes an HAProxy status page for easy introspection
  • 15. configure_synapse.py
        Every client host periodically runs a script to regenerate the synapse configuration, reading service definitions from yelpsoa-configs. For each service it reads a smartstack.yaml file. It restarts synapse only if the configuration has changed.
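The "restart only if the configuration has changed" step could look roughly like the Python sketch below. It is an illustration under assumptions, not the actual configure_synapse.py: the write_if_changed helper, the file name and the config contents are hypothetical, and the restart itself is left to the caller.

```python
import json
import os
import tempfile


def write_if_changed(path: str, config: dict) -> bool:
    """Write config as JSON; return True only if the file content changed."""
    new_text = json.dumps(config, indent=2, sort_keys=True)
    try:
        with open(path) as handle:
            if handle.read() == new_text:
                return False  # nothing to do, so no restart is needed
    except FileNotFoundError:
        pass  # first run: fall through and create the file
    # Write to a temporary file in the same directory, then rename it into
    # place so a reader never sees a half-written configuration.
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(new_text)
    os.replace(tmp_path, path)
    return True


with tempfile.TemporaryDirectory() as tmp_dir:
    conf_path = os.path.join(tmp_dir, "synapse.conf.json")  # hypothetical name
    config = {"services": {"service_1.main": {"proxy_port": 20001}}}
    for attempt in range(2):
        if write_if_changed(conf_path, config):
            print("configuration changed: restart synapse here")
        else:
            print("configuration unchanged: leave synapse running")
```

Sorting the keys before comparing keeps the output stable, so a regenerated but equivalent configuration does not trigger a needless restart.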
  • 17. Namespaces
        Same service, different ports
        main:
          proxy_port: 20001
          mode: http
          healthcheck_uri: /status
          timeout_server_ms: 1000
        long_timeout:
          proxy_port: 20002
          mode: http
          healthcheck_uri: /status
          timeout_server_ms: 3000
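From the client's point of view, choosing a namespace just means choosing a local proxy port. The Python sketch below builds the URL a client would call for each namespace. The proxy_url helper and the in-memory copy of the namespace definitions are assumptions for this example; the address is the yocalhost link-local address given later in the talk.

```python
from typing import Dict

YOCALHOST = "169.254.255.254"  # link-local address haproxy binds to

# The two namespaces from the slide, as already-parsed data.
NAMESPACES: Dict[str, dict] = {
    "main": {
        "proxy_port": 20001,
        "mode": "http",
        "healthcheck_uri": "/status",
        "timeout_server_ms": 1000,
    },
    "long_timeout": {
        "proxy_port": 20002,
        "mode": "http",
        "healthcheck_uri": "/status",
        "timeout_server_ms": 3000,
    },
}


def proxy_url(namespaces: Dict[str, dict], namespace: str, path: str = "/") -> str:
    """Build the local proxy URL for one namespace of a service."""
    port = namespaces[namespace]["proxy_port"]
    return f"http://{YOCALHOST}:{port}{path}"


for name in NAMESPACES:
    print(name, proxy_url(NAMESPACES, name, "/status"))
```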
  • 18. Escape hatch
        Some client libraries like to do their own load balancing, e.g. cassandra, memcached. Use synapse to dump the registration information to disk:
        $ cat /var/run/synapse/services/devops.demo.json | jq .
        [
          {
            "host": "10.0.0.123",
            "port": 31337,
            "name": "server_1",
            "weight": 10
          }
        ]
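A client that does its own load balancing could consume such a dump along the lines of the Python sketch below. The load_backends and pick_backend helpers, the weighted random policy and the second sample backend are assumptions for illustration. The demo writes its own temporary file instead of reading /var/run/synapse so that it runs anywhere.

```python
import json
import os
import random
import tempfile
from typing import List


def load_backends(path: str) -> List[dict]:
    """Read the list of registrations that synapse dumped to disk."""
    with open(path) as handle:
        return json.load(handle)


def pick_backend(backends: List[dict], rng: random.Random) -> dict:
    """Pick one backend at random, proportionally to its weight."""
    # Assumption: a missing weight counts as 1.
    weights = [backend.get("weight", 1) for backend in backends]
    return rng.choices(backends, weights=weights, k=1)[0]


# Stand-in for a dump file such as /var/run/synapse/services/devops.demo.json.
sample = [
    {"host": "10.0.0.123", "port": 31337, "name": "server_1", "weight": 10},
    {"host": "10.0.0.124", "port": 31337, "name": "server_2", "weight": 10},
]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
    json.dump(sample, handle)
    dump_path = handle.name

backends = load_backends(dump_path)
os.remove(dump_path)
chosen = pick_backend(backends, random.Random())
print(f"connecting to {chosen['host']}:{chosen['port']}")
```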
  • 20. Architecture
        [Diagram: network interfaces on a Docker host]
        Host: eth0 10.0.1.2, lo 127.0.0.1, lo:0 169.254.255.254 (haproxy), docker0 169.254.1.1
        docker container 1: lo 127.0.0.1, eth0 169.254.14.17
        docker container 2: lo 127.0.0.1, eth0 169.254.14.18
  • 21. yocalhost ● We’d like to run only one nerve / synapse / haproxy per host ● What address should we bind haproxy to? ● 127.0.0.1 won’t work from within a container ● Instead we pick a link-local address 169.254.255.254 (yocalhost) ● This also works on servers without docker
  • 23. Overview We run services in both our own datacenters and AWS. We logically group these environments according to latency. Service authors get to decide how ‘widely’ their service instances are advertised. Everything is controlled via smartstack.yaml files.
  • 24. Latency hierarchies
        [Diagram: nested levels habitat, region, superregion, with the label “ZooKeepers live here”]
        habitat: Datacenters or AZs in AWS
        region: Habitats within 1ms round-trip, e.g. ‘us-west-1’
        superregion: Regions within 5ms round-trip, e.g. ‘pacific north-west’
  • 25. advertise / discover
        main:
          proxy_port: 20973
          advertise: [habitat]
          discover: habitat
        advertise: Nerve should register this service in the habitat directory of its local ZooKeeper
        discover: Synapse should look in the habitat directory in its local ZooKeeper
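One way to picture how advertise and discover translate into ZooKeeper directories is the Python sketch below. The path layout /nerve/<type>:<name>/<service> is extrapolated from the region example shown in the ZooKeeper data slides. The function names and the location names (myhabitat and so on) are hypothetical.

```python
from typing import Dict, List


def nerve_path(location_type: str, location_name: str, service: str) -> str:
    """Build a registration directory such as /nerve/region:myregion/service_1."""
    return f"/nerve/{location_type}:{location_name}/{service}"


def advertise_paths(namespace: dict, host_locations: Dict[str, str],
                    service: str) -> List[str]:
    """Directories in which nerve would register this service instance."""
    return [
        nerve_path(location_type, host_locations[location_type], service)
        for location_type in namespace["advertise"]
    ]


def discover_path(namespace: dict, host_locations: Dict[str, str],
                  service: str) -> str:
    """Directory that synapse would watch for this service."""
    location_type = namespace["discover"]
    return nerve_path(location_type, host_locations[location_type], service)


# The namespace from the slide, plus hypothetical names for where this host lives.
NAMESPACE = {"proxy_port": 20973, "advertise": ["habitat"], "discover": "habitat"}
HOST_LOCATIONS = {
    "habitat": "myhabitat",
    "region": "myregion",
    "superregion": "mysuperregion",
}

print(advertise_paths(NAMESPACE, HOST_LOCATIONS, "service_1"))
print(discover_path(NAMESPACE, HOST_LOCATIONS, "service_1"))
```

Widening a service's reach is then a configuration change: advertising in [habitat, region] would register the same instance in two directories.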
  • 26. ZooKeeper data, revisited
        /nerve
        ├── region:us-west-1
        │   └── service_1
        │       └── server_1_0000013614
        ├── region:us-west-2
        │   └── service_2
        │       └── server_2_0000000959
        [...]
  • 27. Extra advertisements “Wouldn’t it be useful if we could make a service running in datacenter A available in an (arbitrary) datacenter B?” Why? ● Makes it easier to bring up a new datacenter ● Makes it easier to add more capacity to a datacenter in an emergency ● Makes it easier to keep a datacenter going in an emergency if a service fails
  • 30. Unix 4eva ● Lots of little components, each doing one thing well ● Very simple interface for clients and services ○ If it speaks TCP or HTTP we can register it ● Easy to independently replace components ○ HAProxy -> NGINX? ● Easy to observe behavior of components
  • 31. It’s OK if ZooKeeper fails ● Nerve and Synapse keep retrying ● HAProxy keeps running but with no updates ● HAProxy performs its own healthchecks against service instances ○ If a service instance becomes unavailable then it will stop receiving traffic after a short period ● The website stays up :)
  • 32. Does it scale? ● We used to have scaling issues with internal load balancers; this is not a problem with SmartStack :) ● We hit some scaling issues at tens of thousands of ZooKeeper connections ○ We addressed this by using just a single ZooKeeper connection from each nerve and synapse ● We used to have lots of HAProxy healthchecks hitting services ○ hacheck insulates services from this ○ We limit the HAProxy restart rate
  • 33. What about etcd / consul / …? ● We try to use boring components :) ● We’re already using ZooKeeper for Kafka and Elasticsearch so it’s natural to use it for our service discovery system too. ● etcd would probably also work, and is supported by SmartStack ● Conceptually similar to consul / consul-template
  • 34. What about DNS? ● What TTL are you going to use? ● Are your clients even going to honor the TTL? ● Does the DNS resolution happen inline with requests?
  • 35. Conclusions ● We’ve used SmartStack to create a robust service discovery system ● It’s UNIXy: lots of separate components, each doing one thing well ● It’s flexible: locality-aware discovery ● It’s reliable: new devs at Yelp view discovery as a solved problem ● It’s useful: SmartStack is the glue that holds our SOA together