The
Automation
  Factory


    nathan@milford.io
    blog.milford.io
    twitter.com/NathanMilford
    github.com/nmilford
This is NOT strictly a
  Cassandra talk.



  ♫ There's no earthly way of knowing ♫
This is an infrastructure talk.




       ♫ How your infrastructure's growing. ♫
Startups move fast.

        Priorities change.

Infrastructure needs to be able to
            pivot, too.


        ♫ Who knows where business is going.
         Or which way the data's flowing. ♫
When you scale up,

so do your problems.




     ♫ Drives imploding?
      IO plateauing? ♫
Not to mention unexpected
           disasters.




We lost a whole data center during
         Hurricane Sandy.
          ♫ Is a hurricane a'blowing? ♫
How do you keep up with growth?




       ♫ There's no earthly way of knowing ♫
How do you deal with failure?




      ♫ Are the status LEDs a 'glowing?
       Is the server reaper mowing? ♫
How do you deal with too much
          success?




      ♫ Yes! The danger must be growing
       For the data keeps on flowing. ♫
What do you do?




♫ And they're certainly not showing
 any signs that they are slowing! ♫
Hold your breath.

  Make a wish.

 Automate!
♫ Come with me
            And you'll be
In a world of systems automation ♫
♫ Take a look
            And you’ll see
      Into my Chef lucubrations

        So login, Install, begin
With the Chef cookbook of my creation
    What you'll see might require
            Explanation ♫
♫ If you want to view paradise
 Simply go to Github and view it
 Pull requests welcome, go to it
    Want to change the code
       A merge will do it ♫




        https://github.com/linkedin/glu/
       https://github.com/octo/collectd/
       https://github.com/opscode/chef/
      https://github.com/saltstack/salt/
    https://github.com/outbrain/onering/
 https://github.com/nmilford/chef-cassandra/
https://github.com/rabbitmq/rabbitmq-server/
def discover_cassandra_schema
  require 'cassandra-cql'
  schema = {}
  server = "#{node[:ipaddress]}:#{node[:Cassandra][:rpc_port]}"

  # Connect to the local node; return nil if the connection fails.
  db = CassandraCQL::Database.new(server) rescue nil
  return nil unless db

  # Map each keyspace name to the names of its column families.
  db.keyspaces.each do |ks|
    schema[ks.name] = ks.column_families.collect { |cfname, _cfobj| cfname }
  end

  # Leave out the system and OpsCenter keyspaces.
  schema.delete("system")
  schema.delete("OpsCenter")
  schema
end

♫ There is no life I know
To compare with writing automation
Write it once
You’ll be free ♫
*clickity*

*clickity*

*clickity*

♫ To play Diablo 3 ♫
♫ If you want to scale past a petabyte
 Just install Chef, Salt and Graphite
If you want to sleep the whole night
         Automate the world
          It will be all right♫
♫ There is no life I know
To compare with writing automation
          Write it once
         You’ll be free ♫
♫ If you truly wish to be.♫
The
     Automation
       Factory
     A Journey from Bare Metal
     to Active Cassandra Node


nathan@milford.io
blog.milford.io
twitter.com/NathanMilford
github.com/nmilford
Cassandra NYC 2011




http://www.slideshare.net/nmilford/cassandra-for-sysadmins
2 Years Later
●   80 billion impressions a month.

●   4 clusters for disparate
    use-cases, more in planning.

●   73 Cassandra nodes
    across 3 data centers.
Mo' Servers,
   Mo' Problems

We got multiple cages of servers.


   So... yeah... you can see where
     automation might help :)
Automation Attack Plan




●   Provisioning!
●   Orchestration!
●   Command and Control!
●   Config Management!
●   Monitoring and Alerting!
Provisioning
●   Started with Cobbler (which is awesome!).
●   High-performance infrastructures are snowflakes and can get out of hand fast.
●   No tool worked completely, end to end, and the tool won't write itself.
We Built Our Own: Onering




Note: I am only a moderate Lord of the Rings Fan, and the guy who did most of the work on it, Gary Hetzel, is a
Star Trek fan. We are not responsible for any LotR puns.
                                https://github.com/outbrain/onering/
Onering: Provisioning &
    Orchestration
       ●
           Initiates/manages provisioning
           and inventory.
       ●
           Acts as an orchestration layer in
           our automation.
       ●
           Keeps all metadata, which is
           searchable.
       ●
           Has a CLI tool and REST API to
           work with.
       ●
           Acts as our single point of truth
           & final authority on state.
The Automation Factory
Onering Provisioning Workflow
➔   Developers put in machine requests by role for the quarterly order.
➔   Machines show up, get racked and powered on.
➔   Machines boot into the Razor microkernel and report to Onering.
➔   Appropriate nodes get kickstarted & bootstrapped into the roles specified.
➔   Additional nodes sit idle in the 'allocatable' state.
➔   Once the OS is installed, configuration is handed off to...
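
The workflow above reads like a small state machine. Here is a minimal Ruby sketch of that idea. The `Machine` class and every state name except 'allocatable' are hypothetical and chosen for illustration; this is not Onering's actual code or API.

```ruby
# Hypothetical model of the provisioning workflow; not Onering's real API.
# Only the 'allocatable' state name comes from the slides.
class Machine
  # Allowed transitions: current state => states it may move to.
  TRANSITIONS = {
    requested:    [:racked],
    racked:       [:discovered],                 # booted the microkernel, reported in
    discovered:   [:provisioning, :allocatable],
    allocatable:  [:provisioning],               # idle spare, waiting for a role
    provisioning: [:installed],                  # kickstarted and bootstrapped
    installed:    [:configured]                  # handed off to config management
  }.freeze

  attr_reader :name, :role, :state

  def initialize(name, role: nil)
    @name = name
    @role = role
    @state = :requested
  end

  # Moves the machine to a new state, rejecting transitions the table does not allow.
  def advance(to, role: @role)
    allowed = TRANSITIONS.fetch(@state, [])
    raise ArgumentError, "#{@state} -> #{to} not allowed" unless allowed.include?(to)
    raise ArgumentError, "provisioning needs a role" if to == :provisioning && role.nil?
    @role = role
    @state = to
    self
  end
end

spare = Machine.new("node42")
spare.advance(:racked).advance(:discovered).advance(:allocatable)
spare.advance(:provisioning, role: "cassandra").advance(:installed).advance(:configured)
puts "#{spare.name}: #{spare.state} as #{spare.role}"
```

Keeping the transitions in one table makes it cheap to reject impossible jumps, such as configuring a machine that was never installed.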
Config Management: Chef
●
  Onering bootstraps into a Chef run.
●
  Chef installs all the system stuff.
●
  Chef sets up Java and tunes the OS how we like.
●
  Chef runs the Cassandra Cookbook.
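# Pull in the Java cookbook first.
include_recipe "java"

# Install the Cassandra package.
package "apache-cassandra1" do
  action :install
end

# Render cassandra.yaml from the cookbook's ERB template.
template "/etc/cassandra/conf/cassandra.yaml" do
  owner "cassandra"
  group "cassandra"
  mode "0755"
  source "cassandra.yaml.erb"
end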



                        https://github.com/opscode/chef/
Cassandra Cookbook does it all!
                          ●
                              Builds/mounts disks.
                          ●
                              Handles multiple clusters,
                              different versions.
                          ●
                              Generates configs (in some
                              cases automatically based
                              on hardware profile).
                          ●
                              Connects to local instance
                              and gets the schema.
                          ●
                              Generates collectd config
                              and maintenance script.
                          ●
                              Schedules maintenance.
         https://github.com/nmilford/chef-cassandra
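
To give a feel for "configs based on hardware profile", here is a small Ruby sketch. It assumes the rules of thumb from Cassandra's stock `cassandra.yaml` comments and `cassandra-env.sh` (16 concurrent reads per data drive, 8 concurrent writes per core, heap sized from RAM). The function name and the profile keys are hypothetical, and the cookbook itself may well compute things differently.

```ruby
# Sketch only: derives a few Cassandra settings from a hardware profile.
# The formulas are the rules of thumb from Cassandra's stock config files,
# not necessarily what the chef-cassandra cookbook does.
def cassandra_settings(cores:, ram_mb:, data_drives:)
  # Max heap: max(min(ram/2, 1 GB), min(ram/4, 8 GB)).
  max_heap_mb = [[ram_mb / 2, 1024].min, [ram_mb / 4, 8192].min].max
  # Young generation: min(100 MB per core, heap/4).
  new_heap_mb = [100 * cores, max_heap_mb / 4].min
  {
    "concurrent_reads"  => 16 * data_drives,
    "concurrent_writes" => 8 * cores,
    "max_heap_mb"       => max_heap_mb,
    "heap_newsize_mb"   => new_heap_mb
  }
end

# Hypothetical hardware profile for one node.
profile = { cores: 8, ram_mb: 32_768, data_drives: 4 }
cassandra_settings(**profile).each { |key, value| puts "#{key}: #{value}" }
```

In a Chef recipe, the same inputs would typically come from Ohai attributes rather than a literal hash.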
Glu: Continuous Deployment
●   Not related to getting a C* node to production, but it's how we get apps there.
●   Built at LinkedIn.
●   Onering talks to it!
●   Holds deployment metadata.
●   Maven builds an RPM and dumps it to a repo.
●   Glu-Agent yum installs it and performs checks.


                    https://github.com/linkedin/glu
The Automation Factory
The Automation Factory
Command & Control: Salt
Distributed commands:
salt '*ny*' cassandra.column_families
salt 'cass*' cassandra.compactionstats
salt '*stg*' cassandra.info
salt 'cass1.ny.*' cassandra.keyspaces
salt -E 'cass1-(stg|prod)' cassandra.netstats
salt '*' cassandra.tpstats

Scary commands:
salt '*' --batch-size 25% service.restart cassandra
salt '*' -b 2 cmd.run 'nodetool -h $(hostname) -p 7199 snapshot'

We actually wrap salt in Onering to provide AAA, as well as to allow use of Onering
metadata for node targeting.

                              https://github.com/saltstack/salt
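
Targeting nodes by inventory metadata instead of hostname globs could look roughly like the Ruby sketch below. The inventory records, host names, and helper functions are all hypothetical stand-ins for what Onering would provide; the only real interface used is salt's `-L` (comma-separated list) targeting option.

```ruby
require "shellwords"

# Hypothetical inventory records; in the talk this metadata lives in Onering.
INVENTORY = [
  { name: "cass1.ny.example.com", role: "cassandra", site: "ny", env: "prod" },
  { name: "cass2.ny.example.com", role: "cassandra", site: "ny", env: "stg"  },
  { name: "web1.ny.example.com",  role: "web",       site: "ny", env: "prod" }
].freeze

# Returns the names of nodes whose metadata matches every given key/value.
def select_nodes(inventory, **criteria)
  inventory.select { |node| criteria.all? { |key, value| node[key] == value } }
           .map { |node| node[:name] }
end

# Builds a salt command line using list targeting (-L) for the chosen nodes.
def salt_command(nodes, function, *args)
  raise ArgumentError, "no nodes matched" if nodes.empty?
  ["salt", "-L", nodes.join(","), function, *args].shelljoin
end

nodes = select_nodes(INVENTORY, role: "cassandra", env: "prod")
puts salt_command(nodes, "cassandra.tpstats")
```

Refusing to build a command when nothing matched is a deliberate choice: an empty target list is more likely a typo in the query than an intent to run nowhere.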
Monitoring




 Is Hard...
Common Monitoring & Events Bus
●
    A single infrastructure-wide bus for systems
    data:
    –   Metrics
    –   Events
    –   Metadata
●
    Collectd as systems agent.
●
    RabbitMQ as message bus.
●
    Graphite as metrics endpoint.
●
    Working on an events mechanism.
●
    Each layer should be interchangeable.
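
On the metrics leg of the bus, what Graphite ultimately receives is one line per sample in its plaintext format: a dotted path, a value, and a Unix timestamp. The Ruby sketch below formats such a line. The helper name, the host name, and the exact path layout are assumptions for illustration, loosely modeled on the `collectd.machines...` paths used in the render URL later in the deck.

```ruby
# Formats one sample in Graphite's plaintext format: "<path> <value> <timestamp>\n".
# The path layout used here is an assumption for illustration.
def graphite_line(path_parts, value, time = Time.now)
  # Dots separate path levels in Graphite, so replace them inside each part.
  path = path_parts.map { |part| part.to_s.gsub(".", "_") }.join(".")
  "#{path} #{value} #{time.to_i}\n"
end

line = graphite_line(
  ["collectd", "machines", "ny", "cass2-01", "GenericJMX", "ReadStage", "PendingTasks"],
  3,
  Time.at(1_360_000_000)
)
print line
```

Because the payload is just text, the agent, the bus, and the endpoint can each be swapped without the others needing to know.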
Collectd
 ●
     Been around forever.
 ●
     Had to rebuild the JMX plugin to not use OpenJDK.
 ●
     Easy to write plugins and extend.
 ●
     Writes to RabbitMQ out of the box.
 ●
     Easy to templatize config for Chef.
<% @node[:Cassandra][:Keyspaces].each do |ks| -%>
<%    ks[1].each do |cf| -%>
       Collect "<%= ks[0] %>.<%= cf %>"
       Collect "KeyCache.<%= ks[0] %>.<%= cf %>"
       Collect "RowCache.<%= ks[0] %>.<%= cf %>"
<%    end -%>
<% end -%>
                        https://github.com/octo/collectd
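
To see what that template produces, it can be rendered outside Chef with Ruby's standard ERB library. In this sketch a plain hash stands in for `@node[:Cassandra][:Keyspaces]`; the keyspace and column family names are made up, and the hash shape (keyspace name mapped to a list of column family names) is assumed to match what `discover_cassandra_schema` returns.

```ruby
require "erb"

# Same loop as the template above, with a local variable in place of Chef's node object.
TEMPLATE = <<~'ERB_SRC'
  <% keyspaces.each do |ks, cfs| -%>
  <%   cfs.each do |cf| -%>
  Collect "<%= ks %>.<%= cf %>"
  Collect "KeyCache.<%= ks %>.<%= cf %>"
  Collect "RowCache.<%= ks %>.<%= cf %>"
  <%   end -%>
  <% end -%>
ERB_SRC

# Hypothetical schema: keyspace name => list of column family names.
keyspaces = { "Events" => ["clicks", "views"] }

# trim_mode "-" makes the "-%>" tags swallow the newline that follows them.
output = ERB.new(TEMPLATE, trim_mode: "-").result_with_hash(keyspaces: keyspaces)
puts output
```

Each column family yields three `Collect` lines, so the collectd config grows automatically as the schema does.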
RabbitMQ
●
    Lots of apps support AMQP.
●
    Shovel plugin for multi-site.
●
    Pretty stable.
●
    I'm not mad at it.




                https://github.com/rabbitmq/rabbitmq-server
Graphite




●
    Plays well with RabbitMQ.
●
    Easy to get metrics into.
●
    Scads of functions.
●
    Easy to get meaningful data out of.


                      https://launchpad.net/graphite
Graphite Render, Activate!
http://graphite/render?
width=800
&height=600
&from=-2hours
&until=now
&target=sortByMaxima(highestCurrent(collectd.machines.*.cass2*.GenericJMX.ReadStage.PendingTasks,5))
&target=sortByMaxima(highestCurrent(collectd.machines.*.cass2*.GenericJMX.MutationStage.PendingTasks,5))
&hideLegend=false
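
Render URLs like the one above are easy to generate from code, which helps when the same graph is wanted for many clusters. A minimal Ruby sketch follows; `render_url` is a hypothetical helper, while the host name `graphite`, the parameters, and the metric paths are taken from the slide.

```ruby
require "uri"

# Builds a Graphite render URL like the one above. The helper is illustrative;
# the parameter names are the ones shown on the slide.
def render_url(host, targets, width: 800, height: 600, from: "-2hours", to: "now")
  params = [["width", width], ["height", height], ["from", from], ["until", to]]
  # The target parameter is repeated, one per series expression.
  targets.each { |target| params << ["target", target] }
  params << ["hideLegend", "false"]
  "http://#{host}/render?#{URI.encode_www_form(params)}"
end

stages = %w[ReadStage MutationStage]
targets = stages.map do |stage|
  "sortByMaxima(highestCurrent(collectd.machines.*.cass2*.GenericJMX.#{stage}.PendingTasks,5))"
end
puts render_url("graphite", targets)
```

The generated URL percent-encodes the parentheses and commas in each target, which the server decodes back to the same expressions.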
The Automation Factory
Alerting: Nagios Self Serve
●
    Uses Onering for new node discovery.
●
    Developers add their own alerts based off of
    Graphite data.
●
    Ops get fewer alerts and are not a bottleneck.
●
    Devs are more engaged.
●
    Everyone is happy.
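
An alert "based off of Graphite data" can be as small as a threshold check on the latest datapoint. The Ruby sketch below assumes Graphite's JSON render output (a list of series, each with `target` and `datapoints` of `[value, timestamp]` pairs) and the standard Nagios plugin exit codes. The function name, metric name, thresholds, and canned response are hypothetical.

```ruby
require "json"

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Evaluates the most recent non-null datapoint of the first series in a
# Graphite JSON render response against warning/critical thresholds.
def check_series(json_body, warn:, crit:)
  series = JSON.parse(json_body).first
  return [UNKNOWN, "no series returned"] if series.nil?

  # Each datapoint is [value, timestamp]; value is null when nothing was recorded.
  value = series["datapoints"].map(&:first).compact.last
  return [UNKNOWN, "no data for #{series['target']}"] if value.nil?

  status = if value >= crit then CRITICAL
           elsif value >= warn then WARNING
           else OK
           end
  [status, "#{series['target']} = #{value}"]
end

# Canned response so the sketch runs without a Graphite server.
sample = '[{"target": "cass2-01.ReadStage.PendingTasks", ' \
         '"datapoints": [[2, 1360000000], [40, 1360000060], [null, 1360000120]]}]'
status, message = check_series(sample, warn: 25, crit: 100)
puts "#{%w[OK WARNING CRITICAL UNKNOWN][status]}: #{message}"
```

A real check would fetch the JSON over HTTP and finish with `exit status` so Nagios can read the result.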
Questions?
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 

The Automation Factory

  • 1. The Automation Factory nathan@milford.io blog.milford.io twitter.com/NathanMilford github.com/nmilford
  • 2. This is NOT strictly a Cassandra talk. ♫ There's no earthly way of knowing ♫
  • 3. This is an infrastructure talk. ♫ How your infrastructure's growing. ♫
  • 4. Startups move fast. Priorities change. Infrastructure needs to be able to pivot, too. ♫ Who knows where business is going. Or which way the data's flowing. ♫
  • 5. When you scale up, so do your problems. ♫ Drives imploding? IO plateauing? ♫
  • 6. Not to mention unexpected disasters. We lost a whole data center during Hurricane Sandy. ♫ Is a hurricane a'blowing? ♫
  • 7. How do you keep up with growth? ♫ There's no earthly way of knowing ♫
  • 8. How do you deal with failure? ♫ Are the status LEDs a 'glowing? Is the server reaper mowing? ♫
  • 9. How do you deal with too much success? ♫ Yes! The danger must be growing For the data keeps on flowing. ♫
  • 10. What do you do? ♫ And they're certainly not showing any signs that they are slowing! ♫
  • 11. Hold your breath. Make a wish. Automate!
  • 12. ♫ Come with me And you'll be In a world of systems automation ♫
  • 13. ♫ Take a look And you’ll see Into my Chef lucubrations So login, Install, begin With the Chef cookbook of my creation What you'll see might require Explanation ♫
  • 14. ♫ If you want to view paradise Simply go to Github and view it Pull requests welcome, go to it Want to change the code A merge will do it ♫ https://github.com/linkedin/glu/ https://github.com/octo/collectd/ https://github.com/opscode/chef/ https://github.com/saltstack/salt/ https://github.com/outbrain/onering/ https://github.com/nmilford/chef-cassandra/ https://github.com/rabbitmq/rabbitmq-server/
  • 15.
        def discover_cassandra_schema
          require 'cassandra-cql'
          schema = {}
          server = "#{node[:ipaddress]}:#{node[:Cassandra][:rpc_port]}"
          db = CassandraCQL::Database.new("#{server}") rescue nil
          if db
            db.keyspaces.collect{|s| schema[s.name] = s.column_families.collect{|cfname, cfobj| cfname } }
            schema.delete("system")
            schema.delete("OpsCenter")
            return schema
          end
          return nil
        end
        ♫ There is no life I know To compare with writing automation Write it once You’ll be free ♫
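The shape of the hash that the slide 15 helper returns can be tried out without a live cluster. This is a minimal sketch, assuming keyspace objects that respond to `name` and `column_families` as on the slide; `Keyspace`, `build_schema`, `INTERNAL_KEYSPACES`, and the sample data are hypothetical stand-ins, not part of cassandra-cql.

```ruby
# Hypothetical stand-in for the keyspace objects a driver would return.
Keyspace = Struct.new(:name, :column_families)

# Keyspaces that the slide's helper removes from the result.
INTERNAL_KEYSPACES = %w[system OpsCenter].freeze

# Build { keyspace_name => [column_family_names] }, skipping internal keyspaces.
def build_schema(keyspaces)
  schema = {}
  keyspaces.each do |ks|
    next if INTERNAL_KEYSPACES.include?(ks.name)
    # column_families is assumed to be a name => object mapping, as on the slide.
    schema[ks.name] = ks.column_families.keys
  end
  schema
end

# Made-up sample data for illustration only.
sample = [
  Keyspace.new("system", { "schema_keyspaces" => nil }),
  Keyspace.new("ads", { "impressions" => nil, "clicks" => nil })
]
p build_schema(sample)
```

Filtering while building, rather than deleting afterwards, gives the same result as the slide and keeps the list of internal keyspaces in one place.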
  • 17. ♫ If you want to scale past a petabyte Just install Chef, Salt and Graphite If you want to sleep the whole night Automate the world It will be all right♫
  • 18. ♫ There is no life I know To compare with writing automation Write it once You’ll be free ♫
  • 19. ♫ If you truly wish to be.♫
  • 20. The Automation Factory A Journey from Bare Metal to Active Cassandra Node nathan@milford.io blog.milford.io twitter.com/NathanMilford github.com/nmilford
  • 22. 2 Years Later ● 80 billion impressions a month. ● 4 clusters for disparate use-cases, more in planning. ● 73 Cassandra nodes across 3 data centers.
  • 23. Mo' Servers, Mo' Problems We got multiple cages of servers. So... yeah... you can see where automation might help :)
  • 24. Automation Attack Plan ● Provisioning! ● Orchestration! ● Command and Control! ● Config Management! ● Monitoring and Alerting!
  • 25. Provisioning ● Started with Cobbler (which is Awesome!) ● High performance infrastructures are snowflakes and can get out of hand fast. ● No tool worked completely, end to end, and the tool won't write itself.
  • 26. We Built Our Own: Onering Note: I am only a moderate Lord of the Rings fan, and the guy who did most of the work on it, Gary Hetzel, is a Star Trek fan. We are not responsible for any LotR puns. https://github.com/outbrain/onering/
  • 27. Onering: Provisioning & Orchestration ● Initiates/manages provisioning and inventory. ● Acts as an orchestration layer in our automation. ● Keeps all metadata, which is searchable. ● Has a CLI tool and REST API to work with. ● Acts as our single point of truth & final authority on state.
  • 29. Onering Provisioning Workflow ➔ Developers put in machine requests by role for quarterly order. ➔ Machines show up, get racked and powered on. ➔ Machines boot into the Razor microkernel and report to Onering. ➔ Appropriate nodes get kickstarted & bootstrapped into roles specified. ➔ Additional nodes sit idle in 'allocatable' state. ➔ Once OS is installed, configuration is handed off to...
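The workflow on slide 29 reads like a small state machine. This is a minimal sketch, assuming one state per step; only the 'allocatable' state is named in the talk, so every other state name, the `TRANSITIONS` table, and the `MachineRecord` class are hypothetical and do not describe Onering's actual model.

```ruby
# Hypothetical lifecycle states; only :allocatable appears in the talk.
TRANSITIONS = {
  requested:    [:racked],                     # machine ordered by role
  racked:       [:discovered],                 # powered on, booted the microkernel
  discovered:   [:provisioning, :allocatable], # reported in; assigned or left idle
  allocatable:  [:provisioning],               # idle spare picked up later
  provisioning: [:configuring],                # OS installed
  configuring:  [:active]                      # config management finished
}.freeze

# Tracks one machine and refuses transitions the table does not allow.
class MachineRecord
  attr_reader :name, :state

  def initialize(name)
    @name = name
    @state = :requested
  end

  def advance(to)
    allowed = TRANSITIONS.fetch(@state, [])
    unless allowed.include?(to)
      raise ArgumentError, "#{@name}: #{@state} -> #{to} is not allowed"
    end
    @state = to
    self
  end
end

spare = MachineRecord.new("spare-01")
spare.advance(:racked).advance(:discovered).advance(:allocatable)
puts "#{spare.name} is #{spare.state}"
```

Keeping the allowed transitions in one table is what lets a single service act as the final authority on state: anything not listed is rejected.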
  • 30. Config Management: Chef ● Onering bootstraps into a Chef run. ● Chef installs all the system stuff. ● Chef sets up Java and tunes the OS how we like. ● Chef runs the Cassandra Cookbook.
        include_recipe "java"
        package "apache-cassandra1" do
          action :install
        end
        template "/etc/cassandra/conf/cassandra.yaml" do
          owner "cassandra"
          group "cassandra"
          mode "0755"
          source "cassandra.yaml.erb"
        end
        https://github.com/opscode/chef/
  • 31. Cassandra Cookbook does it all! ● Builds/mounts disks. ● Handles multiple clusters, different versions. ● Generates configs (in some cases automatically based on hardware profile). ● Connects to local instance and gets the schema. ● Generates collectd config and maintenance script. ● Schedules maintenance. https://github.com/nmilford/chef-cassandra
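Generating configs from a hardware profile could look roughly like the following. This is a sketch under stated assumptions: the formulas are the rules of thumb given in the comments of the stock cassandra.yaml of that era (16 reads per data drive, 8 writes per core), and whether the cookbook uses these exact formulas is not stated in the talk. The method name `cassandra_tuning` and its parameters are hypothetical.

```ruby
# Derive two cassandra.yaml settings from a hardware profile.
# Assumption: rule-of-thumb multipliers, not necessarily what the cookbook does.
def cassandra_tuning(cores:, data_drives:)
  raise ArgumentError, "cores must be positive" unless cores.positive?
  raise ArgumentError, "data_drives must be positive" unless data_drives.positive?

  {
    "concurrent_reads"  => 16 * data_drives, # reads are bound by disk count
    "concurrent_writes" => 8 * cores         # writes are bound by CPU count
  }
end

# Made-up hardware profile for illustration.
cassandra_tuning(cores: 8, data_drives: 4).each do |key, value|
  puts "#{key}: #{value}"
end
```

In a Chef recipe the two inputs would typically come from node attributes gathered at run time, and the resulting hash would be passed to the template that renders cassandra.yaml.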
  • 32. Glu: Continuous Deployment ● Not related to getting a C* node to production, but it's how we get apps there. ● Built at LinkedIn. ● Onering talks to it! ● Holds deployment metadata. ● Maven builds an RPM, dumps to a repo. ● Glu-Agent yum installs it and performs checks. https://github.com/linkedin/glu
  • 35. Command & Control:
        Distributed commands:
          salt '*ny*' cassandra.column_families
          salt 'cass*' cassandra.compactionstats
          salt '*stg*' cassandra.info
          salt 'cass1.ny.*' cassandra.keyspaces
          salt -E 'cass1-(stg|prod)' cassandra.netstats
          salt '*' cassandra.tpstats
        Scary commands:
          salt '*' --batch-size 25% service.restart cassandra
          salt '*' -b2 cmd.run "nodetool -h $(hostname) -p 7199 snapshot"
        We actually wrap salt in Onering to provide AAA, as well as to allow use of Onering metadata for node targeting. https://github.com/saltstack/salt
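The two ideas behind the commands on slide 35, glob targeting and batched execution, can be sketched in a few lines. This is an illustration only, not how salt implements either: the method names `target` and `batches` and the host names are hypothetical, and rounding a percentage down with a minimum of one host is an assumption rather than salt's documented behaviour.

```ruby
# Select hosts by shell-style glob, similar in spirit to salt's default matcher.
def target(hosts, glob)
  hosts.select { |host| File.fnmatch(glob, host) }
end

# Split hosts into batches. size is an Integer or a percentage string like "25%".
# Assumption: percentages round down, with at least one host per batch.
def batches(hosts, size)
  count = if size.is_a?(String) && size.end_with?("%")
            [(hosts.length * size.to_f / 100).floor, 1].max
          else
            Integer(size)
          end
  hosts.each_slice(count).to_a
end

# Made-up inventory for illustration.
hosts = %w[cass1.ny.example cass2.ny.example cass1.stg.example web1.ny.example]

batches(target(hosts, "cass*"), 2).each_with_index do |batch, i|
  puts "batch #{i + 1}: #{batch.join(', ')}"
end
```

Batching is what makes a cluster-wide restart survivable: only one slice of the ring is down at a time, and the next slice starts after the previous one returns.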
  • 37. Common Monitoring & Events Bus ● A single infrastructure-wide bus for systems data: – Metrics – Events – Metadata ● Collectd as systems agent. ● RabbitMQ as message bus. ● Graphite as metrics endpoint. ● Working on an events mechanism. ● Each layer should be interchangeable.
  • 38. Collectd ● Been around forever. ● Had to rebuild the JMX plugin to not use OpenJDK. ● Easy to write plugins and extend. ● Writes to RabbitMQ out of the box. ● Easy to templatize config for Chef.
        <% @node[:Cassandra][:Keyspaces].each do |ks| -%>
        <% ks[1].each do |cf| -%>
        Collect "<%= ks[0] %>.<%= cf %>"
        Collect "KeyCache.<%= ks[0] %>.<%= cf %>"
        Collect "RowCache.<%= ks[0] %>.<%= cf %>"
        <% end -%>
        <% end -%>
        https://github.com/octo/collectd
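To see what the slide 38 template expands to, the same loop can be rendered with Ruby's standard ERB library outside of Chef. This is a minimal sketch: the `keyspaces` local replaces the `@node[:Cassandra][:Keyspaces]` attribute lookup, and `COLLECTD_TEMPLATE`, `render_collectd`, and the sample keyspace are hypothetical.

```ruby
require 'erb'

# Same loop as the slide, with a plain local variable instead of node attributes.
COLLECTD_TEMPLATE = <<~'ERB_SRC'
  <% keyspaces.each do |ks| -%>
  <% ks[1].each do |cf| -%>
  Collect "<%= ks[0] %>.<%= cf %>"
  Collect "KeyCache.<%= ks[0] %>.<%= cf %>"
  Collect "RowCache.<%= ks[0] %>.<%= cf %>"
  <% end -%>
  <% end -%>
ERB_SRC

# keyspaces is a { keyspace_name => [column_family_names] } hash, the shape
# produced by the schema discovery helper earlier in the deck.
def render_collectd(keyspaces)
  # trim_mode '-' makes "-%>" swallow the newline that follows a control line.
  ERB.new(COLLECTD_TEMPLATE, trim_mode: '-').result_with_hash(keyspaces: keyspaces)
end

puts render_collectd({ "ads" => ["impressions", "clicks"] })
```

Iterating a hash with a single block parameter yields `[key, value]` pairs, which is why the template indexes `ks[0]` for the keyspace name and `ks[1]` for its column families.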
  • 39. RabbitMQ ● Lots of apps support AMQP. ● Shovel plugin for multi-site. ● Pretty stable. ● I'm not mad at it. https://github.com/rabbitmq/rabbitmq-server
  • 40. Graphite ● Plays well with RabbitMQ. ● Easy to get metrics into. ● Scads of functions. ● Easy to get meaningful data out of. https://launchpad.net/graphite
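Part of why Graphite is easy to get metrics into is its plaintext format: one line per datapoint, made of a dotted metric path, a value, and a Unix timestamp. This is a minimal sketch of producing such lines; the method names, the metric path, and the host are hypothetical, and the talk's own pipeline delivers metrics through RabbitMQ rather than a direct socket.

```ruby
require 'socket'

# Format one datapoint as "<path> <value> <unix_timestamp>\n".
def graphite_line(path, value, time = Time.now)
  "#{path} #{value} #{time.to_i}\n"
end

# Send datapoints straight to a carbon listener (2003 is its usual plaintext port).
# Shown for completeness; not called here so the sketch runs without a server.
def send_to_graphite(lines, host: "graphite.example", port: 2003)
  TCPSocket.open(host, port) { |sock| lines.each { |line| sock.write(line) } }
end

# Made-up metric name and value for illustration.
print graphite_line("cassandra.cass1.tpstats.mutation.pending", 3)
```

Because the format is plain text, any layer in front of Graphite can be swapped out as long as it ends up emitting the same three fields.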
  • 43. Alerting: Nagios Self Serve ● Uses Onering for new node discovery. ● Developers add their own alerts based off of Graphite data. ● Ops get fewer alerts and are not a bottleneck. ● Devs are more engaged. ● Everyone is happy.
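An alert based on Graphite data boils down to comparing recent datapoints with thresholds and returning a Nagios status code. This is a minimal sketch, assuming datapoints arrive as `[value, timestamp]` pairs with `nil` for missing values, as in Graphite's JSON render output; `check_datapoints` and the thresholds are hypothetical, and the talk does not show how the self-serve checks are actually written.

```ruby
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Compare the most recent non-nil value against warn/crit thresholds.
def check_datapoints(datapoints, warn:, crit:)
  values = datapoints.map(&:first).compact
  return UNKNOWN if values.empty? # no data is its own condition, not OK

  latest = values.last
  return CRITICAL if latest >= crit
  return WARNING if latest >= warn
  OK
end

# Made-up series: pending compactions over three intervals, the last one missing.
series = [[4, 1_357_000_000], [12, 1_357_000_060], [nil, 1_357_000_120]]
puts check_datapoints(series, warn: 10, crit: 50)
```

Treating an empty series as UNKNOWN rather than OK matters for self-serve alerts: a broken metrics pipeline should not look like a healthy service.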