Using Ansible at Scale to Manage a Public Cloud

Jesse Keating – Linux Systems Engineer IV – Cloud Servers
@iamjkeating
06/13/2013 – AnsibleFest

RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Rackspace cares about scale
● Scale of server systems
● Scale of environments
● Scale of engineers

Rackspace Public Cloud
● 4 “Production” regions
– 1 to 8 cells per region
– 250 to 500 nodes per cell
● Nearly 15K “systems” in production
● Another 500~ in CI/pre-production
● Mixed use of copy-pasta pssh scripts, pre-configured agent actions, Jenkins automation, and host-based config management
● Managed by admins, engineers, developers

Case study: Hotpatch One Production Environment
● 3900~ compute nodes
– Spread across 8 cells
– Out of 6000~ total hosts
● Alerting will flood admins
● Output is hard to parse

Ansible Key Features
● Inventory plugin
● Simple process flow
● Reusable playbooks with variable adjustments
● Avoids repeated actions on downed hosts
● Cleaner output
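The inventory plugin listed above is Ansible's external (dynamic) inventory interface. Ansible runs an executable with `--list` and reads a JSON map of groups to hosts from its stdout, or runs it with `--host <hostname>` to get one host's variables. Below is a minimal Python sketch that assumes a hypothetical host table grouped by region, cell, and service. The host names, group names, and the `HOSTS` data are illustrative only and do not reflect Rackspace's real layout.

```python
#!/usr/bin/env python3
"""Minimal sketch of an Ansible external (dynamic) inventory script.

Assumption: HOSTS is a hypothetical stand-in for an existing inventory
source (a CMDB, a database, an API); the names below are illustrative only.
"""
import json
import sys
from typing import Dict, List

# Hypothetical host records: region, optional cell, and service for each host.
HOSTS: Dict[str, Dict[str, str]] = {
    "compute-0001.cell01.region-a": {"region": "region_a", "cell": "cell01", "service": "compute"},
    "compute-0002.cell02.region-a": {"region": "region_a", "cell": "cell02", "service": "compute"},
    "api-0001.region-a": {"region": "region_a", "service": "api"},
}


def build_inventory(hosts: Dict[str, Dict[str, str]]) -> Dict[str, dict]:
    """Return the JSON structure Ansible expects from `--list`."""
    # `_meta.hostvars` carries every host's variables in one response, so
    # Ansible versions that understand it need not call `--host` per host.
    inventory: Dict[str, dict] = {"_meta": {"hostvars": {}}}
    for name, attrs in sorted(hosts.items()):
        groups: List[str] = [attrs["region"], attrs["service"]]
        if "cell" in attrs:
            groups.append(f"{attrs['region']}_{attrs['cell']}")  # e.g. region_a_cell01
        for group in groups:
            inventory.setdefault(group, {"hosts": []})["hosts"].append(name)
        inventory["_meta"]["hostvars"][name] = dict(attrs)
    return inventory


def main(args: List[str]) -> int:
    # Ansible calls the script with `--list` or `--host <hostname>`;
    # running it with no arguments is treated like `--list` for convenience.
    if not args or args == ["--list"]:
        print(json.dumps(build_inventory(HOSTS), indent=2))
    elif len(args) == 2 and args[0] == "--host":
        print(json.dumps(HOSTS.get(args[1], {})))
    else:
        print("usage: inventory.py --list | --host <hostname>", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    status = main(sys.argv[1:])
    if status:
        sys.exit(status)
```

Once the script is made executable and passed with `-i`, ad hoc commands and playbooks can target a single cell (`region_a_cell01`) or every compute node without copying host lists into pssh files.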


Ansible Use
● Replacing pssh for random, one-off tasks
● Replacing pssh for expected, routine tasks (outside config management)
● Reuse existing inventory content
● Easily bolt together processes, such as disabling Nagios alerts prior to execution
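The hotpatch run depends on two of the features listed earlier: already-down hosts are skipped, and the output stays readable. The toy model below is not Ansible's implementation. It sketches the flow the slide describes: check reachability, silence Nagios alerts, apply the patch, restore alerts, then print a per-host summary. The task names and the unreachable host are hypothetical. A host that fails once is dropped from every later task and is not retried.

```python
"""Toy model of a hotpatch run: a failed host is dropped from every later task.

This is NOT Ansible's code. Task names, host names, and DOWN_HOSTS are
hypothetical; each "task" just logs the call and reports success or failure.
"""
from typing import AbstractSet, Callable, Dict, List, Tuple

Task = Callable[[str], bool]  # a task returns True on success, False on failure

ACTIONS: List[Tuple[str, str]] = []  # (task name, host) for every call actually made
DOWN_HOSTS = frozenset({"compute-0002"})  # hypothetical unreachable node


def make_task(name: str, fails_on: AbstractSet[str] = frozenset()) -> Task:
    """Build a stand-in task that records each call and fails for listed hosts."""
    def task(host: str) -> bool:
        ACTIONS.append((name, host))
        return host not in fails_on
    return task


TASKS: List[Task] = [
    make_task("check_reachable", fails_on=DOWN_HOSTS),
    make_task("disable_nagios_alerts"),
    make_task("apply_hotpatch"),
    make_task("enable_nagios_alerts"),
]


def run_play(hosts: List[str], tasks: List[Task]) -> Dict[str, Dict[str, int]]:
    """Run tasks in order; hosts that fail are removed from the remaining tasks."""
    recap = {host: {"ok": 0, "failed": 0} for host in hosts}
    active = list(hosts)
    for task in tasks:
        survivors = []
        for host in active:
            if task(host):
                recap[host]["ok"] += 1
                survivors.append(host)
            else:
                recap[host]["failed"] += 1
        active = survivors
    return recap


def print_recap(recap: Dict[str, Dict[str, int]]) -> None:
    """Print one summary line per host instead of interleaved raw output."""
    for host, counts in recap.items():
        print(f"{host:<16} ok={counts['ok']} failed={counts['failed']}")


if __name__ == "__main__":
    print_recap(run_play(["compute-0001", "compute-0002", "compute-0003"], TASKS))
```

Run as a script, it reports `ok=4 failed=0` for the two healthy hosts and `ok=0 failed=1` for `compute-0002`, whose only recorded action is the reachability check.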

Rackspace OpenStack Development
● At least 7 major software projects
– Different feature schedules within each
● One Continuous Integration environment
● One Pre-production environment
● One branch of code that can easily be deployed
● New code deploys every two weeks

Case study: Create a production-like environment to test disruptive product code changes
● 30~ virtual instances
– DB servers
– RabbitMQ servers
– Service providers
● 40~ capacity nodes
– Hypervisor + nova-compute VM
● Mixed use of Fabric, shell scripts, copy-pasta
● No self service

Ansible Key Features
● Intermix local and remote actions
● External inventory plugin
● Start from nothing
● Python API for use directly within another application

Local actions to boot instances
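The pattern described here uses local actions that call the cloud API to boot instances, followed by remote actions on the new machines. One way to sketch it is to generate the playbook as data. This is a hedged Python sketch, not the deck's actual playbook. The instance names, flavor, image, group name, and the `bootstrap` role are hypothetical. The `rax` arguments and the `success` / `rax_accessipv4` fields of its registered result are assumed from that module's documented examples, so verify them against the module documentation for the Ansible release in use. The output is JSON, which YAML parsers accept, so it can be saved as a playbook file.

```python
"""Sketch: emit a two-play playbook that boots instances locally, then configures them.

Assumptions: instance names, flavor, image, group, and the "bootstrap" role
are hypothetical; the rax arguments and the result fields used below
(success, rax_accessipv4) should be verified against the rax module docs.
"""
import json
from typing import Dict, List, Sequence


def build_env_playbook(names: Sequence[str], flavor: str, image: str,
                       group: str = "new_env",
                       roles: Sequence[str] = ("bootstrap",)) -> List[Dict]:
    """Return a playbook (a list of plays) as plain Python data."""
    boot_tasks: List[Dict] = []
    for index, name in enumerate(names):
        result_var = f"boot_{index}"
        # Local action: the cloud API call runs on the control machine.
        boot_tasks.append({
            "name": f"Boot {name}",
            "local_action": {"module": "rax", "name": name, "flavor": flavor,
                             "image": image, "wait": True, "state": "present"},
            "register": result_var,
        })
        # Add the new server to an in-memory group so the next play can reach it.
        boot_tasks.append({
            "name": f"Add {name} to group {group}",
            "local_action": {"module": "add_host", "hostname": "{{ item.name }}",
                             "ansible_ssh_host": "{{ item.rax_accessipv4 }}",
                             "groupname": group},
            "with_items": f"{{{{ {result_var}.success }}}}",
        })

    boot_play = {"hosts": "localhost", "gather_facts": False, "tasks": boot_tasks}
    # Remote actions: hand the new hosts to roles that prepare them for
    # the existing config management system.
    configure_play = {"hosts": group, "roles": list(roles)}
    return [boot_play, configure_play]


if __name__ == "__main__":
    playbook = build_env_playbook(["db-01", "rabbit-01"], flavor="FLAVOR_ID", image="IMAGE_ID")
    print(json.dumps(playbook, indent=2))
```

Generating the playbook from code lets one template produce environments of different sizes, since the instance list is just an argument.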

Ansible Use
● Replacing use of Fabric, pssh, copy-pasta
● Bootstrapping environments to the point where existing config management can take over
● Freeing up engineer time by making it self-service
● Freeing up resources by tearing down environments after use
● Working toward using the same process to build out production environments

Rackspace Engineering
● Between 4K and 6K employees/contractors
● Between 500 and 1K Engineer/Developer types
● Many dozens of summer interns
● Countless groups
● Countless projects
● Rapid team creation / shifting of resources
● Mixed use of Mac OS X and Linux
● Mixed use of automation, configuration, and other tools
● Disjoint ownership of engineering onboarding

Case study: Ozone Onboard
● 30+ git repos
● 5+ utilities w/ configuration
● Permissions to a plethora of services
● Configuration for CI/preprod/prod environments
● Details scattered throughout wiki pages and tribal
knowledge

Ansible Key Features
● Modular Roles
● Minimal dependencies
● OS agnostic
● Idempotent
● Fast
● Easy to use and extend

Ansible Use
● Developer bootstraps their own system by selecting
roles and providing details
● Teams own role definitions within a shared framework
● Repeatable process
– Ansible playbook to clone/update roles
– Second playbook to process roles
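The two-step flow above uses one playbook to clone or update the selected roles and a second to apply them with the developer's details. The sketch below generates both playbooks from the developer's selections and rests on several assumptions. The role catalog, repository URLs, and variable names are hypothetical. Both plays target the developer's own machine over a local connection. Module arguments use the YAML-dictionary style of current Ansible releases. Roles are placed under `{{ playbook_dir }}/roles`, which assumes both playbooks live in the same directory.

```python
"""Sketch of two-step onboarding: fetch the selected roles, then apply them.

Assumptions: ROLE_CATALOG, its URLs, and the variable names are hypothetical.
Both plays run against the developer's own machine (localhost).
"""
import json
from typing import Dict, List, Mapping, Sequence

# Hypothetical catalog: role name -> git repository owned by the responsible team.
ROLE_CATALOG: Dict[str, str] = {
    "base_tools": "https://git.example.com/onboard/base_tools.git",
    "openstack_dev": "https://git.example.com/onboard/openstack_dev.git",
    "ci_access": "https://git.example.com/onboard/ci_access.git",
}


def _check_known(selected: Sequence[str]) -> None:
    """Reject role names that are not in the shared catalog."""
    unknown = sorted(set(selected) - set(ROLE_CATALOG))
    if unknown:
        raise ValueError(f"unknown roles: {', '.join(unknown)}")


def fetch_roles_playbook(selected: Sequence[str],
                         roles_dir: str = "{{ playbook_dir }}/roles") -> List[Dict]:
    """First playbook: clone each selected role, or update an existing checkout."""
    _check_known(selected)
    tasks = [
        {"name": f"Clone or update role {role}",
         "git": {"repo": ROLE_CATALOG[role], "dest": f"{roles_dir}/{role}"}}
        for role in selected
    ]
    return [{"hosts": "localhost", "connection": "local",
             "gather_facts": False, "tasks": tasks}]


def apply_roles_playbook(selected: Sequence[str], details: Mapping[str, str]) -> List[Dict]:
    """Second playbook: apply the selected roles with the developer's details as vars."""
    _check_known(selected)
    return [{"hosts": "localhost", "connection": "local",
             "vars": dict(details), "roles": list(selected)}]


if __name__ == "__main__":
    chosen = ["base_tools", "ci_access"]
    print(json.dumps(fetch_roles_playbook(chosen), indent=2))
    print(json.dumps(apply_roles_playbook(chosen, {"developer_name": "jdoe"}), indent=2))
```

With its default settings, the `git` module clones a missing checkout and otherwise pulls new revisions, so re-running the first playbook is safe. That fits the repeatable-process goal.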

Conclusion
● Ansible solves many problems Rackspace faces
● Chip away at the edges with Ansible; perhaps one day replace existing config management systems with Ansible
● Continue to contribute to Ansible modules, plugins, and scale testing
● Launch Ansibox soon!
