Tips, Tricks and Tactics with Cells
and Scaling OpenStack
● Sam Morrison - Australian Research Cloud
● Belmiro Moreira - CERN
● Matt Van Winkle - Rackspace
OpenStack Summit - Paris 2014
Multi-Cell OpenStack: How to Evolve Your Cloud to Scale
https://www.openstack.org/summit/openstack-paris-summit-2014/session-videos/presentation/multi-cell-openstack-how-to-evolve-your-cloud-to-scale
Sam Morrison
sam.morrison@unimelb.edu.au
Australian Research Cloud
● Started in 2011
● Funded by the Australian Government
● 8 institutions around the country
● Production since early 2012 - OpenStack Diablo
● Now running a mix of Juno and Icehouse
● Use Ubuntu 14.04 and KVM
● 100Gbps network connecting most sites (AARNET)
Reasons for using cells
Single API endpoint, with compute cells dispersed around Australia.
Simpler from the user's perspective.
Single set of security groups, keypairs etc.
Less OpenStack expertise needed, as there is only one version of some core OpenStack services.
Size
● 8 sites
● 14 cells
● ~6000 users registered
● ~700 hypervisors
● 30,000+ cores
People
● Core team of 3 devops
● 1-2 operators per site
http://status.rc.nectar.org.au/growth/infrastructure/
Interaction with other services
Each cell also has one or more:
● cinder-volume host, using Ceph, LVM and NetApp backends
● A globally replicated Swift region per site
● glance-api pointing to the local Swift proxy for images
● Ceilometer collectors at each cell push up to a central MongoDB
● Private L3 network spanning all cells - will be useful for Neutron
Cells Infrastructure
Each compute cell has:
● MariaDB Galera cluster
● RabbitMQ cluster
● nova scheduler/cells/vnc/compute/network/api-metadata
● glance-api
● Swift region - proxy and storage nodes
● ceilometer-collectors to forward up to a global collector
● cinder-volume
API cell:
● nova-api, nova-cells
● keystone
● glance-registry
● cinder-api, cinder-scheduler
● heat, designate, ceilometer-api
Scheduling
● Have some “private” cells only available to certain tenants, usually determined by funding source.
● Global flavours and cell-local flavours
● Cell-level aggregates for intra-cell scheduling
● Some sites have GPUs and fast I/O for their own use.
○ Introduced compute- and RAM-optimised flavours (see the sketch below)
○ Not all cells support all flavours
● Each cell advertises one or more availability zones to use in scheduling.
○ Ties in with Cinder availability zones
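The deck doesn't spell out how cell-local flavours are tied to hardware; below is a minimal sketch of one common approach, using host aggregates and flavour extra specs with nova's AggregateInstanceExtraSpecsFilter. All names and credentials are illustrative, and the aggregate-based wiring is an assumption, not necessarily NeCTAR's exact method.

    from novaclient import client as nova_client

    # Placeholder credentials; in practice these come from the environment.
    nova = nova_client.Client('2', USER, PASSWORD, TENANT, AUTH_URL)

    # Create a RAM-optimised flavour (name and sizing are illustrative).
    flavor = nova.flavors.create(name='m1.ram-optimised', ram=65536,
                                 vcpus=8, disk=40)

    # Key the flavour to matching host aggregates so the scheduler's
    # AggregateInstanceExtraSpecsFilter only places it on tagged hosts.
    flavor.set_keys({'aggregate_instance_extra_specs:ram_optimised': 'true'})

    # Tag an aggregate inside the cell with the matching metadata.
    agg = nova.aggregates.create('ram-optimised-hosts', availability_zone=None)
    nova.aggregates.set_metadata(agg, {'ram_optimised': 'true'})
    nova.aggregates.add_host(agg, 'compute-host-01')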
Bringing on new cells
Want to test in production before opening to the public
Don’t want to flood brand-new cells
Scheduler filters
● Role-based access to a cell. Cells advertise what roles can schedule to them (see the sketch below)
● Direct only - allow the public to select that cell directly, but the global scheduler doesn’t count it
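The filter code itself isn't in the slides; here is a hypothetical sketch of a role-based cell filter, modeled on the BaseCellFilter pattern shown later in the Rackspace section. The 'allowed_roles' capability key is an assumed convention, not a real nova option.

    from nova.cells import filters


    class RoleBasedCellFilter(filters.BaseCellFilter):
        """Sketch: drop cells whose advertised roles don't match the
        requesting user's keystone roles."""

        def filter_all(self, cells, filter_properties):
            context = filter_properties['context']
            user_roles = set(context.roles)
            output_cells = []
            for cell in cells:
                # Assumption: a cell publishes the roles allowed to
                # schedule to it in its capabilities; empty means open.
                allowed = set(cell.capabilities.get('allowed_roles', []))
                if not allowed or allowed & user_roles:
                    output_cells.append(cell)
            return output_cells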
Operating cells
Have a small OpenStack cluster to manage all global infrastructure
Standard environment - use Puppet
Upgrade cells one at a time - live upgrades
● upgrade compute conductors
● upgrade API cell
● upgrade compute cells
● upgrade compute nodes
Read access to compute cells' RabbitMQs for troubleshooting and monitoring. Only real interface into each of the cells.
Console-log is a good test of cells functionality - keep a canary instance in each cell and monitor it (see the sketch below)
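A minimal sketch of that console-log canary check with python-novaclient; credentials and the canary server ID are placeholders.

    import time

    from novaclient import client as nova_client

    nova = nova_client.Client('2', USER, PASSWORD, TENANT, AUTH_URL)


    def check_cell_canary(server_id):
        """Fetch the console log of a cell's canary instance. The call
        traverses the API cell -> compute cell RPC path, so an error or
        slow response is a good signal that cells messaging is unhealthy."""
        server = nova.servers.get(server_id)
        start = time.time()
        output = server.get_console_output(length=10)
        return time.time() - start, bool(output)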
Future plans
Move to Neutron - in planning and testing stage
● Currently have a single public network per cell; want to provide tenant networks and higher-level services
Start off with a global Neutron and simple shared flat provider networks per cell.
All hypervisors talking to the same RabbitMQ - scale issues?
Also looking at other higher-level OpenStack services (of which there are many!)
Belmiro Moreira
belmiro.moreira@cern.ch
@belmiromoreira
CERN - Large Hadron Collider
CERN - Cloud Infrastructure
CERN - Prune Nova DBs
CERN - Cells scheduling
CERN - Flavors management
CERN - Testing Cells
Matt Van Winkle
mvanwink@rackspace.com
@mvanwink
www.rackspace.com
Cells at Rackspace
Rackspace
• Managed Cloud company offering a suite of dedicated and cloud hosting products
• Founded in 1998 in San Antonio, TX
• Home of Fanatical Support
• More than 200,000 customers in 120 countries
Rackspace – Cloud Infrastructure
• In production since August 2012
– Currently running: Nova; Glance; Neutron; Ironic; Swift; Cinder
• Regular upgrades from trunk
– Package built on a trunk pull from mid-March in testing now
• Compute nodes are Debian based
– Run as VMs on hypervisors and managed via XAPI
• 6 geographic regions around the globe
– DFW; ORD; IAD; LON; SYD; HKG
• Numbers
– Tens of thousands of hypervisors (over 340,000 cores, just over 1.2 petabytes of RAM)
• All XenServer
– Over 170,000 virtual machines
– API per region with multiple compute cells (3 – 35+) each
Rackspace – Cloud Infrastructure - Cells
• Cells infrastructure
– Size between ~100 and ~600 hosts per cell
– Different flavor types (General Purpose, High I/O, Compute Optimized, etc.)
– Working on exposing maintenance zones or near/far scheduling (host, shared IP space, network aggregation)
– Separate DB cluster for each cell
• Run our cells infrastructure in cells
– Control plane exists as instances in a small OpenStack deployment
– Multiple hardware types
– Separate tenants isolate control plane instances from other internal users
Cell Scheduling
• Multiple cells within each flavor class
– Hardware profile
• Additionally, we group by vendor
• Live migration needs matching CPUs
– Range of flavor sizes within each cell (e.g. General Purpose 1, 2, 4 and 8 GB)
• Tenant scheduling
– Custom filter schedules by flavor class first (see the sketch below)
• All General Purpose cells, for example
– Scheduled by available RAM afterwards
• Enhancements for spreading out tenant load and max IOPS per host
– In some cases, filters can bind a cell to specific tenants (testing and internal use)
• Work in Cells V2 to enhance scheduling
– https://review.openstack.org/#/c/141486/ as one example
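Rackspace's custom flavor-class filter isn't shown in the deck; here is a hypothetical sketch of that first scheduling pass, in the same BaseCellFilter style as the disable filter shown later. The flavor naming scheme and the 'flavor_classes' capability key are assumptions.

    from nova.cells import filters


    class FlavorClassCellFilter(filters.BaseCellFilter):
        """Sketch: keep only cells advertising the requested flavor class."""

        def filter_all(self, cells, filter_properties):
            request_spec = filter_properties['request_spec']
            flavor_name = request_spec['instance_type']['name']
            # Assumption: names look like 'general1-8'; strip the size
            # suffix and generation digits to get the class ('general').
            flavor_class = flavor_name.split('-')[0].rstrip('0123456789')
            return [cell for cell in cells
                    if flavor_class in cell.capabilities.get('flavor_classes', [])]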
Deploying a Cell
• Common control plane nodes deployed by Ansible playbook
– DB pair
– Cells service
– Scheduler
– Rabbit
• Playbook populates flavor info based on hardware type
• Hypervisors bootstrapped once the control plane exists
– Create compute node VM
– Deploy code and configure
– Update routes, etc.
• Provision IP blocks
• Test
• Link via playbook
Rackspace – Purge Nova DBs
• The largest region has a run rate of around 50,000 VMs
• Thousands of VMs created/deleted per hour in the busiest regions
• Downstream BI and revenue assurance teams require that deleted instance records be kept for 90 days
• Current deleted instance counts range between 132,000 and 900,000
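The purge mechanics aren't shown on the slide; below is a simplified sketch of the idea with SQLAlchemy, deleting soft-deleted rows from nova's instances table once past the 90-day retention window. This is illustrative only: a real purge must also clean dependent tables (instance_info_caches, instance_system_metadata, ...), or use nova-manage db archive_deleted_rows instead.

    from datetime import datetime, timedelta

    import sqlalchemy as sa


    def purge_deleted_instances(engine, days=90):
        """Delete soft-deleted rows from nova's 'instances' table once
        they are older than the retention window."""
        cutoff = datetime.utcnow() - timedelta(days=days)
        meta = sa.MetaData()
        instances = sa.Table('instances', meta, autoload_with=engine)
        with engine.begin() as conn:
            result = conn.execute(
                instances.delete()
                .where(instances.c.deleted != 0)          # soft-deleted rows only
                .where(instances.c.deleted_at < cutoff))  # past retention window
        return result.rowcount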
Testing Cells
• Bypass URL prior to linking a cell up
– Test API endpoint: http://nova-admin-api01.memory1-0002.XXXX.XXXXXX.XXXX:8774/v2
• Full set of tests
– Instance creates, deletes, resizes
– Overlay network creation
– Volume provisioning
– Integration with other RS products
• Trickier to test hosts being added to an existing cell
– Hosts are either enabled or disabled
– Targeting helps (see the sketch below)
• --hint target_cell='<cellname>'
• --hint 0z0ne_target_host=<host_name>
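Boot-time targeting with the hints above looks roughly like this through python-novaclient; the credentials, image, flavor and the cell/host names are placeholders.

    from novaclient import client as nova_client

    nova = nova_client.Client('2', USER, PASSWORD, TENANT, AUTH_URL)

    server = nova.servers.create(
        name='cell-smoke-test',
        image=IMAGE_ID,
        flavor=FLAVOR_ID,
        scheduler_hints={
            'target_cell': 'parent!cell0002',        # illustrative cell path
            '0z0ne_target_host': 'compute-host-01',  # illustrative host name
        })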
Managing Cells – “Disable”
• No formal way of disabling a cell
• Weighting helps – but is not absolute
– A weighted cell can still “win” the scheduler calculation based on available RAM
• Solution: custom filter uses a specific weight offset value (-42) to avoid scheduling
# Imports added for completeness; the exact logging module depends on
# the nova release in use (oslo_log from Kilo onwards).
from oslo_log import log as logging

from nova.cells import filters

LOG = logging.getLogger(__name__)


class DisableCellFilter(filters.BaseCellFilter):
    """Disable cell filter. Drop cell if weight is -42."""

    def filter_all(self, cells, filter_properties):
        """Override filter_all(), which operates on the full list of cells."""
        output_cells = []
        for cell in cells:
            # weight_offset comes from the cell's row in the cells DB;
            # -42 is the sentinel meaning "disabled".
            if cell.db_info.get('weight_offset', 0) == -42:
                LOG.debug("cell disabled: %s" % cell)
            else:
                output_cells.append(cell)
        return output_cells
Neutron and Cells
• Rackspace uses the Quark plugin
– https://github.com/rackerlabs/quark
• Borrowed an old idea from the Quantum/Melange days
– Default tenant for each cell
– Each cell is a segment
– Provider subnets are scoped to a segment
– Nova requests ports on the provider network for the segment (see the sketch below)
• Public
• Private
• MAC addresses too
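In rough terms, the port-per-segment flow looks like this with python-neutronclient and python-novaclient; network and tenant IDs are placeholders, and Quark's segment scoping happens server-side, so it isn't visible in the client call.

    from neutronclient.v2_0 import client as neutron_client
    from novaclient import client as nova_client

    neutron = neutron_client.Client(username=USER, password=PASSWORD,
                                    tenant_name=TENANT, auth_url=AUTH_URL)
    nova = nova_client.Client('2', USER, PASSWORD, TENANT, AUTH_URL)

    # Ask Neutron for a port on the cell-scoped public provider network.
    port = neutron.create_port(
        {'port': {'network_id': PUBLIC_NET_ID, 'tenant_id': CELL_TENANT_ID}})

    # Boot the instance attached to the pre-created port.
    server = nova.servers.create(
        name='example', image=IMAGE_ID, flavor=FLAVOR_ID,
        nics=[{'port-id': port['port']['id']}])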