Nutanix Private Cloud Reference Architecture
1. INTRODUCTION
1.1. Vision for Private, Hybrid, and Multi-Cloud
1.2. Design Objectives
1.3. Audience
1.4. Design Decisions
1.5. How to Use This Design Guide
2. ARCHITECTURE OVERVIEW
2.1. Physical Layer
2.1.1. Hyperconverged Infrastructure
2.1.2. Hardware Choice
2.1.3. Compute
2.1.4. Storage
2.1.5. Networking
2.1.6. Cluster Design
Management clusters
Workload clusters
Storage clusters
Edge/ROBO clusters
2.2. Virtualization Layer
2.3. Management Layer
2.3.1. Automated IT Operations
2.4. Business Continuity Layer
2.5. Automation Layer
2.6. Security & Compliance Layer
3. HIGH-LEVEL DESIGN CONSIDERATIONS
4. DETAILED TECHNICAL DESIGN
5. CONCLUSION
APPENDIX I - TABLE OF DESIGN DECISIONS
1. Introduction
1.1 VISION FOR PRIVATE, HYBRID AND MULTI-CLOUD
The Nutanix vision for Cloud computing environments originally started in the
datacenter with innovative solutions for Private Cloud, which greatly reduced the
complexity and effort required to deploy and manage software-defined storage,
compute, and networking solutions. In recent years, Nutanix’s vision has expanded to
include innovative architectures for hybrid and multi-cloud that offer more choices and
alternatives to optimize costs.
The decision on which cloud to use for a specific use case often depends on a variety of
characteristics of multi-tiered applications. These characteristics ultimately shape cloud
architectures and drive the decision of where to host each application tier for a particular
service.
Historically, the hosting architecture for multi-tiered applications didn't change without a
labor-intensive migration. Nutanix innovation arms IT organizations with the flexibility to
dynamically place workloads. The Nutanix distributed architecture natively includes the
ability to provision the same application blueprints in four different cloud configurations.
Nutanix further extends these cloud configurations by enabling the following four
architectural principles:
• Policy-based security governance hosted in AWS, Azure, or GCP as a native public
cloud service offering.
All of these principles contribute to an advanced hybrid and multi-cloud strategy that
simplifies IT, stretches budgets, and accelerates time to value. This document provides
the architecture and design driven decisions to help our customers realize this strategic
vision.
In order to lay the foundation, the first release of this document will focus on the design
for Private Clouds based on Nutanix. Nutanix will actively work on extending the design
to include all relevant architectures to complete the above vision.
1.2 DESIGN OBJECTIVES
The objective of this document is to define, explore, and develop key design decisions
required when implementing private, hybrid, or multi-cloud solutions based on the
Nutanix platform. The objective can be further broken down as follows:
• To identify and enumerate the key design decisions that need to be documented in
order to support a robust design methodology and practice.
• To explore each design decision, evaluating key viable options, tools, and methods for
implementation and management so that organizations can make informed decisions
relating to their specific design requirements.
Since simplicity is a key principle of all Nutanix products, some requirements may be
met through the native platform architecture without the need for the superfluous design
decisions that competing platforms often require. The objective of this document is
therefore not to educate readers about Nutanix features and functions, even though this
may naturally occur as a side benefit. In these cases, this document describes how these
requirements are addressed by Nutanix-native features.
Maximum number of workloads: unlimited; dependent on the pod-based constructs
described herein.
1.3 AUDIENCE
This document, which focuses on a single-datacenter design, is intended for infrastructure
architects, infrastructure administrators, and infrastructure operators who want to deploy
and manage datacenters based on Nutanix Enterprise Cloud and address requirements for
availability, capacity, performance, scalability, business continuity, and disaster recovery.
1.4 DESIGN DECISIONS
Appendix I: Table of Design Decisions includes a list of all the design decisions described
throughout this document.
1.5 HOW TO USE THIS DESIGN GUIDE
This document is subdivided into four major sections as follows:
1. Architectural Overview
Introduces key Architecture concepts that will be discussed throughout this design.
2. Design Considerations
Discusses key design considerations that will vary for each customer. Customers will be
required to make decisions that will influence the design and build of their end-solution.
3. Detailed Design
Identifies key design decisions and, in most cases, determines the optimal configuration
and decision used for validating the design. For each design decision, alternate options
may be discussed along with their pros and cons. Customers may, for good reasons,
choose to deviate from the decisions made in this document. The decisions made in this
section are recommended by Nutanix; however, they are by no means the only valid
approach, and Nutanix recognizes that alternate decisions may be appropriate depending
on specific requirements.
4. Operations
Articulates high level operational considerations that influence the design of the
solution.
NOTE: This section does not provide detailed operational guides or runbooks.
2. Architecture Overview
This section describes the high-level Nutanix architecture, including major concepts
and design elements that anyone designing a Nutanix deployment should understand.
If you are already familiar with Nutanix hyperconverged infrastructure (HCI) and Nutanix
software, you can skip this section.
The diagram below shows the high-level architecture covered in this document.
This overview explains the elements of each layer. Later sections will explore the
design decisions necessary for each layer.
2.1. PHYSICAL LAYER
Because the Nutanix Enterprise Cloud architecture is based on hyperconverged
infrastructure, the physical layer is significantly different from that of a traditional
datacenter architecture. Understanding the differences will allow you to make the best
hardware choices for your Nutanix deployment.
2.1.2 HARDWARE CHOICE
Nutanix Enterprise Cloud provides significant choice when it comes to hardware
platform selection. Available options include:
• Nutanix NX appliances.
• OEM appliances from leading vendors such as Dell, Lenovo, HPE, IBM, and Fujitsu.
• Other third-party servers from a wide range of vendors.
The Nutanix hardware compatibility list (HCL) contains the most up-to-date
information on supported systems.
2.1.3 COMPUTE
Sizing systems to meet compute needs in a Nutanix environment is similar to sizing for
other architectures. However, it's important to ensure that your design provides enough
compute (CPU and RAM) to support the Nutanix Controller VM (CVM) on each node.
2.1.4 STORAGE
Nutanix nodes offer a range of storage configurations:
• Hybrid nodes combine flash SSDs for performance and HDDs for capacity.
• All-flash nodes utilize flash SSDs.
• NVMe nodes utilize NVMe SSDs.
Different node types can be mixed in the same cluster. More information can be found in
the document Product Mixing Restrictions.
For data resiliency, Nutanix uses replication factor (RF), maintaining 2 or 3 data copies.
This approach enables a Nutanix cluster to be self-healing in the event of a drive, node,
block, or rack failure. In a Nutanix cluster consisting of multiple blocks, RF can enable
block awareness. Data copies are distributed across blocks to protect against the failure
of an entire block. In configurations spanning multiple racks, RF can similarly provide
rack awareness with resilience to a rack outage. For more information on Nutanix data
resiliency, please refer to the Nutanix Bible.
Compression, deduplication, and erasure coding (EC-X) can be enabled to increase data
efficiency and save capacity.
Data locality and intelligent tiering ensure that the data associated with a VM is
preferentially stored on that VM’s local node. Active data is stored on the fastest media,
delivering performance and eliminating the need for ongoing performance tuning.
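To make the interaction between replication factor and data efficiency concrete, the following sketch estimates logical usable capacity. The raw capacity and 1.5:1 reduction ratio are illustrative assumptions rather than Nutanix sizing guidance, and overheads such as CVM reservations are ignored:

# Rough capacity sketch: usable space under a given replication factor (RF)
# and an assumed data-reduction ratio. All figures are illustrative.

def usable_capacity_tib(raw_tib: float, rf: int, reduction_ratio: float = 1.0) -> float:
    """Logical capacity available to workloads after RF copies and data reduction."""
    return raw_tib / rf * reduction_ratio

if __name__ == "__main__":
    raw = 4 * 20.0  # e.g. four nodes with 20 TiB of raw storage each (assumption)
    print(usable_capacity_tib(raw, rf=2, reduction_ratio=1.5))  # ~60 TiB logical
    print(usable_capacity_tib(raw, rf=3, reduction_ratio=1.5))  # ~40 TiB logical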
2.1.5 NETWORKING
Fast, low-latency, and highly available networking is a key element of this design. The
distributed storage architecture relies on the performance and resilience of the physical
network. A good design provides high performance while maintaining simplicity.
2.1.6 CLUSTER DESIGN
Designs that use dedicated clusters separate workload domains and may include the
following cluster types:
Management clusters
Designed to run VMs that support datacenter management such as:
• Nutanix Prism Central.
• VMware vCenter.
• Active Directory.
• Other management workloads, such as DNS, DHCP, NTP, Syslog.
Workload clusters
Reside in a virtual infrastructure workload domain and run tenant virtual machines.
You can mix different types of compute clusters and provide separate compute pools
to address varying SLAs for availability and performance.
Storage clusters
Storage-only clusters provide dedicated data services to tenants. These are typically
deployed for use cases that are specifically focused on Object, Files, or Block level
storage.
Edge/ROBO clusters
Reside at an edge and/or ROBO deployment to run virtual machines or ROBO
workloads.
NOTE: Nutanix HCI supports Microsoft Hyper-V; however, this document does not
include considerations for deploying this hypervisor option.
2.3. MANAGEMENT LAYER
The management layer is a key differentiator for Nutanix and this design.
The Prism family consists of three products that extend core capabilities:
1. Prism Element
The core Nutanix management platform enables management and monitoring at
the cluster level for all infrastructure (compute, storage, networks) and virtualization.
Key functionality of Prism includes:
• Full VM, storage, and hypervisor management.
• Network visualization.
• Role-based access control (RBAC).
• Nutanix 1-click upgrades - Orchestrates and streamlines platform upgrades, keeping
track of the changes. Can upgrade all Nutanix software and firmware running in a
Nutanix environment plus the ESXi hypervisor.
2. Prism Central
Enables management and monitoring of multiple Nutanix Prism Element clusters
from a central interface.
3. Prism Pro
Adds advanced capabilities to the Prism platform, including performance anomaly
detection, capacity planning, custom dashboards, reporting, advanced search
capabilities, and task automation.
The Prism family of products is an integral part of a Nutanix cluster and does not require
separate infrastructure. Prism Central runs as a single VM, or as a cluster of three VMs for
additional scale and resilience. More details on Nutanix management are provided later.
Nutanix deployments that use the AHV hypervisor can be fully managed by Prism.
Nutanix deployments that use VMware vSphere should also utilize VMware vCenter
Server. This is the centralized monitoring and resource management software for
VMware virtual infrastructure. It performs a number of tasks, including resource
provisioning and allocation, performance monitoring, workflow automation, and
user privilege management.
2.3.1 AUTOMATED IT OPERATIONS
Prism Pro allows administrators to automate routine operational tasks, reducing
administrator effort and time while increasing the quality of results. To provide this
automation, Nutanix X-Play enables “if-this-then that” (IFTT) features that allows admins
to create Playbooks that define automation actions that run when a particular trigger
occurs.
2.5. AUTOMATION LAYER
Automation and orchestration are increasingly recognized as critical to IT success. By
simplifying infrastructure management across the entire lifecycle, automating operations,
and enabling self-service, Nutanix helps you deploy datacenter infrastructure that
delivers a high degree of scalability, availability, and flexibility.
• Simplified development.
Nutanix eliminates the complexity of test and development automation, allowing
developers and administrators to work more efficiently. Your team can deploy and
maintain a fully automated CI/CD pipeline with continuous application deployment
across on-premises and cloud locations.
2.6. SECURITY & COMPLIANCE LAYER
The Nutanix security approach is not predicated on having hundreds of different knobs
that you must turn to achieve a secure environment. Nutanix takes a security-first
approach that includes a secure platform, extensive automation, and a robust partner
ecosystem. Configuration options are available if you need to add an extra layer of
security based on business and/or technical requirements.
Nutanix provides customers with the ability to evolve from point-in-time security
baseline checking to a continuous monitoring and self-remediating baseline that ensures
all CVM/AHV hosts in a cluster remain baseline compliant throughout the deployment
lifecycle. This innovation checks all components of the documented security baselines
(STIGs) and, if any are found to be non-compliant, sets them back to the supported
security settings without customer intervention.
These valuable security features are discussed later in the Detailed Design section,
and you can find more information online in the Nutanix Bible.
Nutanix incorporates security into every step of its software development process, from
design and development to testing and hardening. Nutanix security significantly reduces
zero-day risks. With one-click automation and a self-healing security model, ensuring
ongoing security requires much less effort. Supported compliance standards and
certifications include:
• 508 Compliant
• FIPS 140-2 Level 1
• National Institute of Standards and Technology (NIST) 800-53
• TAA Compliant
3. High-Level Design
Considerations
A key feature of Nutanix Enterprise Cloud is choice. Nutanix customers have the flexibility
to choose their preferred hardware vendor, CPU architecture, hypervisor, and more. This
design is intended to help guide you to the best choices for your organizational needs.
There are a number of high-level design decisions that must be made before proceeding
to a detailed technical design. This section provides the information necessary to help you
make the following decisions:
• Choosing a hypervisor.
• Choosing a cluster deployment model.
• Choosing a hardware platform.
3.1. CHOOSING A HYPERVISOR
Nutanix AHV is included with AOS and delivers everything you'd expect from an
enterprise virtualization solution: high performance, flexible migrations, integrated
networking, security hardening, automated data protection and disaster recovery, and
rich analytics. With robust, integrated management features, AHV is a lean virtualization
solution.
AHV enables:
• Checkbox high availability configuration (vs. complex percentage or slot size config)
• No virtual SCSI devices required (vs. manually configured multiple SCSI devices for
maximum performance).
• Distributed networking by default (vs. deciding between Standard or Distributed
Switch).
• Automatic CPU Masking (vs. manual in vSphere).
• No need for Host Profiles.
• Fewer design choices.
• Simple control plane lifecycle.
• Simplified 1-click upgrades.
• Single support path.
VMware vSphere is a proven virtualization platform used by many organizations and has
a robust ecosystem. It is a complex platform with many design choices and settings to
tune; it often requires the purchase of additional licenses.
When deciding which hypervisor to use in your deployment, the choice comes down to
which solution best meets your technical and business requirements and your budget.
Key areas to consider when choosing a hypervisor include:
Freedom of choice is a key tenet of Nutanix. After considering these factors, it is likely
that one hypervisor platform stands out as the best option for your deployment. No
matter which hypervisor you choose, the solution is backed by world-class Nutanix
support and a full ecosystem of node types. (Hardware selection is discussed in the
section Platform Considerations).
3.2. CHOOSING A CLUSTER DEPLOYMENT MODEL
When designing Nutanix clusters for your deployment, there are several important
design considerations:
However, the infrastructure applications and services that run on a management cluster
are critical. At a larger scale, there are several reasons to separate these management
workloads:
NOTE: You do not need a separate management cluster for each use case or
environment unless you have strict security or compliance requirements that make it
necessary. This means that a single management cluster can support multiple use cases
such as: EUC deployments, private cloud, and general server virtualization environments.
Should You Deploy Two Management Clusters?
In large-scale deployments, the management cluster can be split to create separate
failure domains. With two management clusters, you can place redundant components
into each cluster, enabling a higher level of availability. Should there be an issue with
one of the management clusters, the other remains available to service requests.
Operating large-scale mixed environments creates a number of unique challenges. You
have to decide whether you are willing to manage the challenges of mixed workloads
or dedicate clusters for each workload. Here are the main factors to consider:
• Performance and capacity. The resource demands of different applications can vary
widely. You need to understand the needs of each application when mixing workloads,
since the chance of conflicts increases. There may also be wildly different performance
and capacity needs between applications, which could require different node
configurations within a single cluster. Unless you are going to isolate certain workloads
to particular nodes, each node within a cluster needs to be able to handle the average
daily mix of applications that might be running on it.
• VM resource-sizing requirements. The CPU and memory sizing for general server
VMs, VDI VMs, business-critical application VMs, etc. can vary widely. While it’s fairly
easy to account for memory sizing, CPU sizing is more complex. Each of these
workloads consumes different amounts of CPU, and may require much different
levels of CPU overcommit, if any.
If you have large groups of VMs with widely different resource requirements, it’s
typically better to build clusters to contain more uniformly sized VMs. From a
hypervisor HA standpoint, you may require additional resources within a mixed
cluster to ensure the ability to failover. This can also increase the day 2 operational
support effort, since it may require manually tuning HA settings and increased
monitoring to ensure HA resources remain in compliance.
• Software licensing. The most common reason for dedicated clusters is software
licensing. There are a variety of reasons why dedicated clusters make sense from a
licensing standpoint. Here are two common examples:
• Operating system licensing. Windows and Linux vendors may offer "all-you-can-eat"
license models for licensing at the host level. If you license all the nodes in a cluster
with Windows Datacenter licensing, it may not be cost effective to run Linux VMs on
this cluster, because resources that could be used for Windows-based instances
would go unused.*
• Database licensing. Database licenses are frequently based on either CPU cores
or sockets. These licenses can be expensive, and you often have to license all the
nodes in a cluster to enable database VMs to run on every node. Once again, you
probably don't want to run other workloads on that cluster, since doing so reduces
the return on your license investment.
In addition, you may want to run nodes hosting database workloads on different
hardware than nodes hosting general server VMs. For instance, having fewer CPU
cores running at a higher clock speed may reduce your overall licensing costs while
still providing the necessary compute power (see the licensing sketch after this list).
*This is provided for informational purposes only, please refer to your licensing agreement with Microsoft for
more information.
• Security. In many projects, security constraints are an overriding factor. When it
comes to the security of mixed vs. dedicated clusters, there are a few design
considerations to weigh as you decide whether logical or physical separation is
adequate to address your security requirements:
• Operations. RBAC is the primary means of controlling access to and management
of infrastructure and VMs in a mixed cluster. Dedicated clusters prevent
non-approved parties from gaining any access whatsoever.
• Networking. A mixed cluster typically relies on separate VLANs and firewall rules
per workload to control access. A dedicated cluster has only the networks required
for a single workload presented to it, which likely limits who and what has network
access. Both approaches can control and limit access to cluster resources, but a
dedicated cluster goes a step further by providing complete physical network
isolation for those that require it.
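As a simple illustration of the core-based licensing point above, the sketch below compares the license cost of two hypothetical four-node cluster designs. The core counts and per-core price are assumptions chosen only to show the arithmetic; the point is that every node in a cluster typically must be licensed, so fewer, faster cores can lower the total cost:

# Hypothetical per-core database licensing comparison for two cluster designs.
# Prices and core counts are illustrative assumptions, not vendor list prices.

def license_cost(nodes: int, cores_per_node: int, price_per_core: float) -> float:
    return nodes * cores_per_node * price_per_core

if __name__ == "__main__":
    price = 10_000.0                                    # assumed cost per licensed core
    many_cores = license_cost(nodes=4, cores_per_node=32, price_per_core=price)
    fewer_fast = license_cost(nodes=4, cores_per_node=16, price_per_core=price)
    print(f"32-core nodes: ${many_cores:,.0f}")               # $1,280,000
    print(f"16-core (higher clock) nodes: ${fewer_fast:,.0f}")  # $640,000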
3.3. CHOOSING HOW YOU WILL SCALE
This document clearly lays out our opinionated design for deploying HCI clusters and
building a private cloud for any type of workload. A challenge in any design, whether
you’re starting with one cluster or dozens, is how to scale the environment to reach your
goal. To accomplish this you need a master architectural plan that creates a repeatable
process that can be followed by the organization. Defining this in advance removes
confusion and simplifies future deployment. A master architectural plan allows you to
track progress and ensure compliance with the design.
To scale predictably, this design relies on two constructs:
• A control plane.
• A block and pod architecture.
Each control plane instance has a maximum size that dictates the number of VMs,
nodes, and clusters it can manage.
What is a pod?
In this design, a pod is a group of resources managed by a single Prism Central instance.
The diagram below shows a single pod containing four building blocks. A pod is not
bound by physical location; all of its resources could be at a single site or spread across
multiple sites. Examples of multiple sites include a traditional multi-datacenter design, a
hybrid cloud design, or a ROBO architecture.
Building block
In this design, a building block is equivalent to a Nutanix cluster. Each cluster can run a
single dedicated workload or mixed workloads. Building blocks do not need to be
uniform within your design.
RECOMMENDATION:
For each workload:
• Establish whether it will have a dedicated cluster or share a mixed cluster with
other applications.
• Define the maximum size of this type of building block. While building blocks for
different workloads do not need to be the same, they certainly can be.
• Determine the maximum size of a building block based on:
• The scale at which you will deploy workloads.
• The need for failure domains.
• Operational considerations such as upgrade timing.
These topics are discussed later in the Detailed Technical Design section of this
document.
If any of these limits is reached, a pod is considered full. For example, with an EUC
deployment, the VM limit will likely be reached first, since high VM density results in
large numbers of VMs on relatively few nodes and clusters. A large ROBO environment
might instead hit the cluster count limit, because you tend to have many sites, each with
a small cluster and a few VMs.
At very large scale (e.g. thousands of nodes), it can make sense to have pods
dedicated to each workload, but for environments that have Nutanix deployed for
multiple workloads, a pod will typically contain multiple applications.
Once a pod reaches a scaling limit, start a new pod with at least one building block.
The new pod scales until it also reaches a scaling limit, and so on.
The building blocks within a pod scale in a similar fashion. A building block is started
and workloads are migrated onto it until it reaches its determined max size, and a new
building block is started. New building blocks can be as small as 3 nodes, the minimum
to start a Nutanix cluster, or any size up to the max size you’ve specified for that
building block.
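The pod-scaling rule described above can be expressed as a simple check. The limit values below are placeholders rather than published Prism Central maximums; substitute the limits documented for your software version:

# Minimal sketch of the pod-scaling rule: a pod is full when any limit is reached.
# Limit values are placeholder assumptions.

from dataclasses import dataclass

@dataclass
class PodLimits:
    max_vms: int = 25_000      # assumed per-instance VM limit
    max_nodes: int = 1_000     # assumed node limit
    max_clusters: int = 200    # assumed cluster limit

def pod_is_full(vms: int, nodes: int, clusters: int, limits: PodLimits) -> bool:
    """A pod is considered full when any one of its limits is reached."""
    return (vms >= limits.max_vms
            or nodes >= limits.max_nodes
            or clusters >= limits.max_clusters)

# Example: an EUC-heavy pod tends to hit the VM limit first.
print(pod_is_full(vms=25_000, nodes=300, clusters=20, limits=PodLimits()))  # True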
The starting size for each building block and the increments for scaling them are
organizational decisions:
• For smaller or more agile organizations, starting small and scaling incrementally
often makes sense.
• Larger organizations may prefer to deploy a new building block fully populated and
migrate workloads onto it as schedules dictate.
• Although 3 nodes is the minimum size for a cluster, using 4 nodes provides a higher
level of redundancy during maintenance and failure conditions.
• For ROBO and edge use cases, the starting size can be as small as one, two, or
three nodes depending on requirements.
The diagram below illustrates a simple VDI building block example. With VDI it is easy
to think in terms of number of users and the node count in a cluster. In this example,
the building block is a 16-node cluster supporting 1,500 users. This works out to 100
users per node plus an additional node for HA. With the first building block full, a
request for an additional 500 users requires a new building block to be started.
This building block is then scaled up to its max size before starting a third.
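The building-block arithmetic from this VDI example can be sketched as follows; the user density and the single HA spare node mirror the example above and should be replaced with your own assumptions:

# Sketch of the VDI building-block sizing: users-per-node density plus one
# extra node reserved for HA, matching the 1,500-user / 16-node example.

import math

def vdi_nodes_required(users: int, users_per_node: int, ha_spare_nodes: int = 1) -> int:
    return math.ceil(users / users_per_node) + ha_spare_nodes

print(vdi_nodes_required(1_500, 100))  # 16 nodes, as in the example above
print(vdi_nodes_required(500, 100))    # a new 6-node building block for 500 more users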
By having well established and documented design decisions for pod size and building
block size, the architecture and operational teams are free to keep scaling without the
need to revisit decisions to satisfy each resource expansion request.
• Appliances are available directly from Nutanix or through our OEM relationships.
• Appliance-based licensing is referred to as "life of device" licensing, meaning it is
only applicable to the appliance it was purchased with.
• The manufacturer of the appliance takes all support calls for software and
hardware issues. For example, if you choose Dell appliances, Dell takes all support
calls and escalates to Nutanix for software support as needed. With Nutanix NX
appliances, all support calls go directly to Nutanix.
• The Software Only option decouples software and support licensing from the
underlying hardware. This enables:
• License portability. The same license can continue to be used when the
underlying hardware changes, such as a hardware vendor change or a node
refresh. (Note: licenses are portable for like-for-like hardware replacements; if
the hardware specification of the nodes changes, additional licenses might be
needed.)
• Deployment on additional supported hardware platforms. Hardware must be on
the qualified list of platforms, which can be found on the Hardware Compatibility
List (HCL).
• Software support comes directly from Nutanix, while the server vendor provides
hardware support.
• Another type of software-only licensing is the core-licensing option, which is best
for customers who prefer the benefits of the software-only model but want
hardware from a specific OEM server vendor. Core licensing enables customers to
purchase software-only licenses and buy hardware from any of the appliance
vendors. For example, XC-Core utilizes Dell XC OEM appliances but decouples
software and hardware support.
• Brand loyalty. This can be a strong factor in an IT buying decision. There may be
purchasing commitments or discounts in place at an organizational level driving this
loyalty, or you may simply have been happy with past experiences.
• Support quality. The quality of the support experience can also be a factor in your
hardware evaluation. For hardware failures, make sure the hardware vendor
responds quickly and can provide parts reliably within the contracted response time.
The overall support experience is important: a vendor should be easy to contact, be
responsive to requests, and provide resolution in a timely manner.
• Operational experience. When it comes to the ongoing operational experience,
consider what it takes to carry out day 2 operations to support the lifecycle of the
physical server and the vendor toolset (if any). This includes monitoring server
health, reporting on and upgrading firmware, and monitoring for component issues
and failures. Virtually all of the server vendors offer tools for these activities, and
when combined with the power of AOS and Prism, the experience is broadly similar.
Nutanix Lifecycle Manager (LCM) offers firmware reporting and management for all
Nutanix appliance options.
While there is generally a level of parity between server vendors, a particular vendor
may offer something the others do not, or one vendor may offer the latest options
more quickly when new components are released. You may require or prefer a
specific type of network card or need nodes with a large number of storage bays.
• Physical form factor. This is another common decision point, since it affects the
amount of rack space consumed, the power draw, the number of network
connections, and the number of internal expansion slots. The available internal
expansion slots may limit the number and type of network cards that can be
deployed, as well as whether a node can accept GPU cards and how many it can
support. When it comes to different form factors, there are:
• High-density chassis that offer either four or two physical nodes in 2U of rack
space. These are popular options for a variety of workloads that do not require
extensive internal expansion or a large number of storage bays.
• Standard rackmount servers, typically with a 1U or 2U chassis and one physical
server per chassis. These provide much wider capabilities in the number of
storage bays available, the number of internal expansion slots available, and may
also support more memory.
3.5.3 MIXED CONFIGURATIONS AND NODE TYPES
Nutanix clusters allow significant flexibility in terms of the node types and configurations
you can utilize in a single cluster. This allows clusters to be operated and expanded over
time without artificial constraints. A cluster can be expanded with different node
configurations to accommodate new workloads or when previous nodes are no longer
available.
Considerations include:
• Node models. Mixing node models within a cluster is a fairly regular occurrence.
While it’s possible to have the same CPU, memory and storage configuration in two
different node types, it’s not required in order to mix them in the same cluster.
• CPU configurations. Mixing nodes with different CPU configurations, such as core
count, clock speed, or CPU generation, within a cluster is supported. This flexibility
can address changing application requirements, inventory availability, financial
constraints, time of purchase, or other factors.
While there is no limit to the drift between configurations, it’s a commonly accepted
best practice to keep the core counts and memory configuration of nodes within a
cluster at similar levels. Using different CPU generations in the same cluster can limit
the feature set / functionality of newer CPUs. The lowest common denominator is the
level of the oldest CPU generation within a cluster. Having mostly uniformly configured
nodes in a cluster makes it easier for humans to double check the HA and capacity
planning recommendations of automated tools.
• Storage media. Having nodes with different storage configurations is also supported.
Variations at the storage layer can include varying the size or number of SSDs or
adding all-flash nodes to a hybrid cluster.
Once you know the requirements of your workload(s), you can use Nutanix Sizer to
determine the best configuration for your cluster(s).
Nutanix Sizer is a web-based application available to Nutanix employees, partners, and
select customers. Sizer allows the architect to input application and workload
requirements; node and cluster configurations are then calculated automatically.
Generally speaking, there are four different groups that servers fall into, translating to
different use cases:
• General workloads and EUC. These are by far the most popular nodes deployed in
Nutanix clusters. They can handle the vast majority of workloads including: general
server virtualization, business-critical applications, VDI, and most others. There are a
mix of form factors available.
• ROBO and edge. These are similar to the general workload options, except that they
may offer fewer options for CPU and storage because they are optimized for edge
use cases.
• Storage dense. For workloads that require large amounts of storage capacity,
storage-dense nodes offer a larger number of storage bays and dense media
options, possibly with fewer CPU options. The physical configuration of these nodes
is optimized for workloads such as Nutanix Files, Nutanix Objects, or to be utilized
as storage-only nodes.
• High performance. The most demanding workloads and business-critical applications
(BCA) may require additional CPU resources and storage performance. For these
workloads there are nodes offering additional CPU configurations in terms of core
count and clock speed, as well as quad-socket configurations. It’s common for these
models to offer as many as 24 storage bays to allow for more flash devices or hard
drives for workloads that can utilize the added performance characteristics.
Each of the above model alternatives offer one or several of the available physical
form factors along with the density and performance characteristics discussed.
Nutanix software does not require any complex tuning or configuration to support the
different workloads, but there are plenty of hardware options to tailor your selections
to different use cases.
3.6.2 PERFORMANCE CONSIDERATIONS
Nutanix and AOS meet the performance demands of different workloads without
continuous performance tuning. The Nutanix HCI storage fabric is powerful and
intelligent enough to handle nearly any type of workload.
However, different cluster design and configuration options still yield performance
benefits. Selecting the appropriate node model and configuration to meet application
and solutions requirements is an important design decision. The primary design
considerations for performance are:
• Number of drives. The number of hard disk drives (HDDs) or flash devices in a node
can dramatically affect its performance characteristics. However, simply picking the
node with the most device bays won’t improve the performance of every workload.
• Write-heavy workloads benefit from additional storage devices to provide
performance and consistency. Other workload characteristics such as
read/write ratio and I/O size should also be considered.
• Workloads such as VDI typically have minimal capacity requirements but higher
IOPS demands. It’s common to utilize nodes with partially populated storage
bays and as few as 2 flash devices per node, providing the right amount of
storage capacity while still exceeding performance demands.
• All Flash. All-flash configurations are available from Nutanix, OEMs, and supported
third-party server vendors. All-flash clusters utilize only SSDs, and these configurations
provide higher IOPS and a more consistent I/O profile. While all flash configurations
have become common, they are not absolutely necessary for every workload and
use case.
• NVMe. There are a number of new technologies available now and coming soon that
offer additional performance capabilities. NVMe is the first of these to be widely
available and offers a number of benefits over SSD. New flash technology allows
NVMe devices to deliver higher levels of I/O with lower latencies.
• RDMA. To realize the full benefits of NVMe, nodes are typically configured with
remote direct memory access (RDMA). RDMA allows one node to write directly to
the memory of another node. This is done by allowing a VM running in the user
space to directly access a NIC, which avoids TCP and kernel overhead resulting in
CPU savings and performance gains.
• Size of flash tier. In hybrid configurations containing SSD and HDD devices, the bulk
of the performance comes from the flash tier. Therefore, it’s important to understand
the workload being deployed on a hybrid cluster. The data an application accesses
frequently is typically referred to as the working set. The flash tier in hybrid clusters
should be sized to meet or exceed the size of the working set for all of the applications
that will run on the cluster. There is no penalty for having too much flash in a cluster but
not having enough can result in inconsistent performance.
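As a rough way to reason about flash tier sizing, the sketch below compares the combined application working sets with the available flash capacity; the working-set figures are assumptions for illustration only:

# Working-set sizing sketch for a hybrid cluster: the flash tier should meet or
# exceed the combined working sets of all applications on the cluster.

def flash_tier_sufficient(working_sets_tib: list[float], flash_tier_tib: float) -> bool:
    return sum(working_sets_tib) <= flash_tier_tib

apps = [2.0, 1.5, 4.0]  # assumed active working set per application, in TiB
print(flash_tier_sufficient(apps, flash_tier_tib=10.0))  # True: 7.5 TiB fits in 10 TiB
print(flash_tier_sufficient(apps, flash_tier_tib=6.0))   # False: expect inconsistent performance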
PFM-004 SELECT NODE MODEL(S) PER USE CASE
NOTE: As your organization works through the Detailed Technical Design elements in
the following section, be prepared to revisit your model decisions to fine-tune CPU,
memory, and storage configurations.
4. Detailed Technical
Design
With the necessary high-level design decisions made, including hypervisor, deployment
model, and hardware platform and models, you can now plan the technical design for
your Nutanix deployment.
This section provides technical guidelines for each layer of the design stack. Where
possible, we’ve organized the sections so that you don’t have to spend time reading
material that doesn’t apply. For instance, if you are not deploying VMware, you can skip all
sections that are applicable only to it.
4.1. REQUIRED SOFTWARE VERSIONS
This design assumes that you will be running the following software versions:
Hypervisor Options
The following table provides a comprehensive list of the design considerations
that pertain to cluster size:
Example:
• Maintenance window: 12 hours.
• Full single-node upgrade: at least 45 minutes (hardware, firmware, AOS, and
hypervisor); time depends on hardware vendor.
• Maximum cluster size: 12-15 nodes.
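The cluster-size example above follows directly from the maintenance window and per-node upgrade time; the following sketch shows the arithmetic, with the optional validation buffer being an added assumption:

# Sketch of the maintenance-window math: how many nodes can be upgraded one at
# a time within the window, given a per-node upgrade time.

def max_nodes_per_window(window_hours: float, per_node_minutes: float,
                         buffer_minutes: float = 0.0) -> int:
    usable = window_hours * 60 - buffer_minutes
    return int(usable // per_node_minutes)

print(max_nodes_per_window(12, 45))                     # 16 nodes back to back
print(max_nodes_per_window(12, 45, buffer_minutes=90))  # ~14 nodes with a validation buffer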
Vendor recommendations and limitations
• Hypervisor limitations.
• Management plane limitations.
• Vendor recommendations.
Each product has limitations and vendor recommendations. Make sure you do not cross
the boundaries set by the vendor. See the vendor limitations/recommendations table.
Example:
• RPO: 24 hours.
• RTO: 48 hours.
Ensure you can recover/restore from backup and restart workloads within 48 hours. A
cluster whose total storage capacity exceeds the technical capabilities of the backup
system could fail to meet the desired RTO.
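A quick way to sanity-check the RTO example above is to compare estimated restore time against the 48-hour target; the capacity and restore throughput figures below are illustrative assumptions:

# Restore-time sketch: a cluster whose used capacity exceeds what the backup
# system can restore within the RTO will miss the target.

def restore_hours(used_capacity_tb: float, restore_throughput_tb_per_hour: float) -> float:
    return used_capacity_tb / restore_throughput_tb_per_hour

rto_hours = 48
print(restore_hours(100, 2.5) <= rto_hours)  # True: 40 hours, within the RTO
print(restore_hours(200, 2.5) <= rto_hours)  # False: 80 hours, the RTO is missed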
Example #1: Oracle or MS SQL licensing
Licensing models are based on physical core count. Design clusters for database
performance and capacity requirements to avoid cluster oversizing and minimize
license costs.
Example #2: Application has its own HA or DR
If the application provides native HA and/or DR, the RPO/RTO considerations described
under Business Continuity (above) may not apply.
Networking
• Total available network switch ports.
• Available network switch ports per rack.
The number of available physical ports per rack and rack row is important when
choosing cluster size and number of clusters.
Example:
• 96 ports (10GbE) available per rack.
• 48 ports (1Gbps) available per rack.
• 2 x 10GbE uplinks per Nutanix host.
• 1 x 1Gbps uplink for out-of-band management.
AHV Deployments
The following table shows the maximum limits for management plane software
components in AHV deployments:
The following table provides guidance regarding the minimum and maximum number of
nodes supported in a single Nutanix AHV cluster.
The following table provides guidance regarding the minimum and maximum number of
nodes supported in a single VMware cluster.
PFM-005 NUMBER, TYPE, AND SIZE OF CLUSTERS
When designing a Nutanix deployment, you can take steps to mitigate risk for each
of the following failure domains:
• Drives
• Nutanix node
• Nutanix block
• Management plane
• Nutanix cluster.
• Datacenter rack and server room
• Datacenter
Nutanix clusters are resilient to a drive, node, block, or rack failure; this resilience is
enabled by redundancy factor 2, the default. Redundancy factor 3 can tolerate two
simultaneous drive, node, block, or rack failures with the right architecture. After a drive,
node, block, or rack failure, a Nutanix cluster self-heals to restore the desired redundancy
factor and rebuild resilience against additional subsequent failures.
NOTE: You can configure your Nutanix environment to be fault tolerant to node, block,
and rack failures. This is described later in the section: Data Redundancy and Resiliency.
Mitigating the risks of network failure domains is described in the section: Networking.
Figure: Failure domains within a datacenter containing multiple Nutanix clusters.
Large clusters result in larger failure domains and potentially higher business impacts,
since they typically host considerably more workloads. To mitigate the risk of data
unavailability or service disruption, design for redundancy at a cluster level to protect
data and services, as described in the following table:
Figure: Two failure domains, each containing its own Nutanix cluster and pair of TOR
switches.
Datacenter Rack and Server Room Failure Domains
When considering the datacenter rack failure domain, the primary mitigations are
redundant power from two different power supplies to each rack, redundant TOR
switches, and redundant network uplinks.
When considering the datacenter server room failure domain, it is critical to examine all
datacenter components to ensure they are not shared among multiple server rooms, as
described in the following table:
Figure: Server room failure domains within a datacenter, each containing multiple
Nutanix clusters.
Datacenter Building
When considering the datacenter failure domain, it is critical to examine the redundancy
of all connections to the outside to ensure they are not shared, as described in the
following table:
Figure: Datacenter building as a failure domain containing multiple Nutanix clusters.
4.2.3 DESIGNING WORKLOAD DOMAINS
This document uses workload domains as building blocks. Each workload domain
consists of a set of Nutanix nodes that are managed by the same management instance
(Nutanix Prism Central/VMware vCenter) and connected to the same network domain.
Normally, single workload domains occupy a single rack. However, you can aggregate
multiple workload domains in a single rack or span a single workload domain across
multiple racks.
Figure: Workload domain containing management, compute, storage, and edge clusters.
Single Rack, Single Workload Domain
One workload domain can occupy a single datacenter rack. All nodes from a workload
domain are connected to a single pair of TOR switches.
Figure: Single rack, single workload domain with a pair of TOR switches, a management
cluster, and an edge cluster.
Figure: Workload domain spanning multiple racks (four TOR switch pairs), with a
management cluster, edge clusters, a compute cluster of up to 96 nodes, and a storage
cluster of up to 24 nodes.
Pros & Cons
A single workload domain can span multiple racks. For example, to provide an
additional level of data protection (using rack awareness/fault tolerance, see the section:
Data Redundancy and Resiliency) or if a single workload domain is bigger than a
single rack can contain.
4.2.4 NETWORKING
Well-designed networks are critical to the resilience and performance of this design.
A Nutanix cluster can tolerate multiple simultaneous failures because it maintains a set
redundancy factor and offers features such as block and rack awareness. However, this
level of resilience requires a highly available, redundant network connecting a cluster’s
nodes. Protecting the cluster’s read and write storage capabilities also requires highly
available connectivity between nodes. Even with intelligent data placement, if network
connectivity between more than the allowed number of nodes breaks down, VMs on
the cluster could experience write failures and enter read-only mode.
To optimize I/O speed, Nutanix clusters send a copy of each write to another node in
the cluster. As a result, a fully populated cluster sends storage replication traffic in a
full mesh, using network bandwidth between all Nutanix nodes. Because storage write
latency directly correlates to the network latency between Nutanix nodes, any increase
in network latency adds to storage write latency.
Physical Switches
A Nutanix environment should use datacenter switches designed to handle
high-bandwidth server and storage traffic at low latency. Do not use switches meant for
deployment at the campus access layer. Campus access switches may have 10 Gbps
ports like datacenter switches, but they are not usually designed to transport a large
amount of bidirectional storage replication traffic. Refer to the Nutanix physical
networking best practices guide for more information.
• Line rate: Ensures that all ports can simultaneously achieve advertised throughput.
• Low latency: Minimizes port-to-port latency as measured in microseconds or
nanoseconds.
• Large per-port buffers: Accommodates speed mismatches from uplinks without
dropping frames.
• Nonblocking, with low or no oversubscription: Reduces chance of drops during peak
traffic periods.
• 10 Gbps or faster links for Nutanix CVM traffic: Only use 1 Gbps links for additional
user VM traffic or when 10 Gbps connections are not available, such as in a ROBO
deployment. Limit Nutanix clusters using 1 Gbps links to eight nodes maximum.
Switch manufacturers’ datasheets, specifications, and white papers can help identify
these characteristics. For example, a common datacenter switch datasheet may show
a per-port buffer of 1 MB, while an access layer or fabric extension device has a per-port
buffer of around 150 KB. During periods of high traffic, or when using links with a speed
mismatch (such as 40 Gbps uplinks to 10 Gbps edge ports), a smaller buffer can lead
to frame drops, increasing storage latency.
The following table is not exhaustive, but it includes some examples of model lines
that meet the above requirements. Models similar to the ones shown are also generally
good choices.
Examples of switches that do not meet these switch requirements are shown
in the following table:
• Cisco Nexus 2000 (Fabric Extender): highly oversubscribed with small per-port
buffers.
Each Nutanix node also has an out-of-band connection for IPMI, iLO, iDRAC, or similar
management. Because out-of-band connections do not have the same latency or
throughput requirements of VM or storage networking, they can use an access layer
switch.
NET-001 USE A LARGE BUFFER DATACENTER SWITCH AT
10GBPS OR FASTER.
Network Topology
In a greenfield environment, Nutanix recommends a leaf-spine network topology
because it is easy to scale, achieves high performance with low latency, and provides
resilience. A leaf-spine topology requires at least two spine switches and two leaf
switches. Every leaf connects to every spine using uplink ports.
There are no connections between the spine switches or between the leaf switches in a
conventional leaf-spine design. To form a Nutanix cluster it’s critical that all nodes are in
the same broadcast domain, thus in any leaf-spine design all leaf switches connecting
nodes in a cluster should carry the Nutanix VLAN. This can be accomplished with
physical connections between switches, using an overlay network, or using a pure layer
2 design. The following example shows a pure layer 2 design or overlay.
You may also choose a leaf-spine topology that uses links between switches to
guarantee layer 2 connectivity between Nutanix nodes.
Use uplinks that are a higher speed than the edge ports to reduce uplink
oversubscription. To increase uplink capacity, add spine switches or uplink ports as
needed.
In pre-existing environments, you may not have full control over the network topology,
but your design should meet the following requirements:
Guidelines:
• Networks must be highly available and tolerate individual device failures.
• Ensure that each layer of the network topology can tolerate device failure.
• Avoid configurations or technologies, such as stacked switches, that do not maintain
system availability during single-device outages or upgrades.
• Ensure that there are no more than three switches between any two Nutanix nodes in
the same cluster. Nutanix nodes send storage replication traffic to each other in a
distributed fashion over the top-of-rack network. One Nutanix node can therefore send
replication traffic to any other Nutanix node in the cluster.
• The network should provide low and predictable latency for this traffic. Leaf-spine
networks meet this requirement by design. For the core-aggregation-access model,
ensure that all nodes in a Nutanix cluster share the same aggregation layer to meet the
three-switch-hop rule.
Oversubscription occurs when an intermediate network device or link does not have
enough capacity to allow line rate communication between the systems connected to
it. For example, if a 10 Gbps link connects two switches and four hosts connect to each
switch at 10 Gbps, the connecting link is oversubscribed. Oversubscription is often
expressed as a ratio—in this case 4:1, as the environment could potentially attempt to
transmit 40 Gbps between the switches with only 10 Gbps available. Achieving a ratio
of 1:1 is not always feasible.
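The 4:1 example above can be computed directly; the sketch below simply divides the aggregate edge bandwidth by the inter-switch link capacity:

# Oversubscription sketch matching the 4:1 example: four hosts at 10 Gbps behind
# each switch, with a single 10 Gbps inter-switch link.

def oversubscription_ratio(hosts: int, host_gbps: float, uplink_gbps: float) -> float:
    return (hosts * host_gbps) / uplink_gbps

print(oversubscription_ratio(hosts=4, host_gbps=10, uplink_gbps=10))  # 4.0 -> a 4:1 ratio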
RECOMMENDATION:
• Keep the oversubscription ratio as small as possible based on budget and available
capacity.
RECOMMENDATION:
• To avoid packet loss due to link oversubscription, ensure that the switch uplinks consist
of multiple interfaces operating at a faster speed than the Nutanix host interfaces. For
example, for nodes connected at 10 Gbps, interswitch connections should consist of
multiple 10, 40, or 100 Gbps links.
• Connect all Nutanix nodes that form a cluster to the same switch fabric. Do not stretch
a single Nutanix cluster across multiple, disconnected switch fabrics. A switch fabric is
a single leaf-spine topology or all switches connected to the same switch aggregation
layer. Every Nutanix node in a cluster should therefore be in the same L2 broadcast
domain and share the same IP subnet.
See section “Security -> Network Segmentation” for service placement information and
design decisions.
RECOMMENDATION:
• Use native, or untagged, VLANs for the hypervisor host and CVM for ease of initial
configuration. Ensure that this untagged traffic is mapped into the CVM and
hypervisor VLAN only on the required switch ports to reduce risk.
• Use tagged VLANs for all guest VM traffic and add the required guest VM VLANs to all
connected switch ports for hosts in the Nutanix cluster.
• Limit guest VLANs for guest VM traffic to the smallest number of physical switches
and switch ports possible to reduce broadcast network traffic load.
NET-003 POPULATE EACH RACK WITH TWO 10GBE
OR FASTER TOR SWITCHES
NET-007 CONFIGURE THE CVM AND HYPERVISOR VLAN
AS NATIVE, OR UNTAGGED ON SERVER FACING
SWITCH PORTS.
Broadcast Domains
Performing layer 3 routing at the top of rack, creating a layer 3 and layer 2 domain
boundary within a single rack, is a growing network trend. Each rack is a different IP
subnet and a different layer 2 broadcast domain. This layer 3 design decreases the size
of the layer 2 broadcast domain to remove some common problems of sharing a large
broadcast domain among many racks of servers, but it can add complexity for
applications that require layer 2 connectivity.
In contrast, the traditional layer 2 design shares a single broadcast domain or VLAN
among many racks of switches. In the layer 2 design, a switch and server in one rack
share the same VLANs as a switch and server in another rack. Routing between IP
subnets is performed either in the spine or in the aggregation layer. The endpoints
in the same switch fabric have layer 2 connectivity without going through a router, but
this can increase the number of endpoints that share a noisy broadcast domain.
Figure: Each Nutanix cluster contained within a single broadcast domain and IP subnet.
Nutanix recommends a traditional layer 2 network design to ensure that CVMs and
hosts can communicate in the same broadcast domain even if they are in separate racks.
The CVM and host must be in the same broadcast domain and IP subnet.
If a layer 3 network design is chosen, there are two possible ways to make this work
with a Nutanix deployment:
• Keep all Nutanix nodes in a cluster inside the same rack.
• Create an overlay network in the switch fabric that creates a virtual broadcast domain
that is shared across racks for the Nutanix CVMs and hosts.
Keeping all of the Nutanix nodes in a single cluster in a single rack has the limitation
of losing resilience against rack failure. Creating an overlay in the network fabric spreads
a broadcast domain across racks virtually, which may raise concerns that the original
intent of the layer 3 design, reducing broadcast domain scope, is now violated.
Figure: Overlay networks X and Y span the per-rack broadcast domains (1, 2, and 3) and
their subnets, giving each Nutanix cluster a shared virtual broadcast domain and subnet
across racks.
Because there are no more than three switch hops between any two Nutanix nodes in
this design, a Nutanix cluster can easily span multiple racks and still use the same switch
fabric. Use a network design that brings the layer 2 broadcast domain to every switch
that connects Nutanix nodes in the same cluster.
The following example shows a leaf-spine network using either an overlay or a pure
layer 2 design, with no connections required between leaf switches.
You may choose a leaf-spine design where a pair of leaf switches are connected using
link aggregation. Discuss scale plans with the switch vendor when scaling beyond two
spine switches in this design.
Figure: Rack Level View of Leaf Spine Scale
It’s also important to take into account the throughput capacity of the switch and
the throughput required between each rack.
In the previous example with 4 workload racks, assume you have 2 x 32-port spine
switches with a single 100 Gbps connection to every leaf. Also assume each TOR leaf
switch has 48 x 10 Gbps ports and 6 x 100 Gbps ports.
Spine switch port utilization is at 8 out of 32 ports on each spine, leaving capacity
to grow up to 16 racks at each spine (32 spine ports divided by 2 ports per rack).
Each leaf would be using one of its six available uplinks to each spine, and each leaf can
support up to 48 connected servers. However, at this point these two spine switches would
be processing a lot of traffic and might be overloaded, depending on their specifications.
To calculate the oversubscription ratio, assume that each rack has 24 x 10 Gbps
dual-connected servers with dual leaf switches, for a total of 240 Gbps of server-facing
bandwidth at each leaf; the resulting oversubscription is 2.4:1. This design is
oversubscribed and is not recommended.
One way to reduce oversubscription is to add another 100Gbps uplink from each leaf to
the existing spine switches. However, that reduces the total number of supported racks
by half to 8, (32 spine ports divided by 4 ports per rack), so you must carefully consider
which leaf and spine switches are selected and what scale you would like to achieve.
Another way to grow spine capacity is to add a spine switch, which requires another
uplink from every leaf but greatly increases the total throughput of the switch fabric
without reducing the number of racks supported. If you added another spine switch to
bring the total to three spines, each leaf switch would add another 100 Gbps uplink
(reducing oversubscription), and each spine could still support 16 racks (32 spine ports
divided by 2 ports per rack). With 6 uplinks on every leaf switch, you can support
designs with up to 6 spines in this example.
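The rack-count arithmetic in this example can be generalized with a short sketch. The port counts below are the assumed values from this example, not limits of any particular switch model:

```python
# Illustrative leaf-spine scaling math for the example above.
# Assumptions: 32-port spines, 2 leaf switches per rack, every leaf uplinks to every spine.
def max_racks(spine_ports: int, leaves_per_rack: int, uplinks_per_leaf_per_spine: int) -> int:
    """Racks supported per spine, limited only by the spine's port count."""
    ports_per_rack = leaves_per_rack * uplinks_per_leaf_per_spine
    return spine_ports // ports_per_rack

print(max_racks(32, 2, 1))   # 16 racks with one uplink from each leaf to each spine
print(max_racks(32, 2, 2))   # 8 racks after adding a second uplink per leaf to each spine
# Adding a third spine instead adds fabric bandwidth while each spine still supports 16 racks.
```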
Scaling the multi-tier network design may require adding another aggregation and
access layer to the core. In this case, there would be more than three switch hops
between the two access layers.
• Guideline: Place Nutanix nodes that connect to separate aggregation and access
layers in separate clusters to keep the number of switch hops between nodes in the
same cluster to three or fewer.
In the following figure, Cluster 1 connects to one aggregation layer and Cluster 2
connects to another.
Host Networking
AHV Networking
AHV leverages Open vSwitch (OVS) for all VM networking. The virtual switch is referred
to as a bridge and uses a br prefix in its name. Read the AHV Networking Best
Practices Guide for in-depth guidance on any settings not covered here.
Bridge
The default bridge, br0, behaves like a layer 2 learning switch that maintains a MAC
address table. Additional uplink interfaces for separate physical networks are added as
brN, where N is the new bridge number. To ensure that all VM traffic goes through the
same set of firewall rules and the same set of network functions, a bridge chain is
created, with all microsegmentation and network function rules placed in br.microseg
and br.nf respectively.
Traffic from VMs enters the bridge at brN.local. Next, traffic is multiplexed onto the
bridge chain, goes through the firewall and network function ruleset, and is then
demultiplexed on the correct brN for physical network forwarding. The reverse path is
followed for traffic flowing from the physical network to VMs.
All traffic between VMs must flow through the entire bridge chain, since brN is the only
bridge that performs switching. All other bridges simply apply rules or pass traffic up
and down the bridge chain.
Nutanix recommends:
• Use only the default bridge, br0, with the two fastest network adapters in the host.
• Converge the management, storage, and workload traffic on this single pair of uplink
adapters.
• Additional brN bridges should only be added when a connection to a separate physical
network is required. For example, if the top of rack has two pairs of switches, one pair
for storage and management and another pair for workload traffic, it makes sense to
create another bridge, br1, and place the workloads on it.
• Other valid use cases include separate physical networks for iSCSI Volumes workloads,
backplane intracluster replication, or workloads like virtual firewalls that require access
to a physically separate network. In these cases the additional bridge is used to
connect to the separate top-of-rack network. If there is only a single top-of-rack
physical network, then a single bridge with VLAN separation is sufficient.
AHV also includes a Linux bridge called virbr0. The virbr0 Linux bridge carries
management traffic between the CVM and AHV host. All other storage, host, and
workload network traffic flows through the br0 OVS bridge, or additional brN bridges if
configured.
NOTE: Do not modify the configuration of any bridges inside the AHV host unless
following an official Nutanix guide.
Bond
Bonded ports aggregate the physical interfaces on the AHV host. By default, a bond
named br0-up is created in bridge br0. After the node imaging process, all interfaces are
placed within a single bond, which is a requirement for the foundation imaging process.
A bridge can have only a single bond.
Bonds allow for several load-balancing modes, including active-backup, balance-slb, and
balance-tcp. Link Aggregation Control Protocol (LACP) can also be activated on a
bond for link aggregation. The bond_mode setting is not specified during installation,
so it defaults to active-backup.
RECOMMENDATION: Ensure that each bond has at least two uplinks for redundancy.
The uplinks in each bond must all be the same speed.
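The bond recommendation can be expressed as a simple validation rule. The check below is a hypothetical sketch, not a Nutanix tool, and the adapter speeds used are made up for illustration:

```python
# Hypothetical validation of the bond recommendation: at least two uplinks, all the same speed.
def validate_bond(uplink_speeds_gbps: list[float]) -> None:
    if len(uplink_speeds_gbps) < 2:
        raise ValueError("A bond needs at least two uplinks for redundancy.")
    if len(set(uplink_speeds_gbps)) != 1:
        raise ValueError("All uplinks in a bond must run at the same speed.")

validate_bond([25, 25])      # passes: two 25 Gbps adapters
# validate_bond([25, 10])    # would raise: mixed speeds in one bond
```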
Uplink Load Balancing
The following bond modes are available:
• active-backup: Default configuration which transmits all traffic over a single active
adapter. If the active adapter becomes unavailable, another adapter in the bond will
become active. Limits host throughput to the bandwidth of a single network adapter.
No additional switch configuration required.
• balance-tcp with LACP: Distributes VM NIC TCP or UDP sessions across adapters in the
bond. Limits per-VM NIC throughput to the maximum bond bandwidth (number of physical
uplink adapters * speed). Requires physical switch link aggregation. Used when LACP
negotiation is required by the datacenter team or when throughput requirements exceed a
single NIC.
RECOMMENDATIONS:
• To keep network configuration simple, use the standard 1,500 byte MTU in the hosts,
CVMs, and workload VMs. Nutanix does not recommend jumbo frames unless
specifically required by high-performance Nutanix Volumes iSCSI workloads or specific
workload requirements.
• When switching from 1,500-byte frames to 9,000-byte frames, performance
improvements are generally not significant unless the workload uses the maximum
network bandwidth for read traffic.
For more information on when to use jumbo frames, visit the Volumes and AHV
Networking Best Practices guides.
NET-011 USE A SINGLE BR0 BRIDGE WITH AT LEAST TWO OF
THE FASTEST UPLINKS OF THE SAME SPEED UNLESS
MULTIPLE PHYSICAL NETWORKS ARE REQUIRED.
Justification Simplifies the design and does not need any additional configuration.
For the default 1,500-byte MTU recommendation:
Justification Simplifies the design and does not need any additional configuration.
Reduces risk and complexity from non-standard configuration that only increases
performance in certain use cases.
Implication All network frames are limited to the default 1,500-byte maximum size for
interoperability, potentially creating more network overhead for high-throughput write
workloads.
vSphere Networking
VMware vSphere networking follows many of the same design decisions as AHV
networking. The critical design choices for vSphere networking are covered here. See the
Nutanix vSphere Networking Best Practices Guide for more details.
Nutanix hosts with vSphere ESXi use two virtual switches (vSwitches), named
vSwitchNutanix and vSwitch0. vSwitchNutanix is the internal, standard vSwitch used for
management and storage traffic between the CVM and the hypervisor.
RECOMMENDATIONS:
• Do not modify vSwitchNutanix. By default, vSwitch0 is also a standard vSwitch; it is
used for communication between CVMs as well as for workload traffic.
• Nutanix recommends converting vSwitch0 to the distributed vSwitch following the
distributed vSwitch migration knowledge base article. Converting to the distributed
vSwitch allows central management of networking for all hosts, instead of host by host
networking configuration. The distributed vSwitch also adds capability for advanced
networking functions such as load based teaming, LACP, and traffic shaping.
• Connect at least two of the fastest adapters of the same speed into vSwitch0 and use
the “Route Based on Physical NIC Load” load balancing method to ensure traffic is
balanced between uplink adapters.
• Connect these adapters to two separate top-of-rack switches to ensure redundancy.
• Do not add more vSwitches unless a connection to another physical network is needed
to meet security or workload requirements.
• All CVM storage, hypervisor host, and workload traffic should flow through vSwitch0,
using VLAN separation between the workload traffic and all other traffic.
• Use the default 1,500 byte frame size on all uplinks unless there is a specific
performance or application requirement that would justify 9,000 byte jumbo frames.
NET-015 USE VIRTUAL DISTRIBUTED SWITCH (VDS)
NET-018 USE ROUTE BASED ON PHYSICAL NIC LOAD
UPLINK LOAD BALANCING
For the default 1,500-byte MTU recommendation:
Impact All network frames are limited to the default 1,500-byte maximum size for
interoperability, potentially creating more network overhead for high-throughput write
workloads.
Justification Simplifies the design and does not need any additional configuration.
Reduces risk and complexity from non-standard configuration that only increases
performance in certain use cases.
4.2.5 COMPUTE AND STORAGE DESIGN
Nutanix HCI is a converged storage and compute solution which leverages local
hardware components (CPU, memory, and storage) and creates a distributed platform
for running workloads.
Each node runs an industry-standard hypervisor and the Nutanix CVM. The Nutanix
CVM provides the software intelligence for the platform. It is responsible for serving all
IO to VMs that run on the platform.
The CVM controls the storage devices directly and creates a logical construct called a
storage pool of all disks from all nodes in the cluster. For AHV it uses PCI passthrough
and for ESXi it uses VMDirectPathIO.
Compute Design
The amount of memory required by the AHV host itself varies based on a number of
factors. For example, on hosts with 512GB of memory, the host uses between 2GB
and 10GB.
For example, if a host has 2 CPU sockets with 8 cores each and 256GB memory,
the host has 2 NUMA nodes, each with:
• 8 CPU cores
• 128GB memory
For example, a VM that has 4 vCPUs and 64GB memory will fit within a single NUMA
node on the host and achieve the best performance.
Virtual Non Uniform Memory Access (vNUMA) and User VMs
The primary purpose of vNUMA is to give large virtual machines or wide VMs (VMs
requiring more CPU or memory capacity than is available on a single NUMA node) the
best possible performance. vNUMA helps wide VMs create multiple vNUMA nodes.
Nutanix supports vNUMA with both AHV and ESXi. vNUMA requires NUMA aware
applications.
Each vNUMA node has virtual CPUs and virtual RAM. Pinning a vNUMA node to a
physical NUMA node ensures that virtual CPUs accessing virtual memory see the
expected NUMA behavior. Low-latency memory access in virtual hardware (within
vNUMA) matches low-latency access in physical hardware (within physical NUMA),
and high-latency accesses in virtual hardware (across vNUMA boundaries) match
high-latency accesses on physical hardware (across physical NUMA boundaries).
In AHV, administrators can configure vNUMA for a VM via the aCLI (Acropolis CLI) or
REST API; this configuration is VM-specific and defines the number of vNUMA nodes.
Memory and compute are divided in equal parts across the vNUMA nodes. Refer to the
AHV Administration Guide for the configuration steps.
Building on the previous example, if a host with 16 CPU cores and 256GB memory
has a VM with 12 vCPUs and 192GB memory that is not configured for vNUMA, the vCPU
and memory assignment will span NUMA boundaries.
To ensure the best performance for wide VMs like this, vNUMA must be configured.
vNUMA VM configurations require strict fit. With strict fit, for each VM virtual node,
memory must fit inside a physical NUMA node. Each physical NUMA node can provide
memory for any number of vNUMA nodes. If there is not enough memory within a
NUMA node, the VM does not power on. Strict fit is not required for CPU. To determine
how many vNUMA nodes to use for a user VM, follow application-specific configuration
recommendations provided by Nutanix.
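The strict-fit rule can be illustrated with a small calculation. The sketch below is a simplified model of the memory check only (not the scheduler's actual placement algorithm), using the example host with two NUMA nodes of 128GB each:

```python
# Simplified strict-fit check: the memory of each vNUMA node must fit inside a physical NUMA node.
# AHV divides a VM's memory and vCPUs equally across its vNUMA nodes.
def strict_fit(vm_memory_gb: float, vnuma_nodes: int, numa_node_memory_gb: float) -> bool:
    per_vnuma_memory = vm_memory_gb / vnuma_nodes
    return per_vnuma_memory <= numa_node_memory_gb

# Example host: 2 NUMA nodes x (8 cores, 128GB). Example wide VM: 12 vCPUs, 192GB.
print(strict_fit(192, 1, 128))   # False: a single vNUMA node of 192GB cannot fit in 128GB
print(strict_fit(192, 2, 128))   # True: two vNUMA nodes of 96GB each fit
```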
CVM CPU and Memory Considerations
Optimal CVM vCPU values depend on the type of system and how many cores per
NUMA node are present in the system. The CVM is pinned to the first physical CPU of
the node. The following table gives an overview of minimum vCPU and memory values
for the CVM (as of AOS 5.15):
RECOMMENDATIONS:
Use Nutanix Foundation to set up a new cluster or node and configure the CVM(s).
Nutanix Foundation software automatically identifies the platform and model type.
Based on that information, Nutanix Foundation configures the appropriate default values
for CVM memory and vCPU.
Storage Design
Nutanix HCI combines highly dense storage and compute into a single platform with a
scale-out, shared-nothing architecture and no single points of failure. The AOS
Distributed Storage Fabric (DSF) appears to the hypervisor like any centralized storage
array; however, all of the I/O is handled locally to provide the highest performance.
Storage planning for this design requires that you understand some of the high-level
concepts of DSF:
Storage Pool
A group of physical storage devices within the cluster. This can include HDDs and both
NAND and NVMe SSDs. The storage pool spans multiple Nutanix nodes and expands
as the cluster expands. A default storage pool is created when a cluster is created.
Container
Logical segmentation of the Storage Pool is provided by containers, which contain a
group of VMs or files (vDisks). Configuration options and data management services are
configured at the container level. When a cluster is created, a default storage container
is configured for the cluster.
Container Guidelines:
A key design question is how many containers to configure in a cluster. Data reduction
settings (compression, deduplication, and erasure coding) and the replication factor (RF)
are configured on a per-container basis, so multiple containers need to be created when
different values are desired.
All Nutanix containers are thin provisioned by default. Thin provisioning is a widely
accepted technology that has been proven over time by multiple storage vendors,
including VMware. As the DSF presents containers to VMware vSphere hosts as NFS
datastores by default, all VMs are also thin provisioned by default. vDisks created within
AHV VMs are thinly provisioned by default. Nutanix also supports thick provisioned
vdisks for VMs in ESXi by using space reservations. Using thick provisioned vdisks might
impact data reduction values on containers since Nutanix reserves the thick provisioned
space and it cannot be oversubscribed.
vDisk
All storage management is VM-centric, and I/O is optimized at the vDisk level. A vDisk
is any file over 512KB stored in DSF, including vmdks and VM hard disks; vDisks are
logically composed of vBlocks. The software solution runs on nodes from a variety of
manufacturers.
RECOMMENDATIONS:
• Use multiple vDisks for an application versus a single large vDisk.
Using multiple vDisks allows for better overall performance because the guest OS can use
multiple threads to process I/O in parallel, compared with a single-vDisk configuration.
Refer to the appropriate application solution guide for application-specific configuration
recommendations for how many vDisks to configure.
Nutanix Volumes
In addition to providing storage through the hypervisor, Nutanix also allows supported
operating systems to access DSF storage capabilities directly using Nutanix Volumes.
Nutanix designed Volumes as a scale-out storage solution where every CVM in a cluster
can present storage volumes via iSCSI. This solution allows an individual application to
access the entire cluster, if needed, to scale out for performance.
Volumes automatically manages high availability to ensure upgrades or failures are non-
disruptive to applications. Storage allocation and assignment for Volumes is done with
volume groups (VGs). A VG is a collection of one or more disks (vDisks) in a Nutanix
storage container. These Volumes disks inherit the properties (replication factor,
compression, erasure coding, and so on) of the container they reside in. With Volumes,
vdisks in a VG are load balanced across all CVMs in the cluster by default.
In addition to connecting volume groups through iSCSI, AHV also supports direct
attachment of VGs to VMs. In this case, the vDisks are presented to the guest OS over the
virtual SCSI controller. The virtual SCSI controller leverages AHV Turbo and iSCSI under
the covers to connect to the Nutanix DSF. By default, the vDisks in a VG directly attached
to a VM are hosted by the local CVM. Load balancing of vDisks on directly attached VGs
can be enabled via aCLI.
Each vDisk is served by a single CVM. Therefore, to use the storage capabilities of
multiple CVMs, create more than one vDisk for a file system and use OS-level striped
volumes to spread the workload. This improves performance and prevents storage
bottlenecks, but can impact your network bandwidth.
GUIDELINE: Core use-cases for Nutanix Volumes:
• Shared Disks
• Oracle RAC, Microsoft Failover Clustering, etc.
• Bare-metal consumers
• Standalone or clustered physical workloads.
DSF automatically tiers data across the cluster to different classes of storage devices
using intelligent lifecycle management (ILM). For best performance, ILM makes sure the
most frequently used data is available in memory or in flash on the node local to the VM.
The default threshold where the algorithm will move data from the hot tier to the cold
tier is 75%. Data for down-migration is chosen based on last access time. In All-Flash
node configurations, the Extent Store consists only of SSDs and no tiering occurs.
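The tiering behavior can be pictured with a simple model: once hot-tier usage crosses the 75% threshold, the least recently accessed data is moved down. The sketch below is a conceptual illustration only, not the actual ILM implementation:

```python
# Conceptual model of ILM down-migration: when the hot tier passes 75% utilization,
# data is moved to the cold tier starting with the least recently accessed extents.
def select_for_downmigration(extents, hot_tier_capacity_gb, threshold=0.75):
    """extents: list of (extent_id, size_gb, last_access_ts). Returns the ids to move."""
    used = sum(size for _, size, _ in extents)
    target = threshold * hot_tier_capacity_gb
    to_move = []
    for extent_id, size, _ in sorted(extents, key=lambda e: e[2]):   # oldest access first
        if used <= target:
            break
        to_move.append(extent_id)
        used -= size
    return to_move

extents = [("a", 300, 100), ("b", 400, 500), ("c", 200, 900)]
print(select_for_downmigration(extents, hot_tier_capacity_gb=1000))  # ['a']: coldest data moves first
```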
RECOMMENDATIONS:
For hybrid configurations, Nutanix recommends:
• Sizing the SSD tier so that the active data set of the application fits within 75% of the
usable SSD capacity.
• Nutanix has a collector tool that can be run to determine the approximate
active data set size.
• Alternatively, use application-specific tools such as the MAP toolkit for MSSQL, or
AWR reports and Nutanix scripts for Oracle.
• A general rule is to check how much data has been backed up during a month.
Assuming a 70%/30% R/W pattern (if the R/W patterns are unknown), multiply
the data you get from backups by 4, which gives an approximate value for
hot data. That amount should fit within 75% of the usable SSD tier capacity, as
sketched below.
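The rule of thumb above can be turned into a rough sizing sketch. The monthly backup volume used here is a hypothetical figure for illustration; the multiplier and the 75% target come from the guidance above:

```python
# Rough SSD tier sizing sketch for hybrid nodes, following the rule of thumb above.
# Assumption: monthly backup volume approximates the changed (write) data; with an unknown
# R/W pattern treated as 70%/30%, hot data is estimated as monthly backup data * 4.
def required_ssd_tier_gb(monthly_backup_gb: float, usable_fraction: float = 0.75) -> float:
    estimated_hot_data = monthly_backup_gb * 4
    return estimated_hot_data / usable_fraction   # hot data should fit in 75% of the SSD tier

# Example: 1,500 GB backed up per month -> ~6,000 GB of hot data -> ~8,000 GB of usable SSD.
print(f"{required_ssd_tier_gb(1500):.0f} GB of usable SSD capacity")
```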
For All-Flash configurations there is no ILM, so there is no equivalent recommendation.
STR-03 DO NOT MIX NODE TYPES FROM DIFFERENT
VENDORS IN THE SAME CLUSTER.
NODE SELECTION RECOMMENDATIONS:
• If your high-level design decision is to use hybrid nodes, you need to decide the
number of HDDs and SSDs per node.
• For applications that require high performance like DBs and Tier-1 applications,
more drives per node is better.
• The OpLog is spread across up to 8 SSDs, so more SSDs result in better and more
consistent overall performance. If NVMe devices are present, the OpLog is placed
on up to 8 NVMe SSDs.
• More SSDs provide more usable space for tiering data.
• For All-Flash nodes, more SSDs provide better performance by making read and
write access more parallel, especially for non-optimal workload patterns.
• Use Nutanix Sizer to size usable capacity and obtain node recommendations based
on your workload use case.
• Also refer to specific application solution guides and/or best practice guides.
*4 drive slot and 10 drive slot systems from Dell, Lenovo, and Fujitsu can have 2+2
and 4+6 SSD+HDD configurations respectively.
Availability Domains and Fault Tolerance
Availability domains are used to determine component and data placement.
Nutanix availability domains are:
• Disk Awareness (always).
• Node Awareness (always).
• Block Awareness (optional).
• Requires a minimum of 3 blocks for FT1 and 5 for FT2, where a block contains
either 1, 2, or 4 nodes. With Erasure Coding, the minimums are 4 blocks for FT1 and
6 blocks for FT2.
• Rack Awareness (optional).
• Requires a minimum of 3 racks for FT1 and 5 for FT2, and you must define what
constitutes a rack and where your blocks are placed. With Erasure Coding, the
minimums are 4 racks for FT1 and 6 racks for FT2.
FT=1: A cluster can lose any one component and operate normally.
FT=2: A cluster can lose any two components and operate normally.
Depending on the defined awareness level and FT value, data and metadata are
replicated to appropriate locations within a cluster to maintain availability.
The following table gives an overview of the data awareness and FT levels supported.
FT GUIDELINES:
• FT=2 implies RF=3 for data and RF=5 for metadata by default.
• Use this setting when you need to withstand two simultaneous failures and have
cluster and VM data still be available.
• FT=1 implies RF=2 for data and RF=3 for metadata by default.
• Use this setting when the application is fine with having only 2 copies of data. This
will keep the cluster and VM data available after one failure.
• Another option is to set FT=2 for a cluster with specific containers set to RF=2.
• With this configuration, data is RF=2, but metadata is RF=5.
• For containers that are RF=2, if there are 2 simultaneous failures there is a
possibility that VM data becomes unavailable, but the cluster remains up.
• This is different from the FT=1, RF=2 case, where two simultaneous failures will result
in the cluster being down.
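The FT-to-RF relationships above can be summarized in a small lookup. This is a documentation aid only, not a Nutanix API:

```python
# Default data and metadata replication factors implied by the cluster FT setting, as described above.
FT_DEFAULTS = {
    1: {"data_rf": 2, "metadata_rf": 3},
    2: {"data_rf": 3, "metadata_rf": 5},
}

def replication_factors(ft: int, container_rf_override: int | None = None) -> dict:
    rf = dict(FT_DEFAULTS[ft])
    if container_rf_override is not None:
        rf["data_rf"] = container_rf_override   # e.g., an RF=2 container on an FT=2 cluster
    return rf

print(replication_factors(2))                           # {'data_rf': 3, 'metadata_rf': 5}
print(replication_factors(2, container_rf_override=2))  # data RF=2 while metadata stays RF=5
```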
Capacity Optimization
Nutanix provides different ways to optimize storage capacity that are intelligent and
adaptive to workload characteristics. All optimizations are performed at the container
level, so different containers can use different settings.
The following table summarizes the expected overhead for EC-X vs standard RF2/RF3
overhead:
EC-X GUIDELINES:
EC-X provides the same level of data protection while increasing usable storage
capacity.
• Turn on EC-X for non-mission-critical workloads and workloads that have a significant
amount of write cold data, since erasure coding works on write cold data and provides
more usable storage.
• For more information refer to specific application guides.
Compression
DSF provides both inline and post-process data compression. Regardless of whether
inline or post-process compression is selected, write data entering the OpLog that is
larger than 4K and compresses well is written to the OpLog in compressed form.
• Inline: Compresses sequential streams of data or large size I/Os (>64K) when writing
to the Extent Store.
• Post-Process: Compresses data after it drains from the OpLog to the Extent Store,
once the compression delay has been met, during the next Curator scan.
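As a conceptual illustration of the two compression paths described above (simplified, and not the actual AOS logic):

```python
# Simplified illustration of where compression happens, based on the description above.
def oplog_write_is_compressed(io_size_kb: float, compresses_well: bool) -> bool:
    # Data entering the OpLog that is larger than 4K and compresses well is stored
    # compressed, whether the container uses inline or post-process compression.
    return io_size_kb > 4 and compresses_well

def extent_store_compression(inline_enabled: bool, sequential: bool, io_size_kb: float) -> str:
    if inline_enabled and (sequential or io_size_kb > 64):
        return "compressed inline when written to the Extent Store"
    return "compressed post-process after the compression delay, during a Curator scan"

print(oplog_write_is_compressed(32, compresses_well=True))                              # True
print(extent_store_compression(inline_enabled=True, sequential=False, io_size_kb=128))
```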
Nutanix uses the LZ4 and LZ4HC compression algorithms. LZ4 compression is used to
compress normal data, providing a balance between compression ratio and
performance. LZ4HC compression is used to compress cold data to improve the
compression ratio. Cold data is characterized as:
COMPRESSION GUIDELINES:
Compression provides on-disk space savings for applications such as databases,
and results in a lower number of writes being written to storage.
Deduplication
When enabled, DSF does capacity-tier and performance-tier deduplication. Data is
fingerprinted on ingest using a SHA-1 hash that is stored as metadata. When duplicate
data is detected based on multiple copies with the same fingerprint, a background
process removes the duplicates. When deduplicated data is read, it is placed in a unified
cache, and any subsequent requests for data with the same fingerprint are satisfied
directly from cache.
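The fingerprint-based approach can be illustrated in a few lines of Python. This is a conceptual sketch only; AOS fingerprints data on ingest and stores the fingerprints as metadata rather than in a simple dictionary:

```python
import hashlib

# Conceptual sketch of fingerprint-based deduplication: identical content produces the same
# SHA-1 fingerprint, so only a single copy needs to be stored (and cached when read).
def fingerprint(chunk: bytes) -> str:
    return hashlib.sha1(chunk).hexdigest()

store: dict[str, bytes] = {}          # fingerprint -> single stored copy

def ingest(chunk: bytes) -> str:
    fp = fingerprint(chunk)
    if fp not in store:               # duplicates with the same fingerprint are not stored again
        store[fp] = chunk
    return fp

ingest(b"same data" * 1024)
ingest(b"same data" * 1024)           # duplicate: nothing new is stored
print(len(store))                     # 1
```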
DEDUPLICATION GUIDELINES:
Deduplication is recommended for full clones, P2V migrations, and persistent desktops.
Introduction
The Nutanix hypervisor, AHV, is an attractive alternative hypervisor that streamlines
operations and lowers overall costs. Built into Nutanix Enterprise Cloud and included at
no additional cost in the AOS license, AHV delivers virtualization capabilities for the most
demanding workloads. It provides an open platform for server virtualization, network
virtualization, security, and application mobility. When combined with comprehensive
operational insights and virtualization management from Nutanix Prism and Prism
Central, AHV provides a complete datacenter solution.
Control Plane
The control plane for managing the Nutanix core infrastructure and AHV is
provided by:
HA/ADS
Nutanix AHV has built-in VM high availability (VM-HA) and a resource contention
avoidance engine called Acropolis Dynamic Scheduling (ADS). ADS is always on and
does not require any manual tuning. VM-HA can be enabled and disabled via a simple
Prism checkbox. There are two main levels of VM-HA for AHV:
VM-HA considers memory when calculating available resources throughout the cluster
for starting VMs. VM-HA respects Acropolis Dynamic Scheduling (ADS) VM-host
affinity rules, but it may not honor VM-VM anti-affinity rules, depending on the scenario.
VMware vSphere
Introduction
VMware vSphere is fully supported by Nutanix with many points of integration. vSphere
major and minor releases go through extensive integration and performance testing,
ensuring that both products work well together in even the largest enterprises. Rest
assured that if you deploy vSphere, the solution is fully supported and performant.
Control Plane
When using vSphere as part of this design, the control plane consists of vCenter for
managing the vSphere components and Prism Central for managing the Nutanix
components.
EVC
VMware Enhanced vMotion Compatibility (EVC) allows VMs to move between different
processor generations within a cluster. EVC mode is a manually configured option, unlike
the corresponding feature in AHV. EVC should be enabled at cluster installation time
to avoid having to power off and restart VMs later if it needs to be enabled after the fact.
HA/DRS
VMware HA and DRS are core features that should be utilized as part of this design.
Nutanix best practices dictate that a few HA/DRS configuration settings be changed
from the default. These changes are outlined in the design decisions below.
VRT-005 ENABLE HA
VRT-008 CONFIGURE DAS.IGNOREINSUFFICIENTHBDATASTORE
IF ONE NUTANIX CONTAINER IS PRESENTED TO THE
ESXI HOSTS
Impact DRS will not try to move a CVM to another host, nor will CVM resources be
taken into account for HA calculations.
VRT-011 SET HOST ISOLATION RESPONSE TO “POWER OFF
AND RESTART VMS”
Impact vSphere will not try to restart the CVM on another host, which is impossible
since the CVM uses local storage.
SIOC
Each of these has a maximum size that limits the total number of VMs, nodes,
and clusters that can be managed by an instance.
Prism Central
Prism Central (PC) is a global control plane for Nutanix, which includes managing VMs,
replication, monitoring, and value-add products such as:
These products are installed from Prism Central and managed centrally for
all Prism Element clusters associated with the Prism Central instance.
Prism Central plays an important part in a deployment regardless of the scale. There are
two deployment architectures for Prism Central that can be utilized and scaled
depending on the size and goals of the design.
Single-VM Prism Central Deployment
The single VM deployment option for Prism Central offers a reduced footprint option for
designs that do not require high availability for the control plane beyond that provided
by the hypervisor. The single Prism Central VM option can be used with a small or large
VM sizing. The size directly correlates to the size of environment it can manage.
The scale-out Prism Central architecture deploys 3 VMs on the same cluster and is
superior in terms of availability and capacity. If scale-out is not the best option from
the beginning, it is fairly simple to go from a single PC VM deployment to a scale-out
architecture at a later time.
Image Templates
Within any environment operating at scale there is a need to keep approved template
images available and in sync between clusters. For AHV clusters, image placement
policies should be utilized for this. Image placement policies determine the clusters in
which each image can or cannot be used. This makes the initial rollout of new and
updated image versions easy.
The benefits:
• Simplifies capacity planning.
• Simplifies platform life cycle management.
• Simplifies management for virtual networking.
• Reduces management overhead.
The vCenter appliance can be deployed in many different sizes. This ultimately
determines how large an environment you can support in terms of VMs, hosts, and
clusters. Refer to the official VMware documentation for the version you are deploying
to determine the proper sizing for your design.
Beyond the size of the environment, the vCenter appliance can be deployed as a single
VM or as a vCenter High Availability (vCenter HA) for additional availability. The option
to deploy vCenter HA does not increase the size of the environment it can manage as
a single virtual appliance is active at any time. The size of the environment that can be
managed is based on the size of the VMs deployed.
To ensure that the control plane is as highly available as possible, the clustered deployment
option is preferred as long as it is compatible with the other layers of the solution.
There are a variety of other infrastructure services that are necessary for a successful
Nutanix deployment, such as NTP, DNS, and AD. You may already have these
infrastructure services deployed and available for use when you deploy your Nutanix
environment. If they do not exist, you will probably need to deploy them as part of the
new environment.
NTP
Network Time Protocol (NTP) is used to synchronize computer clock times across the
network, storage, compute, and software layers. If clock times drift too far apart, some
products may have trouble communicating across layers of the solution. Keeping time
in sync is also beneficial when examining logs from different layers of the solution to
determine the root cause of an event.
A minimum of three NTP servers should be configured and accessible to all solution
layers; the NTP standard recommendation is five, so that a rogue time source can be
detected. These layers include AOS, AHV, and Prism Central, plus vCenter and ESXi if
using VMware vSphere as the virtualization layer. Use the same NTP servers for all
infrastructure components.
If you are in a dark site with no internet connectivity, consider using a switch or
GPS-based time source.
DNS
Domain Name System (DNS) is a directory service that translates domain names of the
form domainname.ext to IP addresses. DNS is important to ensure that all layers can
resolve names and communicate. A highly available DNS environment should be utilized
to support this design. At least two DNS servers should be configured and accessible at
all layers to ensure components can reliably resolve addresses at all times.
DEP-002 A MINIMUM OF TWO DNS SERVERS
SHOULD BE CONFIGURED AND ACCESSIBLE TO ALL
INFRASTRUCTURE LAYERS
Active Directory
Active Directory (AD) is a directory service developed by Microsoft for Windows domain
networks. AD often serves as the authoritative directory for all applications and
infrastructure within an organization. For this design, all of the consoles and element
managers will utilize RBAC and use AD as the directory service for user and group
accounts. Where possible, AD groups should be utilized to assign privileges for easier
operations. User access can then be controlled by adding or removing a user from the
appropriate group.
The AD design should be highly available to ensure that directory servers are always
available when authentication requests occur.
Logging Infrastructure
Capturing logs at all layers of the infrastructure is very important. For example, if there is
a security incident, logs can be critical for forensics. An example of a robust log collector
is Splunk, but there are other options. Nutanix recommends that you deploy a robust
logging platform that meets your security requirements. All infrastructure logs should
be forwarded to the centralized log repository.
4.5. SECURITY LAYER
Nutanix Enterprise Cloud can be used to build private or multi-tenant solutions, and
depending on the use case the security responsibility will vary. The security approach
includes multiple components:
• Physical
• Datacenter access
• Equipment access (firewalls, load balancers, nodes, network switches, racks,
routers)
• Virtual Infrastructure
• Clusters
• Nodes
• Management components
• Network switches
• Firewalls
• Load balancers
• Threat vectors
• Active
• Automated
• Internal
• External
• Workloads
• Applications
• Containers
• Virtual Machines
With the Security Configuration Management Automation (SCMA) framework, you can
schedule the STIG checks to run hourly, daily, weekly, or monthly. These checks run at
the lowest system priority within the virtual storage controller, ensuring that security
checks do not interfere with platform performance.
In addition, Nutanix releases Nutanix Security Advisories that describe potential
security threats and their associated risks, plus any identified mitigations or patches.
For more information, refer to Building Secure Platforms And Services With Nutanix
Enterprise Cloud.
4.5.1 AUTHENTICATION
Maintain as few user and group management systems as possible. A centrally managed
authentication point is preferred to many separately managed systems. Based on that
general recommendation you should at a minimum take advantage of the external LDAP
support provided by Nutanix components.
Prism Central provides both LDAP and Security Assertion Markup Language (SAML)
support, making it possible for users to authenticate through a qualified Identity
Provider (IdP). If neither option is available, Nutanix also provides local user
management capabilities.
4.5.2 CERTIFICATES
All consumer-facing components should be protected with certificate authority (CA)
signed certificates. Whether to use internally or externally signed certificates depends on
the consumer classification and what kind of service the specific component provides.
SEC-003 USE A CERTIFICATE AUTHORITY (CA) THAT IS
CONSIDERED A TRUSTED CA BY YOUR ORGANIZATION
FOR THE COMPONENTS WHERE CERTIFICATES CAN BE
REPLACED. THESE CAN BE EITHER INTERNALLY OR
EXTERNALLY SIGNED CERTIFICATES.
SEC-005 DO NOT USE VSPHERE CLUSTER LOCKDOWN.
4.5.5 HARDENING
There are certain hardening configurations you can apply for the AHV host and
the CVM if required:
• AHV and CVM
• AIDE. Advanced Intrusion Detection Environment.
• Core. Enable stack traces for cluster issues.
• High Strength Passwords. Enforce complex password policy (Min length 15,
different by at least 8 characters).
• Enable Banner. Get specific login message via SSH.
• CVM only
• Enforce SNMPv3 only.
SEC-008 STOP UNUSED ESXI SERVICES AND CLOSE
UNUSED FIREWALL PORTS.
We recommend that you follow the Hardening Guide for the CVM.
If you are using AHV, follow the Hardening Guide For AHV.
For updated Hardening Guides, search portal.nutanix.com for the most recent guide
that relates to your version of AOS and AHV.
Internet-facing services are at constant risk of being attacked. Two common types of
attacks are denial of service (DoS) and distributed denial of service (DDoS). To help
mitigate the potential impact of a DoS or a DDoS attack a design can take advantage of:
There are additional ways to implement protection against these types of attacks.
4.5.7 LOGGING
Logging is critical from an auditing and traceability perspective, so make sure that the
virtual infrastructure, AOS, Prism Central, AHV, ESXi, and any additional software in the
environment send their log files to a highly available logging infrastructure.
A single centralized activity logging solution for auditing purposes and account security
should be configured and maintained.
Justification Ensures data from all modules is included and searchable via a logging
system. Refer to the Nutanix Syslog documentation for additional information. Modules
can be excluded if needed.
SEC-012 USE DEFAULT ESXI LOGGING LEVEL,
LOG ROTATION, AND LOG FILE SIZES.
Nutanix recommends configuring the CVM and hypervisor host VLAN as a native, or
untagged, VLAN on the connected switch ports. This native VLAN configuration allows for
easy node addition and cluster expansion. By default, new Nutanix nodes send and receive
untagged traffic. If you use a tagged VLAN for the CVM and hypervisor hosts instead, you
must configure that VLAN while provisioning the new node, before adding that node to the
Nutanix cluster.
Do not segment Nutanix storage and replication traffic, or iSCSI Volumes traffic, on separate
interfaces (VLAN or physical) unless additional segmentation is required by mandatory
security policy or the use of separate physical networks. The added complexity of
configuring and maintaining separate networks with additional interfaces cannot be
justified unless absolutely required.
SEC-015 USE VLAN FOR TRAFFIC SEPARATION OF
MANAGEMENT AND USER WORKLOADS.
Configure RBAC at the Prism Central level since it provides the overlying management
construct. Via Prism Central, consumers and administrators will be directed to underlying
components, such as Prism Element, when needed. This ensures your least-privilege
configuration stays in place, avoiding common mistakes that occur when RBAC is
configured at multiple different levels.
ROLE PURPOSE
When creating custom roles in Prism Central there are multiple entities available,
each with their own set of permission definitions:
• App
• VM
• Blueprint
• Marketplace Item
• Report
• Cluster
• Subnet
• Image
ROLE PURPOSE
In addition, there are pre-defined roles in VMware vCenter Server, with the option to
create custom roles.
SEC-019 ALIGN RBAC STRUCTURE AND USAGE OF DEFAULT
PLUS CUSTOM ROLES ACCORDING TO THE
COMPANY REQUIREMENTS DEFINED VIA SEC-018.
Data-at-rest encryption (DaRE) protects data in scenarios such as:
• Activities that require a full data copy that will be used outside the platform.
• Failed drives leaving the datacenter.
• Drive or node theft.
Self Encrypting Drives (SEDs) provide FIPS 140-2 Level 2 compliance and can be used
without any performance impact. Nutanix Software Based Encryption and Native Key
Manager are FIPS 140-2 Level 1 Evaluated.
You can use both software-based encryption and SEDs. This requires an external key
management server.
NOTE:
• All methods of DaRE are FIPS 140-2 compliant; however, if Level 2, 3, or 4 is required,
a hardware component is necessary.
• Software-based encryption with the Nutanix Native Key Manager (KMS) can encrypt
storage containers (ESXi or Hyper-V) or the entire cluster (AHV).
4.5.12 XI BEAM
During the environment lifecycle it’s important to ensure compliance is met and security
configuration meets required standards. Xi Beam is a cost and security optimization
SaaS offering that works across public cloud and on-premises environments.
Beam Security Compliance for Nutanix provides a centralized view of the security
posture of a Nutanix private datacenter and provides visibility into the security of your
on-prem environment based on known compliance standards.
Beam can help you to comply with regulatory and business-specific compliance policies
such as PCI-DSS, HIPAA, and NIST. (note: Xi Beam is not available for dark sites)
You gain deep insights into your on-premises Nutanix deployments based on over
300 audit checks and security best practices according to:
• Audit security checks for access, data, networking and virtual machines.
• Compliance checks against PCI-DSS v3.2.1 for AHV, AOS, Flow and Prism Central.
4.6. AUTOMATION LAYER
Nutanix supports intelligent IT operations and advanced automation that enable you to
streamline operations, enable IT as a Service, and support the needs of developers and
business teams. This section covers automation and orchestration of virtual
infrastructure, focusing on provisioning and maintenance which are important aspects
of the overall solution. Nutanix tools reduce the time required to perform initial setup
plus software and firmware upgrades.
4.6.1 UPGRADES
Upgrades can occur across a variety of components. The process to upgrade many of
these components is primarily executed using LCM, which is able to understand
dependencies without operator intervention.
• AOS. When upgrading, each CVM is individually upgraded in a rolling fashion. While a
CVM is rebooting, the host it is running on is redirected to a remote CVM to deliver
storage IO. This is invisible to the VM. However, it does result in a loss of locality, which
can potentially impact IO throughput and latency if the load on the system is high.
• Hypervisor. When upgrading the hypervisor on a node, each VM must be migrated off
the host for the update to be performed so that a reboot of the host can occur. For
vSphere, this requires vCenter integration to be configured to allow for a host to be put
into maintenance mode.
AHV live migration and vSphere vMotion are normally non-disruptive; however, certain
applications that utilize polling-style device drivers or that have near-real-time kernel
operations cannot tolerate being migrated during the final cutover step. Hypervisor
updates will require downtime for these apps if they don’t offer native failover
functionality.
Firmware
Firmware can be updated for a variety of devices including:
Software
Software updates can be performed for multiple components, including:
LCM supports environments with internet connectivity as well as dark site deployments.
4.6.3 FOUNDATION
Foundation manages the initial setup and configuration of a cluster. Nutanix nodes may
come pre-installed with AHV and the Controller Virtual Machine (CVM) and you can:
• Add the Nodes to an existing Nutanix cluster.
• Create a new Nutanix cluster.
• Re-image the nodes with a different AHV and/or AOS version, or with a different
hypervisor, and create a Nutanix cluster.
NOTE: The Foundation process may vary for different hardware vendors.
Nutanix provides a Foundation pre-configuration service, which is accessible via
install.nutanix.com. Via the service, you can define and download the Nutanix cluster
configuration to be used during the Foundation process. The downloaded file contains
all of the configuration required to perform the Foundation operation and comes in JSON
format. This makes it easy to keep track of configurations, document them, and share
them with your peers.
The following table provides an overview of the functions available for the different
Foundation software options.
4.7. BUSINESS CONTINUITY
4.7.1. BACKUP AND RECOVERY
Nutanix provides native support for snapshots and clones. Both snapshots and clones
leverage a redirect-on-write algorithm which is effective and efficient. Offloaded
snapshots can be leveraged via VAAI, ncli, ODX, REST APIs and Prism.
There are two different forms of snapshots to support different modes of replication:
Full snapshots are efficient at keeping system resource usage low when you are creating
many snapshots over an extended period of time. Lightweight snapshots (LWS) reduce
metadata management overhead and increase storage performance by decreasing the
high number of storage I/O operations that long snapshot chains can cause.
Nutanix provides APIs that backup vendors such as HYCU, Veeam, and Commvault
leverage to take native snapshots and backups on Nutanix.
Local Backup
Nutanix native snapshots provide data protection at the VM level, and our crash-
consistent snapshot implementation is the same across hypervisors. The implementation
varies for application-consistent snapshots due to differences in the hypervisor layer.
Nutanix can create local backups and recover data instantly to meet a wide range of
data protection requirements. These local snapshots should not take the place of a
comprehensive traditional backup and disaster recovery solution.
VM snapshots are by default crash-consistent, which means that the vDisks and volume
groups captured are consistent to a single point in time and represent the on-disk data.
Application-consistent snapshots capture the same data as crash-consistent snapshots,
plus all data in memory and all transactions in process. The Nutanix application-
consistent snapshot uses Nutanix Volume Shadow Copy Service (VSS) to quiesce the file
system prior to taking a snapshot for both ESXi and AHV.
RECOMMENDATIONS: Protection Domains
• No more than 200 VMs per PD.
• No more than 10 VMs per PD with NearSync.
• Group VMs with similar RPO requirements.
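A hypothetical pre-check against these guidelines might look like the sketch below; the limits are the values listed above, and the function is not a Nutanix tool:

```python
# Hypothetical check of a protection domain layout against the guidelines above.
def check_protection_domain(vm_count: int, uses_nearsync: bool) -> list[str]:
    warnings = []
    if uses_nearsync and vm_count > 10:
        warnings.append("NearSync protection domains should contain no more than 10 VMs.")
    elif vm_count > 200:
        warnings.append("Protection domains should contain no more than 200 VMs.")
    return warnings

print(check_protection_domain(vm_count=250, uses_nearsync=False))
print(check_protection_domain(vm_count=12, uses_nearsync=True))
```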
Consistency groups. Administrators can create a consistency group for VMs and volume
groups that are part of a protection domain where you want to snapshot all members of
the group in a crash-consistent manner.
LWS and NearSync Replication
Nutanix offers NearSync with a telescopic schedule (time-based retention). When the
RPO is set to be ≤ 15 minutes and ≥ one minute, you have the option to save your
snapshots for a specified number of weeks or months. Multiple schedules cannot be
created with NearSync.
The following table represents the schedule to save recovery points for 1 month:
4.8. OPERATIONS
4.8.1. CAPACITY & RESOURCE PLANNING
After initial installation and migration of workloads to the platform, long term capacity
planning should be enabled to avoid running out of resources. Many of the features
discussed are integrated into Prism Pro.
Cluster capacity
Native runway calculations built into Prism Central automatically calculate the
remaining capacity of the system as soon as the Prism Element cluster is brought
under management by Prism Central.
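Conceptually, the runway is the time remaining until consumption reaches capacity at the observed growth rate. Prism Central computes this natively with machine-learning forecasts; the sketch below is only a simplified linear model for illustration:

```python
# Simplified linear runway model: days until projected usage reaches capacity.
def runway_days(capacity_tb: float, used_tb: float, daily_growth_tb: float) -> float:
    if daily_growth_tb <= 0:
        return float("inf")            # no growth means no projected exhaustion
    return (capacity_tb - used_tb) / daily_growth_tb

print(f"{runway_days(capacity_tb=100, used_tb=70, daily_growth_tb=0.5):.0f} days of runway")  # 60 days
```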
Expansion planning
When a new workload is identified to be onboarded, planning needs to occur if the new
workload size or requirements are outside of established patterns. Prism Central offers
a scenario simulation function that shows how the available capacity runway would
change if this new workload was accommodated. Utilizing this planning functionality
helps avoid unplanned capacity constraints.
Right-Sizing VMs
While general system capacity planning is useful, accurate and efficient system level
planning requires accurate sizing for individual workloads. Machine learning in Prism
Central provides anomaly detection for VMs when the workload crosses learned
thresholds.
In addition, based on a number of thresholds, the system categorizes VMs based
on their behavior. These categories include:
Custom alert policies can also be created that match VMs by specified conditions.
When upgrading, AOS offers the choice of release trains to apply. Each of these is
denoted with a major.minor.maintenance.patch numbering scheme, for example:
5.10.7.1 or 5.11.1.1.
A release train is based on the major and minor components. There are two types
of release trains:
• Short Term Support (STS) releases which include new features and provide a regular
and frequent upgrade path. These releases are maintained for shorter durations.
• Long Term Support (LTS) releases, which provide bug fixes for features that have been
generally available for a longer period of time. After features have been generally
available in an STS for some time, they are included in an LTS release, which is
maintained for a longer duration.
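The numbering scheme lends itself to simple parsing. The sketch below derives the release train (major.minor) from a version string; the STS/LTS mapping shown is only a placeholder, and the authoritative list is published by Nutanix (see the knowledge base article referenced below):

```python
# Parse an AOS version string (major.minor.maintenance.patch) and derive its release train.
def release_train(version: str) -> str:
    major, minor, *_ = version.split(".")
    return f"{major}.{minor}"

# Placeholder mapping for illustration only; consult the Nutanix Support Portal for the
# authoritative list of STS and LTS trains.
EXAMPLE_TRAIN_TYPES = {"5.10": "LTS", "5.11": "STS"}

for version in ("5.10.7.1", "5.11.1.1"):
    train = release_train(version)
    print(version, "->", train, EXAMPLE_TRAIN_TYPES.get(train, "unknown"))
```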
Knowledge base article 5505 on the Nutanix Support Portal covers the differences
in greater detail.
Updates to an existing train are released on a regular basis and should be applied
on a standard cadence.
4.8.3. TESTING
Environments which seek to achieve high overall uptime greater than 99.9% should
build a pre-production environment to mimic production so that configuration changes
can be tested before being pushed into production.
OPS-006 MAINTAIN A PRE-PRODUCTION ENVIRONMENT
FOR TESTING ANY CHANGES NEEDED (FIRMWARE,
SOFTWARE, HARDWARE) PRIOR TO EXECUTING
THE CHANGE IN PRODUCTION.
4.8.4. MONITORING
Nutanix includes a variety of built-in, system-level monitoring functions. The relevant
metrics for built-in monitoring are automatically gathered and stored without user
intervention required.
Native cluster alerts can be sent from either the individual cluster or Prism Central.
When alerts are generated, in addition to raising the alert in Prism, the system can
generate an outbound message. The two options available for sending alerts are
SNMP and SMTP.
The following screen shows how to configure Prism to send alerts to a specified email
account via SMTP.
5. Conclusion
This document is intended to demonstrate valuable methods and practices that
organizations, both large and small, can use to implement Nutanix Solutions to solve their
IT and business problems. There is no one-size-fits-all solution for Nutanix
Hyperconverged Infrastructure, and the contents of this document are informational only
and intended to provide customers with best practices suggestions from which they can
elaborate and evolve their private, hybrid, and multi-cloud solutions.
For more information on any details of this document, please visit our website at
www.nutanix.com or reach out to our sales team. You may also contact one of our
many global support phone numbers listed on our website.
Appendix — Table of
Design Decisions
The following table summarizes all the design decisions described in this document.
REFERENCE SECTION DECISION NAME STATUS
NET-011 Networking Use a single br0 bridge with at least two of the fastest uplinks of the
same speed. Complete
STR-04 Storage Do not mix nodes that contain NVMe SSDs in the same cluster with hybrid
SSD/HDD nodes. Complete
VRT-008 Virtualization Configure das.ignoreInsufficientHbDatastore if one Nutanix container is
presented to the ESXi hosts. Complete
SEC-005 Security Do not use vSphere Cluster lockdown. Complete
SEC-019 Security Use the least-privileged access approach when providing access, and align
RBAC structure and usage of default plus custom roles according to the company
requirements. Complete
OPS-006 Operations Maintain a pre-production environment for testing any changes needed
(firmware, software, hardware) prior to executing the change in actual production. Complete
Nutanix makes infrastructure invisible, elevating IT to focus on
the applications and services that power their business. The
Nutanix Enterprise Cloud OS leverages web-scale engineering
and consumer-grade design to natively converge compute,
virtualization, and storage into a resilient, software-defined
solution with rich machine intelligence. The result is predictable
performance, cloud-like infrastructure consumption, robust
security, and seamless application mobility for a broad range
of enterprise applications. Learn more at www.nutanix.com
or follow us on Twitter @nutanix.