
V1.0 | FEBRUARY 2020 | TN-2104

NUTANIX TECH NOTE

Nutanix Private Cloud Reference Architecture
1. INTRODUCTION................................................................................................................................................................................................................................................5
1.1. Vision for Private, Hybrid and Multi-Cloud................................................................................................................................................................................5
1.2. Design Objectives.......................................................................................................................................................................................................................................................... 7
1.3. Audience..................................................................................................................................................................................................................................................................................... 9
1.4. Design Decisions............................................................................................................................................................................................................................................................ 9
1.5. How to Use This Design Guide...............................................................................................................................................................................................................10

2. ARCHITECTURE OVERVIEW....................................................................................................................................................................................... 11
2.1. Physical layer........................................................................................................................................................................................................................................................................12
2.1.1 Hyperconverged Infrastructure.........................................................................................................................................12
2.1.2 Hardware Choice.................................................................................................................................................................... 13
2.1.3 Compute..................................................................................................................................................................................... 13
2.1.4 Storage........................................................................................................................................................................................ 13
2.1.5 Networking................................................................................................................................................................................ 14
2.1.6 Cluster Design............................................................................................................................................................................................................................................................... 14
Management clusters.......................................................................................................................................................... 14
Workload Clusters................................................................................................................................................................. 14
Storage clusters...................................................................................................................................................................... 14
Edge/ROBO clusters............................................................................................................................................................15
2.2 Virtualization Layer...................................................................................................................................................................15
2.3 Management layer................................................................................................................................................................... 16
2.3.1 Automated IT Operations...................................................................................................................................................17
2.4 Business Continuity layer......................................................................................................................................................17
2.5 Automation layer...................................................................................................................................................................... 18
2.6 Security & Compliance layer...................................................................................................................................................................................................................... 18

3. HIGH-LEVEL DESIGN CONSIDERATIONS


3.1 Choosing a Hypervisor...........................................................................................................................................................21
3.2 Choosing a Cluster Deployment Model...................................................................................................................... 23
3.2.1 Separate Management Clusters.................................................................................................................................... 23
Should You Deploy Two Management Clusters?................................................................................................. 24
3.2.2 Will you Deploy Dedicated or Mixed Clusters?.................................................................................................... 24
3.2.4 Including Storage-Only Nodes In Clusters............................................................................................................. 26
3.3 Choosing How You Will Scale........................................................................................................................................... 27
3.4 Choosing a Control Plane................................................................................................................................................... 27
3.4.1 Block and Pod architecture............................................................................................................................................. 27
What is a pod?....................................................................................................................................................................... 27
Building block......................................................................................................................................................................... 27
Scaling the Block and Pod Architecture.................................................................................................................. 28
3.5 Choosing the Right Licensing and Support..............................................................................................................30
3.5.1 Software Licensing and Support Considerations................................................................................................ 31
3.5.2 Vendor Considerations..................................................................................................................................................... 32
3.5.3 Mixed configurations and node types...................................................................................................................... 33
3.6 Choosing the Right Hardware for Your Workload Types.................................................................................. 33
3.6.1 Model Types...........................................................................................................................................................................34
3.6.2 Performance Considerations......................................................................................................................................... 35

4. DETAILED TECHNICAL DESIGN


4.1. Required Software Versions.............................................................................................................................................. 37
4.2 Physical Layer........................................................................................................................................................................... 38
4.2.1 Choosing the Optimum Cluster Size.......................................................................................................................... 38
Maximums and Minimums................................................................................................................................................ 41
AHV Deployments............................................................................................................................................................ 41
VMware vSphere Deployments...................................................................................................................................................................................................... 42
4.2.2 Failure domain considerations.....................................................................................................................................43
The Management Plane....................................................................................................................................................43
The Nutanix Cluster as a Failure Domain.................................................................................................................44
Datacenter Rack and Server Room Failure Domains........................................................................................46
Datacenter Building............................................................................................................................................................ 47
4.2.3 Designing Workload Domains......................................................................................................................................48
Workload Domain Architecture....................................................................................................................................48
Workload Domain Rack Mapping................................................................................................................................48
Nutanix Cluster Layout......................................................................................................................................................48
Single Rack, Single Workload Domain..................................................................................................................48
Single Rack, Multiple Workload Domains............................................................................................................49
Multiple Racks, Single Workload Domain............................................................................................................49
Pros & Cons...........................................................................................................................................................................51
4.2.4 Networking.................................................................................................................................................................................................................................................................... 52
Physical Switches................................................................................................................................................................. 52
Network Topology...............................................................................................................................................................54
Broadcast Domains.............................................................................................................................................................60
Scaling the Network............................................................................................................................................................ 62
Host Networking...................................................................................................................................................................66
AHV Networking...............................................................................................................................................................66
Bridge...................................................................................................................................................................................66
Bond...................................................................................................................................................................................... 67
Uplink Load Balancing................................................................................................................................................68
vSphere Networking.......................................................................................................................................................70
4.2.5 Compute and Storage Design...................................................................................................................................... 73
Compute Design.................................................................................................................................................................... 74
AHV CPU and Memory Planning............................................................................................................................. 74
NUMA and VMs................................................................................................................................................................. 74
Virtual Non-Uniform Memory Access (vNUMA) and User VMs.............................................................. 77
CVM CPU and Memory Considerations................................................................................................................ 79
Storage Design........................................................................................................................................................................ 79
Storage Pool........................................................................................................................................................................ 79
Container...............................................................................................................................................................................80
vDisk......................................................................................................................................................................................... 81
Nutanix Volumes................................................................................................................................................................ 81
Hybrid and All-Flash Nodes........................................................................................................................................ 82
Availability Domains and Fault Tolerance............................................................................................................86
Capacity Optimization...................................................................................................................................................88
Erasure Coding (EC-X)..................................................................................................................................................88
Compression.......................................................................................................................................................................89
Deduplication......................................................................................................................................................................90
4.3 Virtualization layer..................................................................................................................................................................90
4.3.1 Nutanix AHV............................................................................................................................................................................90
Introduction..............................................................................................................................................................................90
Control Plane............................................................................................................................................................................. 91
HA/ADS................................................................................................................................................................................. 92
AHV CPU Generation Compatibility....................................................................................................................... 92
VMware vSphere.................................................................................................................................................................... 93
Introduction......................................................................................................................................................................... 93
Control Plane...................................................................................................................................................................... 93
EVC........................................................................................................................................................................................... 93
HA/DRS.................................................................................................................................................................................94
SIOC......................................................................................................................................................................................... 97
4.4. Management Layer............................................................................................................................................................... 97
4.4.1 Control Plane.......................................................................................................................................................................... 97
Prism Central............................................................................................................................................................................ 97
Single-VM Prism Central Deployment...................................................................................................................98
Scale-Out Prism Central Deployment...................................................................................................................98
Image Templates...............................................................................................................................................................99
Prism Central Recommendations.................................................................................................................................99
4.4.2 VMware vCenter Server...................................................................................................................................................99
vCenter Server Recommendations............................................................................................................................100
4.4.3 Dependent Infrastructure.............................................................................................................................................100
NTP................................................................................................................................................................................................101
DNS...............................................................................................................................................................................................101
Active Directory.................................................................................................................................................................... 102
Logging Infrastructure...................................................................................................................................................... 102
4.5 Security Layer......................................................................................................................................................................... 103
4.5.1 Authentication.................................................................................................................................................................... 104
4.5.2 Certificates............................................................................................................................................................................ 104
4.5.3 Cluster Lockdown............................................................................................................................................................. 105
4.5.4 VMware Cluster Lockdown.......................................................................................................................................... 105
4.5.5 Hardening.............................................................................................................................................................................. 106
4.5.6 Internet Facing Services.................................................................................................................................................107
4.5.7 Logging....................................................................................................................................................................................107
4.5.8 Network Segmentation.................................................................................................................................................. 109
4.5.9 Role Based Access Control (RBAC).........................................................................................................................110
4.5.10 Data At Rest Encryption................................................................................................................................................113
4.5.11 Key Management Server.................................................................................................................................................114
4.5.12 Xi Beam...................................................................................................................................................................................114
4.6 Automation Layer...................................................................................................................................................................115
4.6.1 Upgrades...................................................................................................................................................................................115
Firmware.....................................................................................................................................................................................115
Software......................................................................................................................................................................................116
4.6.2 Life Cycle Manager (LCM)..............................................................................................................................................116
4.6.3 Foundation............................................................................................................................................................................. 117
4.7 Business Continuity.............................................................................................................................................................. 120
4.7.1 Backup and Recovery....................................................................................................................................................... 120
Local Backup.......................................................................................................................................................................... 120
Protection Domain and Consistency Groups....................................................................................................... 120
Snapshot Schedule and Retention Policy................................................................................................................ 121
Full Snapshots and Async Replication.................................................................................................................. 121
LWS and NearSync Replication...............................................................................................................................122
4.7.2 Additional Business Continuity Options.................................................................................................................122
4.8 Operations.................................................................................................................................................................................123
4.8.1 Capacity & Resource Planning.....................................................................................................................................123
Cluster capacity.....................................................................................................................................................................123
Expansion planning.............................................................................................................................................................124
Right-Sizing VMs...................................................................................................................................................................124
4.8.2 Upgrade methodology....................................................................................................................................................125
4.8.3 Testing......................................................................................................................................................................................126
4.8.4 Monitoring..............................................................................................................................................................................127

5. CONCLUSION................................................................................................................................................................................................................................................. 129
APPENDIX I - TABLE OF DESIGN DECISIONS..................................................................................................................130
1. Introduction
1.1 VISION FOR PRIVATE, HYBRID AND MULTI-CLOUD
The Nutanix vision for Cloud computing environments originally started in the
datacenter with innovative solutions for Private Cloud, which greatly reduced the
complexity and effort required to deploy and manage software-defined storage,
compute, and networking solutions. In recent years, Nutanix’s vision has expanded to
include innovative architectures for hybrid and multi-cloud that offer more choices and
alternatives to optimize costs.

The decision on which cloud to use for a specific use case often depends on a variety of
characteristics of multi-tiered applications. These can be summarized as follows:

• Use Case - web servers, application servers, databases, etc.
• Workload Type - virtual machines (VMs) and/or containers.
• Storage Format - block, object, file.
• Security Policies - requirements that could rule out the option to host workloads on
particular public clouds.
• Unique Requirements - needs that can only be met by the unique features of one specific
public cloud offering (e.g. serverless computing, analytics, PaaS, etc.).

These characteristics ultimately shape an organization's cloud architecture and drive the
decision of where to host each application tier of a particular service.

Historically, the hosting architecture for multi-tiered applications didn't change without a
labor-intensive migration. Nutanix innovation arms IT organizations with the flexibility to
dynamically place workloads. The Nutanix distributed architecture natively includes the
ability to provision the same application blueprints in four different cloud configurations:

CLOUD CONFIGURATION    DESCRIPTION

Private                Hosted on premises on AHV and/or VMware.

Hybrid                 Some workloads hosted on premises on AHV and/or VMware and some
                       hosted in AWS, Azure, or GCP.

Public (natively)      Hosted in AWS, Azure, or GCP as a native public cloud service offering.

Public (bare metal)    Hosted on bare metal in AWS running the Nutanix Acropolis
                       Operating System (AOS).

Nutanix further extends these cloud configurations by enabling the following four
architectural principles:

Application Mobility                Multi-tiered applications can be aligned across all cloud
                                    providers to maximize architectural symmetry and promote
                                    application mobility.

Increased Standardization           Use of the same hardened OS gold image and business process
                                    orchestration (e.g. ITSM integration, approvals, emails,
                                    showback/chargeback, etc.) across all cloud configurations.

Policy Based Security Governance    Consistent, policy-based security governance applied across
                                    all cloud configurations.

Policy Based Cost Governance        Optimize cost by hosting workloads on the platform that
                                    meets the requirements of the service and has the lowest
                                    Total Cost of Ownership (TCO).

All of these principles contribute to an advanced hybrid and multi-cloud strategy that
simplifies IT, stretches budgets, and accelerates time to value. This document provides
the architecture and design-driven decisions to help our customers realize this strategic
vision.

To lay the foundation, the first release of this document focuses on the design for
private clouds based on Nutanix. Nutanix will actively work on extending the design
to include all relevant architectures to complete the above vision.

1.2 DESIGN OBJECTIVES
The objective of this document is to define, explore, and develop key design decisions
required when implementing private, hybrid, or multi-cloud solutions based on the
Nutanix platform. The objective can be further broken down as follows:

• To identify and enumerate the key design decisions that need to be documented in
order to support a robust design methodology and practice.
• To explore each design decision, evaluating key viable options, tools, and methods for
implementation and management so that organizations can make informed decisions
relating to their specific design requirements.

Since simplicity is a key principle of all Nutanix products, some requirements may be
met through the native platform architecture without the superfluous design decisions
that competing platforms often require. The objective of this document is therefore not
to educate readers about Nutanix features and functions, even though this may naturally
occur as a side benefit. Where requirements are met natively, this document describes
how the Nutanix platform addresses them.

Design Objectives of this document for Private Cloud Implementations

DESIGN OBJECTIVES                      DESCRIPTION

Key objective                          The platform is capable of hosting and provisioning workloads.

Workload Types                         Virtual machines.

Scope of deployment                    Greenfield deployment that is adaptable to brownfield
                                       deployments with workload migrations.

Cloud type                             Private cloud.

Number of regions                      Single region/site.

Availability                           This document caters to a minimum of 99.9% availability.
                                       Nutanix can support higher levels by adjusting the design
                                       decisions and configurations to meet specific requirements.
                                       Customers commonly experience 99.999% availability in
                                       practice with the right architecture and support. (See the
                                       worked downtime example following this table.)

Disaster Recovery                      Support for multiple regions/sites with disaster recovery
                                       will be provided in a subsequent version of this document.

Minimum number of nodes per cluster    3 nodes.

Maximum number of workloads            Unlimited number of workloads, dependent on the pod-based
                                       constructs described herein.

Types of Clusters                      Management, Workload, Storage Heavy, and Edge clusters.

Virtualization                         • Nutanix AHV and VMware ESXi hypervisors.
                                       • Nutanix supports Microsoft Hyper-V; however, it is not
                                         considered for this design.

Management Plane                       Nutanix Prism Element, Nutanix Prism Central, and
                                       VMware vCenter.

Scope                                  • Sizing recommendations and methodology for the amount of
                                         software-defined storage, compute, and networking.
                                       • Physical implementation of storage, compute, and networking.
                                       • Logical configuration of clusters.
                                       • Scalability methods and recommendations.
                                       • Automation relating to cluster build, expansion, and
                                         lifecycle management.
                                       • Management and operations aspects such as capacity
                                         management, reporting, upgrades, backup and restore,
                                         monitoring and alerting, and logging.

Authentication, authorization,         Microsoft Active Directory users and groups tied to
and access control                     Role Based Access Control (RBAC).

Security Policy & Enforcement          • Least-privilege access policies will be implemented so that
                                         end users and administrators must be members of groups in
                                         order to perform secure aspects of their job function.
                                       • Certificates are signed by a trusted certificate authority (CA).
                                       • Hardening of the platform associated with the hypervisor,
                                         control plane, and data plane.
                                       • Security policy definition, enforcement of drift away from
                                         defined policies, and checksum verification.
                                       • Separation of traffic classes such as management and
                                         application.
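To make the availability targets above concrete, the arithmetic below converts an
availability percentage into an annual downtime budget. This is generic service-level
math, not a Nutanix-specific formula, and the figures are illustrative only.

    # Convert an availability percentage into an allowed-downtime budget per year.
    # Generic service-level arithmetic; values are illustrative, not an SLA.
    HOURS_PER_YEAR = 365.25 * 24

    def downtime_per_year(availability_pct: float) -> str:
        downtime_hours = (1 - availability_pct / 100) * HOURS_PER_YEAR
        if downtime_hours >= 1:
            return f"{availability_pct}% availability -> ~{downtime_hours:.1f} hours of downtime/year"
        return f"{availability_pct}% availability -> ~{downtime_hours * 60:.1f} minutes of downtime/year"

    print(downtime_per_year(99.9))     # ~8.8 hours per year
    print(downtime_per_year(99.999))   # ~5.3 minutes per year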

1.3 AUDIENCE
This document, which covers a single-datacenter design, is intended for infrastructure
architects, administrators, and operators who want to deploy and manage datacenters
based on Nutanix Enterprise Cloud and address requirements for availability, capacity,
performance, scalability, business continuity, and disaster recovery.

1.4 DESIGN DECISIONS


This document makes recommendations and guides the reader to appropriate decisions
where possible. In cases where a design decision is required, this document provides the
available options. A design decision callout, formatted like the example below, denotes
points in the document where a customer decision is required.

Example:

NET-001          TITLE OF DESIGN DECISION

Justification    Justification to support why the decision was made.

Implication      Additional implications as a result of the design decision.

Appendix I: Table of Design Decisions includes a list of all the design decisions described
throughout this document.

1.5 HOW TO USE THIS DESIGN GUIDE
This document is subdivided into four major sections as follows:

1. Architectural Overview
Introduces key Architecture concepts that will be discussed throughout this design.

2. Design Considerations
Discusses key design considerations that will vary for each customer. Customers will be
required to make decisions that will influence the design and build of their end-solution.

3. Detailed Design
Identifies key design decisions and in most cases determines the optimal configuration
and decision that will be used for validating the design. For each design decision, alternative
options may be discussed along with their pros and cons. Customers may have good reasons
to deviate from the decisions made in this document. The decisions made in this section are
recommended by Nutanix; however, they are by no means the only method, and Nutanix
recognizes that alternative decisions may be appropriate depending on specific requirements.

4. Operations
Articulates high level operational considerations that influence the design of the
solution.
NOTE: This section does not provide detailed operational guides or runbooks.

2. Architecture Overview
This section describes the high-level Nutanix architecture, including major concepts
and design elements that anyone designing a Nutanix deployment should understand.
If you are already familiar with Nutanix hyperconverged infrastructure (HCI) and Nutanix
software, you can skip this section.

The diagram below shows the high-level architecture covered in this document.
This overview explains the elements of each layer. Later sections will explore the
design decisions necessary for each layer.

2.1. PHYSICAL LAYER
Because the Nutanix Enterprise Cloud architecture is based on hyperconverged infra-
structure, the physical layer is significantly different than it would be in a traditional
datacenter architecture. Understanding the differences will allow you to make the best
hardware choices for your Nutanix deployment.

2.1.1 HYPERCONVERGED INFRASTRUCTURE


Nutanix HCI converges the datacenter stack, including compute, storage, storage
networking, and virtualization, replacing the separate servers, storage systems, and
storage area networks (SANs) found in conventional datacenter architectures and reducing
complexity. Each node in a Nutanix cluster includes compute, memory, and storage, and
nodes are pooled into a cluster. The Nutanix Acropolis Operating System (AOS) software
running on each node pools storage across nodes and distributes operating functions
across all nodes in the cluster for performance, scalability, and resilience.

A Nutanix node runs an industry-standard hypervisor and the Nutanix Controller
VM (CVM). The Nutanix CVM provides the software intelligence for the platform
and is responsible for serving IO to running VMs.

2.1.2 HARDWARE CHOICE
Nutanix Enterprise Cloud provides significant choice when it comes to hardware
platform selection. Available options include:

• Nutanix NX appliances.
• OEM appliances from leading vendors such as Dell, Lenovo, HPE, IBM, and Fujitsu.
• Other third-party servers from a wide range of vendors.

The Nutanix hardware compatibility list (HCL) contains the most up-to-date
information on supported systems.

Hardware is available in a variety of chassis configurations from various vendors. Options
range from multi-node chassis for high density to single-node rackmount chassis.
Nutanix commonly refers to all of these chassis configurations as a block.

2.1.3 COMPUTE
Sizing systems to meet compute needs in a Nutanix environment is similar to other
architectures. However, it’s important to ensure that your design provides enough
compute (CPU/RAM) to support the CVM.
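As a rough illustration of this point, the sketch below subtracts a CVM reservation from a
node's raw resources to estimate what remains for user VMs. The CVM figures used here
(12 vCPUs and 32 GB RAM) are placeholder assumptions, not official sizing guidance;
actual CVM requirements vary by model, enabled features, and AOS version, so size
against current Nutanix documentation.

    # Illustrative only: estimate per-node resources left for user VMs after the CVM.
    # The CVM reservation below is a placeholder assumption, not official guidance.
    def usable_node_resources(node_cores: int, node_ram_gb: int,
                              cvm_vcpus: int = 12, cvm_ram_gb: int = 32):
        return {
            "cores_for_vms": node_cores - cvm_vcpus,
            "ram_gb_for_vms": node_ram_gb - cvm_ram_gb,
        }

    # Example: a dual-socket node with 48 physical cores and 768 GB RAM.
    print(usable_node_resources(node_cores=48, node_ram_gb=768))
    # {'cores_for_vms': 36, 'ram_gb_for_vms': 736}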

2.1.4 STORAGE
Nutanix nodes offer a range of storage configurations:

• Hybrid nodes combine flash SSDs for performance and HDDs for capacity.
• All-flash nodes utilize flash SSDs.
• NVMe nodes utilize NVMe SSDs.

Different node types can be mixed in the same cluster. More information is provided
in the document Product Mixing Restrictions.

For data resiliency, Nutanix uses replication factor (RF), maintaining 2 or 3 data copies.
This approach enables a Nutanix cluster to be self-healing in the event of a drive, node,
block, or rack failure. In a Nutanix cluster consisting of multiple blocks, RF can enable
block awareness. Data copies are distributed across blocks to protect against the failure
of an entire block. In configurations spanning multiple racks, RF can similarly provide
rack awareness with resilience to a rack outage. For more information on Nutanix data
resiliency, please refer to the Nutanix Bible.
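As a simple illustration of how replication factor affects usable space, the sketch below
divides raw cluster capacity by the RF and optionally holds back one node's worth of
capacity so the cluster can self-heal after a node failure. The numbers are illustrative and
ignore metadata overhead, compression, deduplication, and erasure coding; use the official
Nutanix sizing tools for real designs.

    # Illustrative replication-factor math only; not a substitute for Nutanix sizing tools.
    def usable_capacity_tb(nodes: int, raw_tb_per_node: float, rf: int = 2,
                           reserve_one_node_for_rebuild: bool = True) -> float:
        raw_total = nodes * raw_tb_per_node
        if reserve_one_node_for_rebuild:
            raw_total -= raw_tb_per_node  # keep headroom to re-protect data after a node failure
        return raw_total / rf

    # Example: 4 nodes with 24 TB of raw capacity each.
    print(usable_capacity_tb(nodes=4, raw_tb_per_node=24, rf=2))  # 36.0 TB
    print(usable_capacity_tb(nodes=4, raw_tb_per_node=24, rf=3))  # 24.0 TB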

Compression, deduplication and erasure coding (EC-X) can be enabled to increase data
efficiency for capacity saving purposes.

Data locality and intelligent tiering ensure that the data associated with a VM is
preferentially stored on that VM’s local node. Active data is stored on the fastest media,
delivering performance and eliminating the need for ongoing performance tuning.

2.1.5 NETWORKING
Fast, low-latency and highly available networking is a key element of this document. The
distributed storage architecture relies on the performance and resilience of the physical
network. A good design provides high performance while maintaining simplicity.

In the detailed design section of this document we address common network topologies,
selection of physical switches, and recommended connections between hosts and the
physical network.

2.1.6 CLUSTER DESIGN


A Nutanix cluster is the management boundary of the storage provided to a group
of workloads. A Nutanix deployment can be architected to support either (a) mixed
workloads in a single Nutanix cluster; or (b) dedicated clusters for each workload
type in a block and pod design.

For designs that choose dedicated clusters, the separate workload domains may
include the following cluster types:

Management clusters
Designed to run VMs that support datacenter management such as:
• Nutanix Prism Central.
• VMware vCenter.
• Active Directory.
• Other management workloads, such as DNS, DHCP, NTP, Syslog.

Management clusters reside in the management workload domain. In this document,
a management cluster occupies a separate rack or is distributed across multiple racks.

Workload clusters
Reside in a virtual infrastructure workload domain and run tenant virtual machines.
You can mix different types of compute clusters and provide separate compute pools
to address varying SLAs for availability and performance.

Storage clusters
Storage-only clusters provide dedicated data services to tenants. These are typically
deployed for use cases that are specifically focused on Object, Files, or Block level
storage.

Edge/ROBO clusters
Reside at edge and/or remote office/branch office (ROBO) sites to run virtual machines
for edge or ROBO workloads.

2.2. VIRTUALIZATION LAYER


The virtualization layer sits logically above the physical layer, controlling access to
compute, network, and storage resources. This document provides a choice of two
hypervisors: Nutanix AHV and VMware ESXi. Both are enterprise-grade hypervisors,
filling a similar set of requirements and use cases. Nutanix AHV is included at no
additional cost with every Nutanix node.

NOTE: Nutanix HCI supports Microsoft Hyper-V, however this document does not
include considerations for deployment of this hypervisor option.

2.3. MANAGEMENT LAYER
The management layer is a key differentiator for Nutanix and this document.

Nutanix Prism provides simplified end-to-end management for Nutanix environments.


Prism combines multiple aspects of datacenter management into a single
consumer-grade design that provides complete infrastructure and virtualization
management, operational insights, and troubleshooting. Prism largely eliminates the
need for separate management tools. All Prism functionality is also accessible via REST
API.
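As an example of that API access, the sketch below lists VMs through the Prism Central
REST API. The endpoint and payload follow the commonly documented v3 pattern (a POST
to /api/nutanix/v3/vms/list with basic authentication), but treat the exact URL, fields, and
authentication options as assumptions to verify against the API Explorer for your Prism
Central version; the hostname and credentials shown are placeholders.

    # Minimal sketch: list VM names via the Prism Central v3 REST API.
    # Verify endpoint, payload, and auth against your Prism Central API Explorer.
    import requests
    from requests.auth import HTTPBasicAuth

    PRISM_CENTRAL = "https://prism-central.example.local:9440"  # placeholder

    def list_vms(username: str, password: str, page_size: int = 20):
        resp = requests.post(
            f"{PRISM_CENTRAL}/api/nutanix/v3/vms/list",
            json={"kind": "vm", "length": page_size},
            auth=HTTPBasicAuth(username, password),
            verify=False,  # use a proper CA bundle in production
            timeout=30,
        )
        resp.raise_for_status()
        return [entity["spec"]["name"] for entity in resp.json().get("entities", [])]

    # Example usage:
    # print(list_vms("admin", "********"))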

The Prism family consists of three products that extend core capabilities:

1. Prism Element
The core Nutanix management platform enables management and monitoring at
the cluster level for all infrastructure (compute, storage, networks) and virtualization.
Key functionality of Prism includes:
• Full VM, storage, and hypervisor management.
• Network visualization.
• Role-based access control (RBAC).
• Nutanix 1-click upgrades - Orchestrates and streamlines platform upgrades, keeping
track of the changes. Can upgrade all Nutanix software and firmware running in a
Nutanix environment plus the ESXi hypervisor.

2. Prism Central
Enables management and monitoring of multiple Nutanix Prism Element clusters
from a central interface.

3. Prism Pro
Adds advanced capabilities to the Prism platform, including performance anomaly
detection, capacity planning, custom dashboards, reporting, advanced search
capabilities, and task automation.

The Prism family of products is an integral part of a Nutanix cluster and does not require
separate infrastructure. Prism Central runs as a separate VM, or as a cluster of three VMs
for additional scale and resilience. More details on Nutanix management are provided later.
Nutanix deployments that use the AHV hypervisor can be fully managed by Prism.

Nutanix deployments that use VMware vSphere should also utilize VMware vCenter
Server. This is the centralized monitoring and resource management software for
VMware virtual infrastructure. It performs a number of tasks, including resource
provisioning and allocation, performance monitoring, workflow automation, and
user privilege management.

2.3.1 AUTOMATED IT OPERATIONS
Prism Pro allows administrators to automate routine operational tasks, reducing
administrator effort and time while increasing the quality of results. To provide this
automation, Nutanix X-Play offers "if this, then that" (IFTTT) functionality that allows
admins to create playbooks defining automation actions that run when a particular
trigger occurs.

The most common type of trigger is alert-based, where a system-defined or user-defined
alert causes an action to occur. An alert could be something as simple as crossing a
designated CPU or memory limit. Other triggers can be manual; the associated playbook
does not take action until an admin explicitly tells it to. With a manual trigger, an admin
selects an entity such as a VM, and the specified playbook executes against it. Manual
triggers allow the admin to control when and where the automation takes place.
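Playbooks are built in the Prism Central UI rather than in code, but the structure is easy
to picture: a trigger plus an ordered list of actions. The sketch below models that idea in
plain Python purely for illustration; it is not an X-Play file format or API, and the trigger
and actions shown are hypothetical examples.

    # Conceptual model of an X-Play playbook (trigger -> ordered actions).
    # Illustration only; real playbooks are created in the Prism Central UI.
    from dataclasses import dataclass, field

    @dataclass
    class Playbook:
        name: str
        trigger: str                      # e.g. an alert name, or "manual"
        actions: list = field(default_factory=list)

    memory_pressure = Playbook(
        name="expand-vm-memory",
        trigger="alert: VM memory usage above 90%",
        actions=[
            "add memory to the affected VM",
            "email the infrastructure team",
            "attach a report to the alert",
        ],
    )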

2.4. BUSINESS CONTINUITY LAYER


Nutanix provides multiple ways to accomplish backup/restore and disaster recovery (DR):

1. Using Nutanix-native capabilities, including space-efficient snapshots, replication
(synchronous, near-synchronous, and asynchronous options), and cloning.
2. Through integration with leading third-party data protection vendors that utilize
Nutanix APIs.
3. By adding optional services and products:
• Nutanix Xi Leap provides cloud-based DR-as-a-service (DRaaS).
• Nutanix Mine provides secondary storage integration and works with leading
backup vendors.
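The replication options listed above map to different recovery point objectives (RPOs).
The helper below sketches that mapping; the thresholds used (asynchronous at 60 minutes
or more, NearSync for low-minute RPOs, synchronous for zero RPO) are illustrative
assumptions and should be confirmed against current Nutanix documentation for your AOS
version and hardware.

    # Illustrative mapping from a target RPO to a Nutanix replication mode.
    # Thresholds are assumptions for illustration; confirm against current Nutanix docs.
    def replication_mode_for_rpo(rpo_minutes: int) -> str:
        if rpo_minutes == 0:
            return "Synchronous replication (zero data loss)"
        if rpo_minutes < 60:
            return "NearSync replication (low-minute RPO)"
        return "Asynchronous replication (RPO of 60 minutes or more)"

    for target in (0, 15, 240):
        print(f"RPO {target} min -> {replication_mode_for_rpo(target)}")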

This document describes the configuration and use of Nutanix-native capabilities.
Other solutions mentioned may be added to the completed design, but they are beyond
the scope of this document.

For a list of supported third-party backup solutions, see Nutanix Technology Alliance Partners.

2.5. AUTOMATION LAYER
Automation and orchestration are increasingly recognized as critical to IT success. By
simplifying infrastructure management across the entire lifecycle, automating operations,
and enabling self-service, Nutanix helps you deploy datacenter infrastructure that
delivers a high degree of scalability, availability, and flexibility.

Nutanix improves efficiency with meaningful automation, self-service, and integration
with development pipelines.

• Flexible task automation.
Nutanix Prism Pro provides a code-free, visual approach to task automation, enabling
any administrator to build, maintain, and troubleshoot automations. Common admin
tasks, like adjusting resources allocated to a VM in response to a constraint, are easily
automated. Even the most complex, multi-step procedures can be turned into one-click
operations.

• Self-service with no loss of control.
Enterprise teams want self-service access to infrastructure and services to accelerate
time to market. With Nutanix Calm, you can create blueprints that model applications
and tasks and publish them to an internal marketplace or add them to a growing
collection of pre-integrated blueprints on the Nutanix Marketplace. Application owners
and developers can request IT services from the marketplace whenever needed.

• Simplified development.
Nutanix eliminates the complexity of test and development automation, allowing
developers and administrators to work more efficiently. Your team can deploy and
maintain a fully automated CI/CD pipeline with continuous application deployment
across on-premises and cloud locations.

DevOps is a way to standardize processes and improve communication and collaboration
between development and operations teams in an enterprise organization. DevOps
methodology is not discussed in this design, but the constructs and logical components
included here are elements that will help you develop or improve your DevOps strategy.

2.6. SECURITY & COMPLIANCE LAYER


Designing for your security requirements is paramount to delivering robust solutions.
This is another area where this document is well differentiated from conventional
datacenter architectures.

The Nutanix security approach is not predicated on having hundreds of different knobs
that you must turn to achieve a secure environment. Nutanix takes a security-first
approach that includes a secure platform, extensive automation, and a robust partner
ecosystem. Configuration options are available if you need to add an extra layer of
security based on business and/or technical requirements.

Nutanix enables you to maintain a continuous security baseline to meet regulatory
requirements more easily. Powerful security automation, called Security Configuration
Management Automation (SCMA), monitors the health of your storage and VMs,
automatically healing any deviations from this baseline.

Nutanix provides customers with the ability to evolve from point-in-time security
baseline checking to a continuous monitoring and self-remediating baseline, ensuring
that all CVM and AHV hosts in a cluster remain compliant throughout the deployment
lifecycle. This capability checks all components of the documented security baselines
(STIGs) and, if a component is found to be non-compliant, sets it back to the supported
security settings without customer intervention.

In addition, data-at-rest encryption features and a built-in Key Management Server
(KMS) further add to these robust security capabilities.

These valuable security features are discussed later in the Detailed Design section,
and you can find more information online in the Nutanix Bible.

Nutanix incorporates security into every step of its software development process, from
design and development to testing and hardening. Nutanix security significantly reduces
zero-day risks. With one-click automation and a self-healing security model, ensuring
ongoing security requires much less effort.

Nutanix supports the following security standards and certifications:

• 508 Compliant
• FIPS 140-2 Level 1
• National Institute of Standards and Technology (NIST) 800-53
• TAA Compliant

3. High-Level Design
Considerations
A key feature of Nutanix Enterprise Cloud is choice. Nutanix customers have the flexibility
to choose their preferred hardware vendor, CPU architecture, hypervisor, and more. This
design is intended to help guide you to the best choices for your organizational needs.

There are a number of high-level design decisions that must be made before proceeding
to a detailed technical design. This section provides the information necessary to help you
make the following decisions:

• Choosing a hypervisor.
• Choosing a cluster deployment model.
• Choosing a hardware platform.

3.1. CHOOSING A HYPERVISOR


Nutanix supports a range of server virtualization options including: Nutanix AHV,
VMware vSphere, and Microsoft Hyper-V. This design covers the deployment of either
Nutanix AHV or VMware vSphere.

Nutanix AHV is included with AOS and delivers everything you’d expect from an
enterprise virtualization solution: high performance, flexible migrations, integrated
networking, security hardening, automated data protection and disaster recovery, and
rich analytics. With robust, integrated management features, AHV is a lean virtualization
solution.

A significant advantage of AHV as part of this design is the elimination of virtualization
as a separate management silo and all of the complexity that entails. When designing a
vSphere environment, for example, decisions must be made about the number of
vCenters, the type of deployment mechanism, how to enable HA, and so on. Extensive
ongoing training is often required for internal staff or consultants in order to effectively
design, deploy, and upgrade the environment. AHV avoids these challenges.

AHV enables:
• Checkbox high availability configuration (vs. complex percentage or slot size config)
• No virtual SCSI devices required (vs. manually configured multiple SCSI devices for
maximum performance).
• Distributed networking by default (vs. deciding between Standard or Distributed
Switch).

• Automatic CPU Masking (vs. manual in vSphere).
• No need for Host Profiles.
• Fewer design choices.
• Simple control plane lifecycle.
• Simplified 1-click upgrades.
• Single support path.

VMware vSphere is a proven virtualization platform used by many organizations and has
a robust ecosystem. It is a complex platform with many design choices and settings to
tune; it often requires the purchase of additional licenses.

When deciding which hypervisor to use in your deployment, the choice comes down to
which solution best meets your technical and business requirements and your budget.
Key areas to consider when choosing a hypervisor include:

• Operating system support (including legacy and current OSes).
• Third-party virtual appliance support.
• Security/hardening baseline.
• Integration with third party products (backup, software-defined networking, anti-virus,
security tools, etc.).
• Availability of automation tools.
• Staff skill set and training (architecture, administration, daily operations).
• Scalability of the solution.
• Licensing costs and model (AHV adds no licensing costs to a Nutanix deployment).
• Integration with the existing environment.
• Migration of existing VMs.
• Hypervisor and management plane technical features and performance.
• Simplicity or complexity of the solution.
• Features or products that support only one hypervisor or the other.
• Satisfaction with existing hypervisor platform(s).
• ROI/TCO of the full stack.
• Simplified support model (i.e. single support vendor vs multiple).
• Time to Deploy/Time to Value.

Freedom of choice is a key tenet of Nutanix. After considering these factors, it is likely
that one hypervisor platform stands out as the best option for your deployment. No
matter which hypervisor you choose, the solution is backed by world-class Nutanix
support and a full ecosystem of node types. (Hardware selection is discussed in the
section Platform Considerations).

VRT-001 CHOOSE NUTANIX AHV OR VMWARE ESXI AS THE HYPERVISOR FOR YOUR DEPLOYMENT

Impact

Justification
3.2. CHOOSING A CLUSTER DEPLOYMENT MODEL
When designing Nutanix clusters for your deployment, there are several important
design considerations:

• Whether to deploy a separate management cluster.
• Whether your cluster(s) will run mixed workloads or be dedicated to a single workload.

3.2.1 SEPARATE MANAGEMENT CLUSTERS


A separate management cluster is not a requirement of this design. If you are planning a
smaller-scale deployment, it doesn’t always make sense to design, purchase, and operate
a separate cluster for a limited set of management VMs.

However, the infrastructure applications and services that run on a management cluster
are critical. At a larger scale, there are several reasons to separate these management
workloads:

• Availability. Separating management workloads on their own cluster(s) makes it easier to ensure they are always available.
• Security. You may wish to more strictly control access to management clusters and workloads, including additional controls such as firewalls, dedicated networks, more stringent role-based access control (RBAC), and possibly others. While this can be accomplished on mixed clusters, it is easier to monitor and manage these additional security controls on a separate physical cluster.
• Operations. RBAC prevents those who are not authorized from interacting with important management services. Physical separation prevents any accidental or malicious actions that could compromise the availability or performance of important infrastructure services.
• Performance. Performance is just as important for infrastructure services as any
other workload. Having a dedicated management cluster(s) simplifies troubleshooting
of performance issues and reduces the potential for hard to diagnose workload
conflicts. It eliminates the possibility that changes in the management space will
affect application workloads and vice versa. In a highly elastic cloud environment,
there may be workload expansions that occur via self-service or automated events.
Depending on whether resource control policies are in effect, this can cause resource
contention in a shared cluster.

NOTE: You do not need a separate management cluster for each use case or
environment unless you have strict security or compliance requirements that make it
necessary. This means that a single management cluster can support multiple use cases
such as: EUC deployments, private cloud, and general server virtualization environments.

Should You Deploy Two Management Clusters?
In large-scale deployments, the management cluster can be split to create separate
failure domains. With two management clusters, you can place redundant components
into each cluster, enabling a higher level of availability. Should there be an issue with
one of the management clusters, the other remains available to service requests.

PFM-001 MANAGEMENT CLUSTER ARCHITECTURE: DEPLOY A SEPARATE MANAGEMENT CLUSTER OR SHARE A CLUSTER WITH OTHER WORKLOADS. WHEN CHOOSING A SEPARATE MANAGEMENT CLUSTER, CONSIDER A REDUNDANT CONFIGURATION.

Impact

Justification

3.2.2 WILL YOU DEPLOY DEDICATED OR MIXED CLUSTERS?


The decision whether to mix workloads within a cluster or to dedicate a cluster for each
type of workload is usually a question of scale. For example, if you have 200 general
server VMs, a small Exchange deployment, and 10 average-sized database VMs, mixing
the workloads in a single cluster is common and can be easily managed. However, if any
or all of these workloads increase by 5-10x, the complexity of sizing and operating the
mixed environment goes up dramatically.

Operating large-scale mixed environments creates a number of unique challenges. You
have to decide whether you are willing to manage the challenges of mixed workloads
or dedicate clusters for each workload. Here are the main factors to consider:

• Performance and capacity. The resource demands of different applications can vary widely. You need to understand the needs of each application when mixing workloads, since the chance for conflicts to occur is increased. There may also be wildly different performance and capacity needs between applications, which could require different node configurations within a single cluster. Unless you are going to isolate certain workloads to particular nodes, each node within a cluster needs to be able to handle the average daily mix of applications that might be running on it.
• VM resource-sizing requirements. The CPU and memory sizing for general server
VMs, VDI VMs, business-critical application VMs, etc. can vary widely. While it’s fairly
easy to account for memory sizing, CPU sizing is more complex. Each of these
workloads consumes different amounts of CPU, and may require much different
levels of CPU overcommit, if any.
If you have large groups of VMs with widely different resource requirements, it’s
typically better to build clusters to contain more uniformly sized VMs. From a
hypervisor HA standpoint, you may require additional resources within a mixed
cluster to ensure the ability to failover. This can also increase the day 2 operational
support effort, since it may require manually tuning HA settings and increased
monitoring to ensure HA resources remain in compliance.
• Software licensing. The most common reason for dedicated clusters is software licensing. There are a variety of reasons why using dedicated clusters makes sense from a licensing standpoint. Here are two common examples:
• Operating system licensing. Windows and Linux vendors may offer "all-you-can-eat" license models that license at the host level. If you license all the nodes in a cluster with Windows Datacenter licensing, it may not be cost effective to run Linux VMs on that cluster, because they consume resources that could otherwise be used for already-licensed Windows-based instances.*
• Database licensing. Database licenses are frequently based on either CPU cores or sockets. These licenses can be expensive, and you often have to license all the nodes in a cluster to enable DB VMs to run on every node. Once again, you probably don't want to run other workloads on the cluster since that reduces the return on your license investment.
In addition, you may want to run nodes hosting database workloads on different hardware than nodes hosting general server VMs. For instance, having fewer CPU cores running at a higher clock speed may reduce your overall licensing costs while still providing the necessary compute power. (A simple core-count cost sketch appears at the end of this section.)

*This is provided for informational purposes only, please refer to your licensing agreement with Microsoft for
more information.

• Security. In many projects, security constraints are an overriding factor. When it comes to the security of mixed vs. dedicated clusters, there are a few design considerations to weigh as you decide whether logical or physical separation is adequate to address your security requirements:
• Operations. RBAC is the primary means of controlling access and management of infrastructure and VMs in a mixed cluster. Dedicated clusters prevent non-approved parties from gaining any access whatsoever.
• Networking. A mixed cluster typically relies on separate VLANs and firewall rules per workload to control access. A dedicated cluster would only have the required networks presented to it for a single workload and would likely limit who and what has network access. Both approaches can control and limit access to cluster resources, but a dedicated cluster goes a step further by providing complete physical network isolation for those that require it.

If you are able to address the performance/capacity, resource-sizing, licensing, and security constraints discussed above, it is possible to successfully design and operate clusters of any size to run mixed workloads. The key is to thoroughly understand the resource and performance requirements of each workload and size with that knowledge.

RECOMMENDATION: At scale, use dedicated clusters to the greatest extent possible for the reasons discussed above. Utilize mixed workload clusters where they fit.

3.2.4 INCLUDING STORAGE-ONLY NODES IN CLUSTERS


Storage-only nodes contribute storage capacity and I/O performance within a cluster. Storage-only nodes are available for clusters running either AHV or ESXi. Storage-only nodes can be any node type, but they are typically configured with just enough CPU and memory resources to run the CVM, since no application VMs run on these nodes. These nodes are members of the AOS cluster but are not visible to the hypervisor cluster for non-storage functions, so the hypervisor won't schedule other VMs to run on them.

PFM-002 MIXED OR DEDICATED WORKLOAD PER CLUSTER

Impact

Justification

3.3. CHOOSING HOW YOU WILL SCALE
This document clearly lays out our opinionated design for deploying HCI clusters and
building a private cloud for any type of workload. A challenge in any design, whether
you’re starting with one cluster or dozens, is how to scale the environment to reach your
goal. To accomplish this you need a master architectural plan that creates a repeatable
process that can be followed by the organization. Defining this in advance removes
confusion and simplifies future deployment. A master architectural plan allows you to
track progress and ensure compliance with the design.

For this design, we recommend that an at-scale deployment should include:

• A control plane.
• A block and pod architecture.

3.4. CHOOSING A CONTROL PLANE


Within your design there should be a primary control plane where the majority of daily
operational tasks are performed. In this design:

• The primary control plane is Prism Central.


• VMware vCenter is also required for those using VMware ESXi.

Each of these control planes has a maximum size that dictates the number of VMs,
nodes, and clusters that can be managed by an instance.

3.4.1 BLOCK AND POD ARCHITECTURE


A repeatable architecture is needed to ensure safe and efficient scaling. This design uses
the pod and block architecture because it’s both familiar and easily consumed. We have
adapted it to support Nutanix deployments. This section explains the different parts of
the architecture and how they can be used to scale a deployment to any level.

What is a pod?
In this design, a pod is a group of resources that are managed by a single Prism Central instance. The diagram below shows a single pod containing four building blocks. A pod is not bound by physical location limitations; all of its resources could be at a single site or at multiple sites. Examples of multiple sites include: a traditional multi-datacenter design, a hybrid cloud design, or a ROBO architecture.

Building block
In this design, a building block is equivalent to a Nutanix cluster. Each of the clusters can run a single dedicated workload or mixed workloads. There is no need for building blocks to be uniform within your design.

RECOMMENDATION:
For each workload:
• Establish whether it will have a dedicated cluster or share a mixed cluster with
other applications.
• Define the maximum size of this type of building block. While building blocks for
different workloads do not need to be the same, they certainly can be.
• Determine the maximum size of a building block based on:
• The scale at which you will deploy workloads.
• The need for failure domains.
• Operational considerations such as upgrade timing.

These topics are discussed later in the Detailed Technical Design section of this
document.

Scaling the Block and Pod Architecture


You can scale an individual pod up to the maximums that Prism Central supports for the AOS version you are deploying. For the AOS version specified in this document, a scale-out Prism Central deployment with large-sized Prism Central VMs can manage up to:

Prism Central Limits (assumes a scale-out PC configuration):

• 25,000 VMs
• 200 clusters
• 1,000 nodes

If any of these limits are reached, a pod is considered full. For example with an EUC
deployment, the VM limit will likely be reached first since you typically have high VM
density resulting in large numbers of VMs with fewer nodes and clusters. A large
ROBO environment might hit the cluster count limit because you tend to have
many sites each with a small cluster and a few VMs.
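
As a minimal illustration (not a Nutanix tool), the following sketch checks whether a pod is full against the scale-out Prism Central limits listed above; the example input values are illustrative.

# Minimal check of whether a pod (one scale-out Prism Central instance)
# still has headroom, using the limits cited above for this design.
PC_LIMITS = {"vms": 25_000, "clusters": 200, "nodes": 1_000}

def pod_is_full(vms: int, clusters: int, nodes: int) -> bool:
    """A pod is considered full as soon as any single limit is reached."""
    return (vms >= PC_LIMITS["vms"]
            or clusters >= PC_LIMITS["clusters"]
            or nodes >= PC_LIMITS["nodes"])

# An EUC-heavy pod tends to hit the VM limit first.
print(pod_is_full(vms=24_500, clusters=35, nodes=420))  # False: room remains
print(pod_is_full(vms=25_000, clusters=35, nodes=420))  # True: start a new pod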

At very large scale (e.g. thousands of nodes), it can make sense to have pods
dedicated to each workload, but for environments that have Nutanix deployed for
multiple workloads, a pod will typically contain multiple applications.

Once a pod reaches a scaling limit, start a new pod with at least one building block.
The new pod scales until it also reaches a scaling limit, and so on.

The building blocks within a pod scale in a similar fashion. A building block is started
and workloads are migrated onto it until it reaches its determined max size, and a new
building block is started. New building blocks can be as small as 3 nodes, the minimum
to start a Nutanix cluster, or any size up to the max size you’ve specified for that
building block.

The starting size for each building block and the increments for scaling them are
organizational decisions:
• For smaller or more agile organizations, starting small and scaling incrementally
often makes sense.
• Larger organizations may prefer to deploy a new building block fully populated and
migrate workloads onto it as schedules dictate.
• Although 3 nodes is the minimum size for a cluster, using 4 nodes provides a higher
level of redundancy during maintenance and failure conditions.
• For ROBO and edge use cases, the starting size can be as small as one, two, or
three nodes depending on requirements.

The diagram below illustrates a simple VDI building block example. With VDI it is easy
to think in terms of number of users and the node count in a cluster. In this example,
the building block is a 16-node cluster supporting 1,500 users. This works out to 100
users per node plus an additional node for HA. With the first building block full, a
request for an additional 500 users requires a new building block to be started.
This building block is then scaled up to its max size before starting a third.
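
A minimal sketch of the building-block arithmetic in this VDI example, assuming 100 users per node, one node reserved for HA, and a 16-node maximum block size (values taken from the example above):

import math

USERS_PER_NODE = 100
MAX_NODES_PER_BLOCK = 16
HA_NODES = 1

def nodes_for_users(users: int) -> int:
    """Nodes needed to host a given user count, including the HA reserve."""
    return math.ceil(users / USERS_PER_NODE) + HA_NODES

def blocks_for_users(total_users: int) -> int:
    """Building blocks needed once each block's usable capacity is fixed."""
    users_per_block = (MAX_NODES_PER_BLOCK - HA_NODES) * USERS_PER_NODE  # 1,500
    return math.ceil(total_users / users_per_block)

print(nodes_for_users(500))    # 6 nodes for the additional 500 users
print(blocks_for_users(2000))  # 1,500 + 500 users -> 2 building blocks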

By having well established and documented design decisions for pod size and building
block size, the architecture and operational teams are free to keep scaling without the
need to revisit decisions to satisfy each resource expansion request.

3.5. CHOOSING THE RIGHT LICENSING AND SUPPORT


Now that you’ve chosen a hypervisor, decided on mixed or dedicated workloads, and established your block and pod architecture, you can make informed hardware decisions. When evaluating the different Nutanix platform options available, there are a few key decision points:

• Choosing your software licensing and support.
• Choosing your platform vendor.

3.5.1 SOFTWARE LICENSING AND SUPPORT CONSIDERATIONS


Nutanix nodes are available in two different purchasing/licensing options: appliances or
software-only:

• Appliances are available directly from Nutanix or through our OEM relationships.
• Appliance based licensing is referred to as “life of device licensing”, meaning it’s
only applicable to the appliance it was purchased with.
• The manufacturer of the appliance takes all support calls for software and
hardware issues. For example, if you choose Dell appliances, Dell will take all
support calls and escalate to Nutanix for software support as needed. With
Nutanix NX appliance all support calls go directly to Nutanix.

• The Software Only option de-couples software and support licensing from the
underlying hardware. This enables:
• License portability. The same license can continue to be used when the underlying hardware is changed, such as a hardware vendor change or a node refresh. (Note: licenses are portable for like-for-like hardware replacements; if the hardware specification of the nodes changes, additional licenses may be needed.)
• Deployment on additional supported hardware platforms. Hardware is per our
qualified list of platforms, which can be found on the Hardware Compatibility
List (HCL).
• Software support is direct from Nutanix while the server vendor provides
hardware support.
• Another type of Software Only Licensing is the Core-Licensing option, which is
best for those customers who prefer the benefits of the software-only model
but want hardware from a specific OEM server vendor. Core licensing enables
customers to purchase software only licenses and buy hardware from any of
the appliance vendors. For example, XC-Core utilizes Dell XC OEM appliances
but de-couples software and hardware support.

PFM-003 SELECT NUTANIX SOFTWARE LICENSING LEVELS

Impact

Justification

3.5.2 VENDOR CONSIDERATIONS


When it comes to selecting a hardware vendor, there are a number of factors to evaluate:

• Brand loyalty. This can be a strong factor in an IT buying decision. There may be
purchasing commitments or discounts in place at an organizational level driving this
loyalty, or you may simply have been happy with past experiences.

• Support Quality. The quality of the support experience can also be a factor in your hardware evaluation. For hardware failures, make sure the hardware vendor responds quickly and can provide parts reliably within the contracted response time. The overall support experience is important: a vendor should be easy to contact, be responsive to requests, and provide resolution in a timely manner.

• Hardware Quality. The reliability and quality of hardware is of obvious importance.


Today’s servers are all very similar internally; they use many of the same components
with just a few proprietary components for each vendor. The reliability of the leading
server vendors is pretty similar so your decision may be contingent on your
organization’s past experiences.

• Operational Experience. Consider what it takes to carry out day 2 operations to support the lifecycle of the physical server and the vendor toolset (if any). This includes: monitoring server health, reporting and upgrading firmware, and monitoring for component issues and failures. Virtually all of the server vendors offer tools for these activities, and when combined with the power of AOS and Prism, the experience is pretty similar. Nutanix Lifecycle Manager (LCM) offers firmware reporting and management for all Nutanix appliance options.

• Configuration Options. Physical configuration options may weigh strongly in


choosing among server vendors, and in choosing a model type for each workload.
Factors include:
• Number of sockets.
• CPU options.
• Number of storage bays.
• Network connectivity options.
• Storage media options.

While there is generally a level of parity between server vendors, a particular vendor
may offer something the others do not, or one vendor may offer the latest options
more quickly when new components are released. You may require or prefer a
specific type of network card or need nodes with a large number of storage bays.

Storage considerations, such as the number of SSD/NVMe options available, RDMA capabilities, and support for large-capacity media, may be important for specific workload requirements.

• Physical form factor. This is another common decision point since it can affect the
amount of space consumed in a rack, the power draw, the number of network
connections, and the number of internal expansion slots. Availability of internal
expansion slots may limit the number and type of network cards that can be
deployed, and whether a node is capable of accepting GPU cards and the number
of GPU cards it’s capable of supporting. When it comes to different form factors,
there are:

• High-density chassis that offer either four or two physical nodes in 2U of rack
space. These are popular options for a variety of workloads that do not require
extensive internal expansion or a large number of storage bays.
• Standard rackmount servers, typically with a 1U or 2U chassis and one physical
server per chassis. These provide much wider capabilities in the number of
storage bays available, the number of internal expansion slots available, and may
also support more memory.

PFM-004 SELECT PHYSICAL NODE VENDOR

Impact

Justification

3.5.3 MIXED CONFIGURATIONS AND NODE TYPES
Nutanix clusters allow significant flexibility in terms of the node types and configurations
you can utilize in a single cluster. This allows clusters to be operated and expanded over
time without artificial constraints. A cluster can be expanded with different node
configurations to accommodate new workloads or when previous nodes are no longer
available.

Considerations include:

• Node models. Mixing node models within a cluster is a fairly regular occurrence.
While it’s possible to have the same CPU, memory and storage configuration in two
different node types, it’s not required in order to mix them in the same cluster.

• CPU configurations. Mixing nodes within a cluster with different CPU configurations such as core count, clock speed, or CPU generation is supported. This can address changing application requirements, inventory availability, financial constraints, time of purchase, or other factors.

While there is no limit to the drift between configurations, it’s a commonly accepted
best practice to keep the core counts and memory configuration of nodes within a
cluster at similar levels. Using different CPU generations in the same cluster can limit
the feature set / functionality of newer CPUs. The lowest common denominator is the
level of the oldest CPU generation within a cluster. Having mostly uniformly configured
nodes in a cluster makes it easier for humans to double check the HA and capacity
planning recommendations of automated tools.

• Storage media. Having nodes with different storage configurations is also supported.
Variations at the storage layer can include varying the size or number of SSDs or
adding all-flash nodes to a hybrid cluster.

When introducing a storage configuration within a cluster that dramatically increases the amount of storage, you must ensure there is ample failover capacity to rebuild the largest node. For example, suppose the existing nodes in a cluster have 10TB of capacity each and you want to add node(s) with 40TB of capacity. Initially, it’s best to add a pair of these 40TB nodes. Subsequent large-capacity-node additions can then be done one at a time.
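
The following is a simplified headroom check for the example above. It compares raw capacity only and ignores replication-factor placement and CVM overhead, so treat it as an illustration of the reasoning rather than a sizing tool; the used-capacity figure is an assumption.

def can_rebuild_largest_node(node_capacities_tb: list[float], used_tb: float) -> bool:
    """True if the cluster can still hold its data after losing the largest node."""
    total = sum(node_capacities_tb)
    largest = max(node_capacities_tb)
    return (total - largest) >= used_tb

existing = [10.0] * 6  # six existing 10 TB nodes

# Adding a single 40 TB node leaves too little headroom to rebuild it:
print(can_rebuild_largest_node(existing + [40.0], used_tb=65.0))        # False
# Adding a pair of 40 TB nodes, as recommended above, restores headroom:
print(can_rebuild_largest_node(existing + [40.0, 40.0], used_tb=65.0))  # True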

3.6. CHOOSING THE RIGHT HARDWARE FOR YOUR WORKLOAD TYPES
When it comes to sizing Nutanix clusters for different application workloads there are many options. Most workloads can successfully run on any of the Nutanix and OEM models available, but some models and configurations may be a better fit than others.

Once you know the requirements of your workload(s) you can use the Nutanix Sizer to determine the best configuration for your cluster(s).

Nutanix Sizer is a web-based application that is available to Nutanix employees, partners, and select customers. Sizer allows the architect to input application and workload requirements, and it automatically calculates the node and cluster configurations.
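
The sketch below is only an illustration of the kind of arithmetic a sizing exercise performs; it is not a substitute for Nutanix Sizer, and the node specifications, overcommit ratio, and workload figures are assumptions.

import math

def estimate_nodes(vcpus: int, ram_gb: int, storage_tb: float,
                   node_cores: int = 32, vcpu_per_core: float = 4.0,
                   node_ram_gb: int = 512, node_storage_tb: float = 20.0,
                   ha_nodes: int = 1) -> int:
    """Node count driven by whichever resource (CPU, RAM, storage) runs out first."""
    by_cpu = math.ceil(vcpus / (node_cores * vcpu_per_core))
    by_ram = math.ceil(ram_gb / node_ram_gb)
    by_storage = math.ceil(storage_tb / node_storage_tb)
    return max(by_cpu, by_ram, by_storage) + ha_nodes

# 400 general server VMs at 4 vCPU / 16 GB RAM / 150 GB each:
print(estimate_nodes(vcpus=1600, ram_gb=6400, storage_tb=60.0))  # 14 nodes with these assumptions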

Different hardware platforms offer different characteristics to address differences in workloads. The following sections provide guidance on model selection and performance considerations.

3.6.1 MODEL TYPES


The various Nutanix and OEM appliance models and hardware compatibility list (HCL)
servers provide the flexibility to identify the right solutions to meet financial, space,
and performance requirements for different projects. There may be differences in
terms of the number and types of models available, but the same level of flexibility is
generally available from all vendors. Some vendors offer fewer models but allow
greater configuration flexibility.

Generally speaking, there are four different groups that servers fall into, translating to
different use cases:

• General workloads and EUC. These are by far the most popular nodes deployed in
Nutanix clusters. They can handle the vast majority of workloads including: general
server virtualization, business-critical applications, VDI, and most others. There are a
mix of form factors available.
• ROBO and Edge. These are similar to the general workload options, with the
exception that they may offer few options for CPU and storage as they are optimized
for these edge use cases.
• Storage dense. For workloads that require large amounts of storage capacity,
storage-dense nodes offer a larger number of storage bays and dense media
options, possibly with fewer CPU options. The physical configuration of these nodes
is optimized for workloads such as Nutanix Files, Nutanix Objects, or to be utilized
as storage-only nodes.
• High performance. The most demanding workloads and business-critical applications
(BCA) may require additional CPU resources and storage performance. For these
workloads there are nodes offering additional CPU configurations in terms of core
count and clock speed, as well as quad-socket configurations. It’s common for these
models to offer as many as 24 storage bays to allow for more flash devices or hard
drives for workloads that can utilize the added performance characteristics.

Each of the above model alternatives offers one or several of the available physical form factors along with the density and performance characteristics discussed.

Nutanix software does not require any complex tuning or configuration to support the
different workloads, but there are plenty of hardware options to tailor your selections
to different use cases.

3.6.2 PERFORMANCE CONSIDERATIONS
Nutanix and AOS meet the performance demands of different workloads without
continuous performance tuning. The Nutanix HCI storage fabric is powerful and
intelligent enough to handle nearly any type of workload.

However, different cluster design and configuration options still yield performance
benefits. Selecting the appropriate node model and configuration to meet application
and solutions requirements is an important design decision. The primary design
considerations for performance are:

• Number of drives. The number of hard disk drives (HDDs) or flash devices in a node
can dramatically affect its performance characteristics. However, simply picking the
node with the most device bays won’t improve the performance of every workload.
• Write-heavy workloads benefit from additional storage devices to provide
performance and consistency. Other workload characteristics such as
read/write ratio and I/O size should also be considered.
• Workloads such as VDI typically have minimal capacity requirements but higher
IOPS demands. It’s common to utilize nodes with partially populated storage
bays and as few as 2 flash devices per node, providing the right amount of
storage capacity while still exceeding performance demands.

• All Flash. All-flash configurations are available from Nutanix, OEMs, and supported
third-party server vendors. All-flash clusters utilize only SSDs, and these configurations
provide higher IOPS and a more consistent I/O profile. While all flash configurations
have become common, they are not absolutely necessary for every workload and
use case.

• NVMe. There are a number of new technologies available now and coming soon that
offer additional performance capabilities. NVMe is the first of these to be widely
available and offers a number of benefits over SSD. New flash technology allows
NVMe devices to deliver higher levels of I/O with lower latencies.

• RDMA. To realize the full benefits of NVMe, nodes are typically configured with
remote direct memory access (RDMA). RDMA allows one node to write directly to
the memory of another node. This is done by allowing a VM running in the user
space to directly access a NIC, which avoids TCP and kernel overhead resulting in
CPU savings and performance gains.

• Size of flash tier. In hybrid configurations containing SSD and HDD devices, the bulk
of the performance comes from the flash tier. Therefore, it’s important to understand
the workload being deployed on a hybrid cluster. The data an application accesses
frequently is typically referred to as the working set. The flash tier in hybrid clusters
should be sized to meet or exceed the size of the working set for all of the applications
that will run on the cluster. There is no penalty for having too much flash in a cluster but
not having enough can result in inconsistent performance.
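
As a simple illustration of this sizing rule, the sketch below checks a hybrid cluster’s flash tier against the combined working set of its applications; the working-set figures and the 20% headroom factor are assumptions.

def flash_tier_ok(ssd_capacity_tb: float, working_sets_tb: list[float],
                  headroom: float = 1.2) -> bool:
    """Require the flash tier to cover the total working set plus some headroom."""
    return ssd_capacity_tb >= sum(working_sets_tb) * headroom

working_sets = [1.5, 0.8, 2.2]  # per-application active data, in TB
print(flash_tier_ok(ssd_capacity_tb=4.0, working_sets_tb=working_sets))  # False: undersized
print(flash_tier_ok(ssd_capacity_tb=6.0, working_sets_tb=working_sets))  # True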

PFM-004 SELECT NODE MODEL(S) PER USE CASE

Impact

Justification

NOTE: As your organization works through the Detailed Technical Design elements in
the following section, be prepared to revisit your model decisions to fine-tune CPU,
memory, and storage configurations.

4. Detailed Technical
Design
With the necessary high-level design decisions made, including hypervisor, deployment
model, and hardware platform and models, you can now plan the technical design for
your Nutanix deployment.

This section provides technical guidelines for each layer of the design stack. Where
possible, we’ve organized the sections so that you don’t have to spend time reading
material that doesn’t apply. For instance, if you are not deploying VMware, you can skip all
sections that are applicable only to it.

4.1. REQUIRED SOFTWARE VERSIONS
This design assumes that you will be running the following software versions:

NAME: Prism Central
VERSION: Latest version
DESCRIPTION: Use the latest version available (5.15 at the time of writing).

NAME: Nutanix AOS
VERSION: Latest LTS
DESCRIPTION: Use the latest LTS available (5.15 at the time of writing).

Hypervisor Options

NAME: Nutanix AHV
VERSION: Latest LTS
DESCRIPTION: Use the latest LTS available (5.15 at the time of writing).

NAME: VMware vSphere
VERSION: 6.7

NAME: VMware vCenter
VERSION: 6.7
DESCRIPTION: Recommended for vSphere deployments.

4.2. PHYSICAL LAYER


This section guides you through the process of designing all physical aspects
of your Nutanix deployment, including:

• Compute and Storage Design.


• Networking.
• Choosing cluster size.
• Failure domain considerations.
• Designing workload domains and cluster layout.

4.2.1 CHOOSING THE OPTIMUM CLUSTER SIZE


This section provides guidance for cluster sizing. When designing a Nutanix cluster it’s
important to consider more than just technical limits and recommendations. There are
other important considerations including: hardware vendor recommendations, security,
and operational requirements that may dictate the optimal cluster size.

The following table provides a comprehensive list of the design considerations
that pertain to cluster size:

NOTE: These considerations apply to cluster sizing. Additional information for each area is provided in later sections.

AREA: Operations/manageability
LIMITING FACTOR(S): Maintenance window.
CONSIDERATIONS: Define/validate your maintenance window and make sure the cluster upgrade process fits in the window.
Example: Maintenance window: 12h. Full single-node upgrade: at least 45m (hardware, firmware, AOS, and hypervisor); the time depends on the hardware vendor. Maximum cluster size: 12-15 nodes. (A simple sizing sketch follows this table.)

AREA: Security & compliance
LIMITING FACTOR(S): Security zones within an organization or business unit.
CONSIDERATIONS: Collect all relevant security and compliance requirements.
Example: Multiple security zones: Internet DMZ, PROD, test & dev, inner DMZ. Every security zone may dictate different cluster sizes. Internet-facing DMZ clusters usually have a smaller number of nodes and a smaller number of workloads to minimize the impact of a security breach or DDoS attack. Dev/test clusters that have lower criticality may also have less strict change management policies and can therefore have a large number of hosts.

AREA: Vendor recommendations and limitations
LIMITING FACTOR(S): Hypervisor limitations. Management plane limitations. Vendor recommendations.
CONSIDERATIONS: Each product has limitations and vendor recommendations. Make sure you do not cross boundaries set by the vendor. See the vendor limitations/recommendations tables below.

AREA: Business continuity
LIMITING FACTOR(S): RPO, RTO, backup window.
CONSIDERATIONS: Collect BC/DR requirements: RPO, RTO, backup and restore time window, and backup system performance statistics.
Example: RPO 24h, RTO 48h. Ensure you can recover/restore from backup and restart workloads within 48h. A cluster whose total storage capacity exceeds the technical capabilities of the backup system could fail to meet the desired RTO.

AREA: Workload considerations
LIMITING FACTOR(S): Application architecture. Application licensing. Application criticality.
CONSIDERATIONS: Verify application architecture with the application team/vendor, including HA, DR, scale-in vs. scale-out, and performance requirements. Consider the licensing model for each application and its implications.
Example #1: Oracle or MS SQL licensing. Licensing models are based on physical core count. Design clusters for database performance and capacity requirements to avoid cluster oversizing and minimize license costs.
Example #2: Application has its own HA or DR. If the application can provide native HA and/or DR, the RPO/RTO considerations described under Business Continuity (above) may not apply.

AREA: Networking
LIMITING FACTOR(S): Total available network switch ports. Available network switch ports per rack.
CONSIDERATIONS: The number of available physical ports per rack and rack row is important when choosing cluster size and number of clusters.
Example: 96 ports (10GbE) available per rack, 48 ports (1Gbps) available per rack, 2 x 10GbE uplinks per Nutanix host, and 1 x 1Gbps uplink for out-of-band management. Maximum nodes per rack is 48 (the total capacity of the TOR switches).

AREA: Datacenter facility
LIMITING FACTOR(S): Available server rooms. Total power and cooling. Power and cooling per rack. Available total rack units. Available rack units per rack. Floor weight capacity.
CONSIDERATIONS: Power and cooling are among the most important factors limiting Nutanix cluster size and node density. Ensure you do not exceed any hard limits when designing your cluster layout. When calculating power consumption and thermal dissipation, use the maximum values provided by the vendor. A typical datacenter rack is 42U; some datacenters have racks up to 58U.
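
The sketch below expresses two of the sizing checks from the table, the maintenance-window limit and the per-rack port limit, as simple functions. The inputs mirror the table's examples; the table's 12-15 node figure leaves additional buffer beyond the raw arithmetic.

import math

def max_nodes_by_maintenance(window_hours: float, minutes_per_node: float) -> int:
    """Nodes that can complete a rolling upgrade within one maintenance window."""
    return math.floor(window_hours * 60 / minutes_per_node)

def max_nodes_by_ports(ports_10gbe: int, ports_1gbe: int,
                       uplinks_per_node: int = 2, oob_per_node: int = 1) -> int:
    """Rack limit from available TOR 10GbE ports and 1Gbps out-of-band ports."""
    return min(ports_10gbe // uplinks_per_node, ports_1gbe // oob_per_node)

print(max_nodes_by_maintenance(window_hours=12, minutes_per_node=45))  # 16 at best; plan for 12-15
print(max_nodes_by_ports(ports_10gbe=96, ports_1gbe=48))               # 48 nodes per rack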

Maximums and Minimums

AHV Deployments
The following table shows the maximum limits for management plane software
components in AHV deployments:

MANAGEMENT SOFTWARE: Nutanix Prism Central
MAXIMUMS: Up to 200 clusters, 1,000 nodes, or 25,000 VMs.
NOTES: Assumes a scale-out Prism Central deployment.

The following table provides guidance regarding minimum and maximum number of nodes supported in a single Nutanix AHV cluster:

MIN. # OF NODES: 1
MAX. # OF HOSTS: No limit; cluster size is based on a variety of factors.
HYPERVISOR: Nutanix Acropolis Hypervisor (AHV)
RECOMMENDATION: Single- and dual-node clusters are for ROBO only. For the purpose of this design, you can scale your clusters up to 32 nodes, but larger clusters are possible.

VMware vSphere Deployments


The following table shows the maximum limits for management plane software
components in VMware deployments:

MANAGEMENT SOFTWARE: Nutanix Prism Central
MAXIMUMS: Up to 200 clusters, 1,000 nodes, or 25,000 VMs.
NOTES: Assumes a scale-out Prism Central deployment.

MANAGEMENT SOFTWARE: VMware vCenter
MAXIMUMS: 2,000 hosts; 25,000 powered-on VMs; 35,000 registered VMs.

The following table provides guidance regarding minimum and maximum number of nodes supported in a single VMware cluster:

MIN. # OF NODES: 2
MAX. # OF NODES: 64
HYPERVISOR: VMware vSphere ESXi
RECOMMENDATION: Dual-node clusters are for ROBO workloads only. The maximum number of nodes in a single Nutanix cluster with VMware ESXi is limited by the hypervisor version; for more details refer to official VMware documentation.

PFM-005 NUMBER, TYPE, AND SIZE OF CLUSTERS

Impact

Justification

4.2.2 FAILURE DOMAIN CONSIDERATIONS


Failure domains are physical or logical parts of a computing environment or location that are adversely affected when a device or service experiences an issue or outage. The devices or services involved can greatly affect the size of the failure domain and its potential impact. For example, a router generally has a bigger failure domain than a wireless access point, since more endpoints rely on a single router than on a single access point. Identifying possible failure domains, and keeping the size of failure domains small or manageable where possible, reduces the chance of widespread disruption. Building redundancy within and/or across failure domains is an important way to help mitigate the risks of failure.

When designing a Nutanix deployment, you can take steps to mitigate risk for each
of the following failure domains:

• Drives
• Nutanix node
• Nutanix block
• Management plane
• Nutanix cluster.
• Datacenter rack and server room
• Datacenter

Nutanix clusters are resilient to a drive, node, block, or rack failure, which is enabled by Redundancy Factor 2, the default. With the right architecture, Redundancy Factor 3 can tolerate two simultaneous drive, node, block, or rack failures. After a drive, node, block, or rack failure, a Nutanix cluster self-heals to reach the desired redundancy factor and rebuilds resilience against additional subsequent failures.

NOTE: You can configure your Nutanix environment to be fault tolerant to node, block,
and rack failures. This is described later in the section: Data Redundancy and Resiliency.
Mitigating the risks of network failure domains is described in the section: Networking.

The Management Plane


One of the most important failure domains, and one that is often overlooked by
architects, is the management plane. The more workload domains managed by a
single management plane, the bigger the impact of a failure. When deploying the
management plane, consider the following risk mitigations to reduce the impact
of a failure.

AREA: Availability
RISK MITIGATION:
• Design and deploy the management plane to be highly available.
• At a minimum, design to meet the availability requirements of the managed workload or service with the highest uptime requirement.

AREA: Limit the impact
RISK MITIGATION:
• Confine the workload domain to a single datacenter or site.
• Confine the workload domain to a defined security zone.
• Ensure the API gateway is always available, because third-party integrations may rely on it (e.g. third-party backup vendor integration).

AREA: Access Control
RISK MITIGATION:
• Configure built-in RBAC to restrict access to management platform resources.

NOTE: The section Management Layer provides more details on management plane configuration.

[Figure: two datacenters, each with its own management plane and Nutanix clusters, forming separate failure domains]

The Nutanix Cluster as a Failure Domain


When evaluating each Nutanix cluster as a failure domain, you have to consider the risks
and potential effects in terms of both the size of the cluster (as described above in
Choosing the Optimum Cluster Size) and the workloads running in the cluster, including
whether the cluster is running mixed or dedicated workloads.

Large clusters result in larger failure domains and potentially higher business impacts,
since they typically host considerably more workloads. To mitigate the risk of data
unavailability or service disruption, design for redundancy at a cluster level to protect
data and services, as described in the following table:

AREA: Power
RISK MITIGATION: Redundant power from two different power supplies.

AREA: Networking
RISK MITIGATION: Redundant TOR switches. Redundant upstream connectivity to the TOR switches from each Nutanix node.

AREA: Cluster
RISK MITIGATION: Leverage the Nutanix scale-out architecture and data protection capabilities to replicate data to a second Nutanix cluster in the event that one cluster fails.

AREA: Application
RISK MITIGATION: Deploy the application across multiple clusters.

[Figure: two Nutanix clusters, each with redundant TOR switches, forming Failure Domain 1 and Failure Domain 2]

Datacenter Rack and Server Room Failure Domains
When considering the datacenter rack failure domain, the primary mitigations are
redundant power from two different power supplies to each rack, redundant TOR
switches, and redundant network uplinks.

When considering the datacenter server room failure domain, it is critical to examine all
datacenter components to ensure they are not shared among multiple server rooms, as
described in the following table:

AREA: Power
RISK MITIGATION: Redundant power.

AREA: Cooling
RISK MITIGATION: Independent cooling.

AREA: Server Room
RISK MITIGATION: Provide a redundant server room within a separate fire zone (in the same datacenter).

AREA: Application
RISK MITIGATION: Place the application in multiple server rooms. For example, Active Directory domain controllers in separate server rooms.

[Figure: a datacenter with two server rooms, each hosting Nutanix clusters in its own failure domain]

Datacenter Building
When considering the datacenter failure domain, it is critical to examine the redundancy
of all connections to the outside to ensure they are not shared, as described in the
following table:

AREA: Power
RISK MITIGATION: Redundant power supply from different suppliers.

AREA: Cooling
RISK MITIGATION: Independent/redundant cooling system for each datacenter room.

AREA: Network Connectivity
RISK MITIGATION: Redundant network connectivity between datacenter buildings from different providers. Redundant internet connectivity from different ISPs.

AREA: Datacenter buildings / server room
RISK MITIGATION: Where possible, utilize multiple datacenter buildings and/or server rooms in separate datacenter buildings so a single server room does not affect all of production. Distribute multi-component services equally across datacenter server rooms.

[Figure: redundant server rooms hosting Nutanix clusters in separate failure domains]

4.2.3 DESIGNING WORKLOAD DOMAINS
This document uses workload domains as building blocks. Each workload domain
consists of a set of Nutanix nodes that are managed by the same management instance
(Nutanix Prism Central/VMware vCenter) and connected to the same network domain.

Workload Domain Architecture


A single workload domain can include different hardware and software combinations,
have homogeneous ESXi or AHV nodes, or mix ESXi (running VMs) and AHV (not
running VMs) in a single cluster. These can be configured to satisfy redundancy,
performance, and capacity requirements.

Normally, single workload domains occupy a single rack. However, you can aggregate
multiple workload domains in a single rack or span a single workload domain across
multiple racks.

Workload Domain Rack Mapping


Mapping workload domains to datacenter racks is not one to one. While the workload
domain is a repeatable building block, a rack is a unit of size. Workload domains and
datacenter racks can have different characteristics; you map workload domains to
datacenter racks according to the use case and rack specifications.

Nutanix Cluster Layout



Single Rack, Single Workload Domain


One workload domain can occupy a single datacenter rack. All nodes from a workload
domain are connected to a single pair of TOR switches.

[Figure: single rack per workload domain — compute, storage, management, and edge clusters, each connected to its own pair of TOR switches]

Single Rack, Multiple Workload Domains
Multiple workload domains can share a single datacenter rack, with all of their nodes connected to the same pair of TOR switches.

[Figure: a single rack with one pair of TOR switches hosting both the management and edge clusters]

Multiple Racks, Single Workload Domain


A single workload domain can span multiple racks. For example, to provide an additional
level of data protection (using rack awareness/fault tolerance, see the section:
Data Redundancy and Resiliency) or if a single workload domain is bigger than a
single rack can contain.

[Figure: a single workload domain spanning multiple racks — a compute cluster of up to 96 nodes across several racks, management and edge clusters, and a storage cluster of up to 24 nodes, with each rack served by redundant TOR switches]

Pros & Cons

Single Rack/Single Workload Domain
Pros: Default redundancy level. Simple to consume.
Cons: May use rack space inefficiently.

Single Rack/Multiple Workload Domains
Pros: Efficient space usage. Default redundancy level. Simple to consume.
Cons: Increases impact of rack failure.

Multiple Racks/Single Workload Domain
Pros: Efficient space usage. Increases resiliency for each workload domain. Decreases impact of rack failure.
Cons: Complexity.

PFM-006 DECIDE WHICH WORKLOAD DOMAINS WILL SPAN A SINGLE OR MULTIPLE RACKS

Impact

Justification

4.2.4 NETWORKING
Well-designed networks are critical to this design’s resilience and performance.
A Nutanix cluster can tolerate multiple simultaneous failures because it maintains a set
redundancy factor and offers features such as block and rack awareness. However, this
level of resilience requires a highly available, redundant network connecting a cluster’s
nodes. Protecting the cluster’s read and write storage capabilities also requires highly
available connectivity between nodes. Even with intelligent data placement, if network
connectivity between more than the allowed number of nodes breaks down, VMs on
the cluster could experience write failures and enter read-only mode.

To optimize I/O speed, Nutanix clusters choose to send each write to another node in
the cluster. As a result, a fully populated cluster sends storage replication traffic in a
full mesh, using network bandwidth between all Nutanix nodes. Because storage write
latency directly correlates to the network latency between Nutanix nodes, any increase
in network latency adds to storage write latency.

Physical Switches
A Nutanix environment should use datacenter switches designed to handle high-
bandwidth server and storage traffic at low latency. Do not use switches meant for
deployment at the campus access layer. Campus access switches may have 10 Gbps
ports like datacenter switches, but they are not usually designed to transport a large
amount of bidirectional storage replication traffic. Refer to the Nutanix physical
networking best practices guide for more information.

Datacenter switches should have the following characteristics:

• Line rate: Ensures that all ports can simultaneously achieve advertised throughput.
• Low latency: Minimizes port-to-port latency as measured in microseconds or
nanoseconds.
• Large per-port buffers: Accommodates speed mismatches from uplinks without
dropping frames.
• Nonblocking, with low or no oversubscription: Reduces chance of drops during peak
traffic periods.
• 10 Gbps or faster links for Nutanix CVM traffic: Only use 1 Gbps links for additional
user VM traffic or when 10 Gbps connections are not available, such as in a ROBO
deployment. Limit Nutanix clusters using 1 Gbps links to eight nodes maximum.

Switch manufacturers’ datasheets, specifications, and white papers can help identify
these characteristics. For example, a common datacenter switch datasheet may show
a per-port buffer of 1 MB, while an access layer or fabric extension device has a per-port
buffer of around 150 KB. During periods of high traffic, or when using links with a speed
mismatch (such as 40 Gbps uplinks to 10 Gbps edge ports), a smaller buffer can lead
to frame drops, increasing storage latency.

The following table is not exhaustive, but it includes some examples of model lines
that meet the above requirements. Models similar to the ones shown are also generally
good choices.

EXAMPLES OF RECOMMENDED SWITCH MODELS

Cisco Nexus 9300 Cisco Nexus 5000 Arista 7100

Mellanox SN2100 Dell S4810 Juniper QFX-5100

Examples of switches that do not meet these switch requirements are shown in the following table:

SWITCH: Cisco Catalyst 9300
REASON NOT RECOMMENDED: Campus access switch.

SWITCH: Cisco Nexus 3000
REASON NOT RECOMMENDED: Shared buffers can be exhausted by bursty storage replication traffic.

SWITCH: Cisco Nexus 2000 (Fabric Extender)
REASON NOT RECOMMENDED: Highly oversubscribed with small per-port buffers.

SWITCH: Cisco Catalyst 3850
REASON NOT RECOMMENDED: Stackable multigigabit switch.

SWITCH: 10 Gbps expansion cards in a 1 Gbps access switch
REASON NOT RECOMMENDED: 10 Gbps expansion cards provide uplink bandwidth for the switch, not server connectivity.

Each Nutanix node also has an out-of-band connection for IPMI, iLO, iDRAC, or similar
management. Because out-of-band connections do not have the same latency or
throughput requirements of VM or storage networking, they can use an access layer
switch.

NOTE: Nutanix recommends an out-of-band management switch network separate


from the primary network to ensure management availability. Configure server-facing
ports in the management network as access ports and do not use VLAN trunking for
these ports. Access to this critical management network should be restricted.

NET-001 USE A LARGE BUFFER DATACENTER SWITCH AT 10GBPS OR FASTER.

Justification: Achieves high performance for critical storage and VM traffic converged on the same network fabric.

Implication: Requires datacenter switches that may be more expensive than campus or access layer switches.

Network Topology
In a greenfield environment, Nutanix recommends a leaf-spine network topology
because it is easy to scale, achieves high performance with low latency, and provides
resilience. A leaf-spine topology requires at least two spine switches and two leaf
switches. Every leaf connects to every spine using uplink ports.

There are no connections between the spine switches or between the leaf switches in a
conventional leaf-spine design. To form a Nutanix cluster it’s critical that all nodes are in
the same broadcast domain, thus in any leaf-spine design all leaf switches connecting
nodes in a cluster should carry the Nutanix VLAN. This can be accomplished with
physical connections between switches, using an overlay network, or using a pure layer
2 design. The following example shows a pure layer 2 design or overlay.

You may also choose a leaf-spine topology that uses links between switches to
guarantee layer 2 connectivity between Nutanix nodes.

Use uplinks that are a higher speed than the edge ports to reduce uplink
oversubscription. To increase uplink capacity, add spine switches or uplink ports as
needed.

The core-aggregation-access (or multi-tier) network design is a modular layout that allows you to upgrade and scale layers independently. Nutanix clusters perform well in the core-aggregation-access topology, but extra caution should be taken around scaling the Nutanix cluster.

In pre-existing environments, you may not have full control over the network topology,
but your design should meet the following requirements:

Guidelines:
• Networks must be highly available and tolerate individual device failures.
• Ensure that each layer of the network topology can tolerate device failure.
• Avoid configurations or technologies that do not maintain system availability during
single device outages or upgrades such as stacked switches.
• Ensure that there are no more than three switches between any two Nutanix nodes in
the same cluster. Nutanix nodes send storage replication traffic to each other in a
distributed fashion over the top-of-rack network. One Nutanix node can therefore send
replication traffic to any other Nutanix node in the cluster.
• The network should provide low and predictable latency for this traffic. Leaf-spine
networks meet this requirement by design. For the core-aggregation-access model,
ensure that all nodes in a Nutanix cluster share the same aggregation layer to meet the
three-switch-hop rule.

Oversubscription occurs when an intermediate network device or link does not have
enough capacity to allow line rate communication between the systems connected to
it. For example, if a 10 Gbps link connects two switches and four hosts connect to each
switch at 10 Gbps, the connecting link is oversubscribed. Oversubscription is often
expressed as a ratio—in this case 4:1, as the environment could potentially attempt to
transmit 40 Gbps between the switches with only 10 Gbps available. Achieving a ratio
of 1:1 is not always feasible.
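
A minimal sketch of the oversubscription arithmetic in the example above; the second call shows the effect of adding faster uplinks (the uplink counts are illustrative).

def oversubscription_ratio(host_count: int, host_gbps: float, uplink_gbps: float) -> float:
    """Potential edge bandwidth divided by available inter-switch bandwidth."""
    return (host_count * host_gbps) / uplink_gbps

# Four hosts at 10 Gbps sharing a single 10 Gbps inter-switch link: 4:1.
print(oversubscription_ratio(host_count=4, host_gbps=10, uplink_gbps=10))  # 4.0
# Two 40 Gbps uplinks instead: ratio drops to 0.5 (effectively undersubscribed).
print(oversubscription_ratio(host_count=4, host_gbps=10, uplink_gbps=80))  # 0.5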

RECOMMENDATION:
• Keep the oversubscription ratio as small as possible based on budget and available
capacity.

In a typical deployment where Nutanix nodes connect to redundant top-of-rack switches, storage replication traffic between CVMs traverses multiple devices.

RECOMMENDATION:
• To avoid packet loss due to link oversubscription, ensure that the switch uplinks consist
of multiple interfaces operating at a faster speed than the Nutanix host interfaces. For
example, for nodes connected at 10 Gbps, interswitch connections should consist of
multiple 10, 40, or 100 Gbps links.
• Connect all Nutanix nodes that form a cluster to the same switch fabric. Do not stretch
a single Nutanix cluster across multiple, disconnected switch fabrics. A switch fabric is
a single leaf-spine topology or all switches connected to the same switch aggregation
layer. Every Nutanix node in a cluster should therefore be in the same L2 broadcast
domain and share the same IP subnet.

See section “Security -> Network Segmentation” for service placement information and
design decisions.

RECOMMENDATION:
• Use native, or untagged, VLANs for the hypervisor host and CVM for ease of initial
configuration. Ensure that this untagged traffic is mapped into the CVM and
hypervisor VLAN only on the required switch ports to reduce risk.
• Use tagged VLANs for all guest VM traffic and add the required guest VM VLANs to all
connected switch ports for hosts in the Nutanix cluster.
• Limit guest VLANs for guest VM traffic to the smallest number of physical switches
and switch ports possible to reduce broadcast network traffic load.

NET-002 USE A LEAF-SPINE NETWORK TOPOLOGY FOR NEW ENVIRONMENTS.

Justification: Achieves high performance for critical storage and VM traffic and is easy to scale.

Implication: Requires more network connections between switches and may require a new network design.

NET-003 POPULATE EACH RACK WITH TWO 10GBE OR FASTER TOR SWITCHES.

Justification: Simplifies the design, follows the leaf-spine model, and provides high performance and high availability to the network.

Implication: Increases rack space requirements, costs, and the number of leaf switches.

NET-004 AVOID SWITCH STACKING TO ENSURE NETWORK AVAILABILITY DURING INDIVIDUAL DEVICE FAILURE.

Justification: Critical network infrastructure must be redundant between all CVMs in the same Nutanix cluster.

Implication: More network devices may be required to provide high availability and reduce single points of failure.

NET-005 ENSURE THAT THERE ARE NO MORE THAN THREE SWITCHES BETWEEN ANY TWO NUTANIX NODES IN THE SAME CLUSTER.

Justification: Storage latency is directly related to network latency, so the network distance between nodes in the same cluster must be reduced.

Implication: A Nutanix cluster cannot span multiple sites or switch fabrics; instead, use data replication technologies to provide high availability between sites.

NET-006 REDUCE NETWORK OVERSUBSCRIPTION TO ACHIEVE AS CLOSE TO A 1:1 RATIO AS POSSIBLE.

Justification: Dropped network packets or a congested network will immediately impact storage performance and must be avoided in the design phase.

Implication: Higher-bandwidth paths must be created using more uplink paths of faster speed between switches.

NET-007 CONFIGURE THE CVM AND HYPERVISOR VLAN AS NATIVE, OR UNTAGGED, ON SERVER-FACING SWITCH PORTS.

Justification: Newly added nodes use untagged traffic for discovery and will work out of the box, reducing manual server configuration.

Implication: Network switches must be configured to accept untagged frames on ports facing Nutanix servers and place the traffic into the CVM and hypervisor VLAN.

NET-008 USE TAGGED VLANS ON THE SWITCH PORTS FOR ALL GUEST WORKLOADS.

Justification: Workloads should be separated from each other and from the CVM and hypervisor network using VLANs.

Implication: Multiple VLANs and IP subnets are required.

Broadcast Domains
Performing layer 3 routing at the top of rack, creating a layer 3 and layer 2 domain
boundary within a single rack, is a growing network trend. Each rack is a different IP
subnet and a different layer 2 broadcast domain. This layer 3 design decreases the size
of the layer 2 broadcast domain to remove some common problems of sharing a large
broadcast domain among many racks of servers, but it can add complexity for applications that require layer 2 connectivity.

In contrast, the traditional layer 2 design shares a single broadcast domain or VLAN
among many racks of switches. In the layer 2 design, a switch and server in one rack
share the same VLANs as a switch and server in another rack. Routing between IP
subnets is performed either in the spine or in the aggregation layer. The endpoints
in the same switch fabric have layer 2 connectivity without going through a router, but
this can increase the number of endpoints that share a noisy broadcast domain.

Figure: Layer 2 Network Design (two broadcast domains, each with its own subnet and Nutanix cluster)
Nutanix recommends a traditional layer 2 network design to ensure that CVMs and
hosts can communicate in the same broadcast domain even if they are in separate racks.
The CVM and host must be in the same broadcast domain and IP subnet.

If a layer 3 network design is chosen, there are two possible ways to make this work
with a Nutanix deployment:
• Keep all Nutanix nodes in a cluster inside the same rack.
• Create an overlay network in the switch fabric that creates a virtual broadcast domain
that is shared across racks for the Nutanix CVMs and hosts.

Keeping all of the Nutanix nodes in a single cluster in a single rack has the limitation
of losing resilience against rack failure. Creating an overlay in the network fabric:

• Requires special configuration.


• May require specific network hardware to achieve.
• Increases configuration complexity.

Spreading a broadcast domain across racks virtually may also raise concerns that the
original intent of the layer 3 design, reducing broadcast domain scope, is now violated.

Figure: Layer 3 Network Design (each rack is its own broadcast domain, subnet, and Nutanix cluster)
Figure: Layer 3 Network Design with Overlay (overlay networks X and Y span the per-rack broadcast domains to carry Nutanix Cluster 1 and Cluster 2)

NET-009 USE A LAYER 2 NETWORK DESIGN.

Justification: Nutanix CVMs and hypervisor hosts must be in the same broadcast domain to function properly after a CVM becomes unavailable.

Implication: A broadcast domain failure or storm can span multiple racks.

Scaling the Network


To scale the leaf-spine network:
• Add two leaf switches and a management switch for every rack.
• Add spine switches as needed.

Because there are no more than three switch hops between any two Nutanix nodes in
this design, a Nutanix cluster can easily span multiple racks and still use the same switch
fabric. Use a network design that brings the layer 2 broadcast domain to every switch
that connects Nutanix nodes in the same cluster.

The following example shows a leaf-spine network using either an overlay or a pure layer 2 design, with no connections required between leaf switches.

Figure: Logical View of Leaf Spine Scale

You may choose a leaf-spine design where a pair of leaf switches are connected using
link aggregation. Discuss scale plans with the switch vendor when scaling beyond two
spine switches in this design.

Figure: Rack Level View of Leaf Spine Scale

Determining when to add new spine switches is a function of:


• How many ports the spine switch has.
• How many leaf switches (or racks) there are.
• How many connections are formed between each leaf and spine.

It’s also important to take into account the throughput capacity of the switch and
the throughput required between each rack.

In the previous example with 4 workload racks, assume you have 2 x 32-port spine
switches with a single 100 Gbps connection to every leaf. Also assume each TOR leaf
switch has 48 x 10 Gbps ports and 6 x 100 Gbps ports.

Spine switch port utilization is at 8 out of 32 ports on each spine, leaving capacity
to grow up to 16 racks at each spine (32 spine ports divided by 2 ports per rack).
Each leaf would be using 1 out of the 6 available uplinks and each leaf can support up to
48 connected servers. However, at this point these two spine switches will be processing
a lot of traffic and may be overloaded depending on their specifications.

Calculating oversubscription ratios: if you assume that each rack has 24 x 10 Gbps dual-connected servers with dual leaf switches, for a total of 240 Gbps of server bandwidth capacity at each leaf, oversubscription is 2.4:1. This design is oversubscribed and is not recommended.

One way to reduce oversubscription is to add another 100 Gbps uplink from each leaf to the existing spine switches. However, that reduces the total number of supported racks by half to 8 (32 spine ports divided by 4 ports per rack), so you must carefully consider which leaf and spine switches are selected and what scale you would like to achieve.

Another way to grow spine capacity is to add a spine switch, which would then require
another uplink from every leaf, and greatly increase the total throughput of the switch
fabric without reducing the number of racks supported. If you added another spine
switch to bring the total to 3 spines, each leaf switch would add another 100Gbps uplink
(reducing oversubscription), but each spine could still support 16 racks (32 spine ports
divided by 2 ports per rack). With 6 uplinks on every leaf switch, you can support
designs with up to 6 spines in this example.
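
The rack-count and oversubscription arithmetic above can be captured in a small planning helper. The following Python sketch is purely illustrative: the port counts, uplink speeds, and server counts are the assumptions from the worked example above, not limits of any particular switch.

# Illustrative leaf-spine planning math, reusing the worked example above.
# All inputs are example assumptions, not product or switch limits.

def max_racks(spine_ports: int, spine_ports_per_rack: int) -> int:
    """Racks one spine can serve: total spine ports / ports consumed per rack."""
    return spine_ports // spine_ports_per_rack

def oversubscription(server_gbps_per_leaf: float, uplink_gbps_per_leaf: float) -> float:
    """South-to-north bandwidth ratio at a leaf; aim for as close to 1:1 as possible."""
    return server_gbps_per_leaf / uplink_gbps_per_leaf

# 32-port spines with 2 leaf ports consumed per rack.
print(max_racks(spine_ports=32, spine_ports_per_rack=2))   # 16 racks
# 24 x 10 Gbps dual-connected servers per leaf versus a 100 Gbps uplink path.
print(oversubscription(24 * 10, 100))                       # 2.4 (oversubscribed)
# Doubling the uplinks per leaf halves the supported racks per spine.
print(max_racks(spine_ports=32, spine_ports_per_rack=4))   # 8 racks
print(oversubscription(24 * 10, 2 * 100))                   # 1.2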

Scaling the multi-tier network design may require adding another aggregation and
access layer to the core. In this case, there would be more than three switch hops
between the two access layers.

• Guideline: Ensure that you add Nutanix nodes in separate aggregation and access
layers to separate clusters to keep the number of switch hops between nodes in the
same cluster to three or fewer.

In the following figure, Cluster 1 connects to one aggregation layer and Cluster 2
connects to another.

Host Networking

AHV Networking
AHV leverages Open vSwitch (OVS) for all VM networking. The virtual switch is referred
to as a bridge and uses a br prefix in the name. Read the AHV Networking Best
Practices Guide for in depth guidance on any settings not covered here.

Bridge
The default bridge, br0, behaves like a layer 2 learning switch that maintains a MAC
address table. Additional uplink interfaces for separate physical networks are added as
brN, where N is the new bridge number. To ensure that all VM traffic goes through the
same set of firewall rules and the same set of network functions, a bridge chain is created, with all microsegmentation and network function rules placed in br.microseg and br.nf, respectively.

Traffic from VMs enters the bridge at brN.local. Next, traffic is multiplexed onto the
bridge chain, goes through the firewall and network function ruleset, and is then
demultiplexed on the correct brN for physical network forwarding. The reverse path is
followed for traffic flowing from the physical network to VMs.

All traffic between VMs must flow through the entire bridge chain, since brN is the only
bridge that performs switching. All other bridges simply apply rules or pass traffic up
and down the bridge chain.

Nutanix recommends:
• Use only the default bridge, br0, with the two fastest network adapters in the host.
• Converge the management, storage, and workload traffic on this single pair of uplink
adapters.
• Additional brN bridges should only be added when a connection to a separate physical network is required. For example, if the top of rack has two pairs of switches, one pair for storage and management and another pair for workload traffic, it makes sense to create another bridge, br1, and place the workloads on this bridge.
• Other valid use cases include separate physical networks for iSCSI Volumes workloads,
backplane intracluster replication, or workloads like virtual firewalls that require access
to a physically separate network. In these cases the additional bridge is used to
connect to the separate top-of-rack network. If there is only a single top-of-rack
physical network, then a single bridge with VLAN separation is sufficient.

AHV also includes a Linux bridge called virbr0. The virbr0 Linux bridge carries
management traffic between the CVM and AHV host. All other storage, host, and
workload network traffic flows through the br0 OVS bridge, or additional brN bridges if
configured.

NOTE: Do not modify the configuration of any bridges inside the AHV host unless
following an official Nutanix guide.

Bond
Bonded ports aggregate the physical interfaces on the AHV host. By default, a bond named br0-up is created in bridge br0. After the node imaging process, all interfaces are placed within a single bond, which is a requirement of the Foundation imaging process. A bridge can have only a single bond.

Bonds allow for several load-balancing modes, including active-backup, balance-slb, and balance-tcp. Link Aggregation Control Protocol (LACP) can also be activated on a bond for link aggregation. The bond_mode setting is not specified during installation and therefore defaults to active-backup.

RECOMMENDATION: Ensure that each bond has at least two uplinks for redundancy.
The uplinks in each bond must all be the same speed.

Uplink Load Balancing
The following bond modes are available (a comparison sketch follows this list):
• active-backup: The default configuration, which transmits all traffic over a single active adapter. If the active adapter becomes unavailable, another adapter in the bond becomes active. Limits host throughput to the bandwidth of a single network adapter. No additional switch configuration is required.
• balance-slb: Distributes VM NICs across adapters in the bond and periodically rebalances them for even uplink utilization. Limits per-VM-NIC throughput to a single network adapter, but allows multiple VM NICs to use multiple host physical interfaces. NOTE: Not recommended due to negative interaction with default IGMP snooping and pruning network settings for multicast.
• balance-tcp with LACP: Distributes VM NIC TCP or UDP sessions across adapters in the bond. Limits per-NIC throughput to the maximum bond bandwidth (number of physical uplink adapters * speed). Requires physical switch link aggregation. Used when LACP negotiation is required by the datacenter team or throughput requirements exceed a single NIC.
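
As a quick way to compare these modes, the following Python sketch tabulates the throughput ceilings described above for a hypothetical host with two 10 Gbps uplinks. It is illustrative only and is not an AHV or OVS interface.

# Illustrative comparison of the AHV bond modes described above, for a
# hypothetical host with two 10 Gbps uplinks. Not an AHV or OVS API.

def throughput_ceilings(nic_count: int, nic_speed_gbps: int) -> dict:
    """Per-VM-NIC and per-host throughput ceilings (Gbps) for each bond mode."""
    single = nic_speed_gbps
    aggregate = nic_count * nic_speed_gbps
    return {
        # All traffic rides one active adapter; no switch configuration needed.
        "active-backup": {"per_vm_nic": single, "per_host": single},
        # VM NICs are spread across adapters; each VM NIC is still tied to one adapter.
        "balance-slb": {"per_vm_nic": single, "per_host": aggregate},
        # TCP/UDP sessions are spread across the LACP bond.
        "balance-tcp (LACP)": {"per_vm_nic": aggregate, "per_host": aggregate},
    }

for mode, limits in throughput_ceilings(nic_count=2, nic_speed_gbps=10).items():
    print(f"{mode:20} per-VM-NIC <= {limits['per_vm_nic']} Gbps, "
          f"per-host <= {limits['per_host']} Gbps")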

RECOMMENDATIONS:
• To keep network configuration simple, use the standard 1,500 byte MTU in the hosts,
CVMs, and workload VMs. Nutanix does not recommend jumbo frames unless
specifically required by high-performance Nutanix Volumes iSCSI workloads or specific
workload requirements.
• When switching from 1,500-byte frames to 9,000-byte frames, performance
improvements are generally not significant unless the workload uses the maximum
network bandwidth for read traffic.

For more information on when to use jumbo frames, visit the Volumes and AHV
Networking Best Practices guides.

NET-010 CONNECT AT LEAST ONE 10 GBE OR FASTER NIC TO EACH TOR SWITCH.

Justification: Maintains high availability in the event of the loss of one switch.

Implication: Requires at least two TOR switches, which will impact cost and rack space. Increases total bandwidth to each host to 20 Gbps.
NET-011 USE A SINGLE BR0 BRIDGE WITH AT LEAST TWO OF THE FASTEST UPLINKS OF THE SAME SPEED UNLESS MULTIPLE PHYSICAL NETWORKS ARE REQUIRED.

Justification: Simplifies the design, requiring fewer physical switches and switch ports, while relying on the high bandwidth of 10 Gbps and faster adapters.

Implication: All traffic is sent over a single virtual switch to a single pair of TOR switches.

NET-012 USE VLANS TO SEPARATE LOGICAL NETWORKS.

Justification: Physical hosts have a limited number of network ports and each port adds complexity. Traffic separation can be accomplished logically without numerous physical ports.

Implication: Enables multiple networks to be used at once without additional hardware, saving on costs and simplifying the design.

NET-013 USE ACTIVE-BACKUP UPLINK LOAD BALANCING.

Justification: Simplifies the design and does not need any additional configuration.

Implication: All traffic is transmitted over one adapter, limiting host bandwidth to that of a single adapter (10 Gbps or faster).

NET-014 USE STANDARD 1,500 BYTE MTU AND DO NOT USE JUMBO FRAMES.

Justification: Simplifies the design and does not need any additional configuration. Reduces risk and complexity from non-standard configuration that only increases performance in certain use cases.

Implication: All network frames are limited to the default 1,500 byte maximum size for interoperability, potentially creating more network overhead for high throughput write workloads.
vSphere Networking
VMware vSphere networking follows many of the same design decisions as AHV
networking. The critical design choices for vSphere networking are covered here. See the
Nutanix vSphere Networking Best Practices Guide for more details.

Nutanix hosts with vSphere ESXi use two virtual switches (vSwitches), named
vSwitchNutanix and vSwitch0. vSwitchNutanix is the internal, standard vSwitch used for
management and storage traffic between the CVM and the hypervisor.

RECOMMENDATIONS:
• Do not modify vSwitchNutanix. vSwitch0 is also a standard vSwitch by default, used
for communication between CVMs as well as workload traffic.
• Nutanix recommends converting vSwitch0 to the distributed vSwitch following the
distributed vSwitch migration knowledge base article. Converting to the distributed
vSwitch allows central management of networking for all hosts, instead of host by host
networking configuration. The distributed vSwitch also adds capability for advanced
networking functions such as load based teaming, LACP, and traffic shaping.
• Connect at least two of the fastest adapters of the same speed into vSwitch0 and use
the “Route Based on Physical NIC Load” load balancing method to ensure traffic is
balanced between uplink adapters.
• Connect these adapters to two separate top-of-rack switches to ensure redundancy.
• Do not add more vSwitches unless a connection to another physical network is needed
to meet security or workload requirements.
• All CVM storage, hypervisor host, and workload traffic should flow through vSwitch0, using VLAN separation between the workload traffic and all other traffic.
• Use the default 1,500 byte frame size on all uplinks unless there is a specific
performance or application requirement that would justify 9,000 byte jumbo frames.

NET-015 USE VIRTUAL DISTRIBUTED SWITCH (VDS).

Implication: Ability to configure port groups and other features at the cluster switch level vs. individual nodes. Reliance on vCenter for host and VM network configuration management.

Justification: Reduces the amount of configuration required in the environment. Makes port group provisioning faster and less error prone. Provides additional features such as LACP and bandwidth shaping.

Important: Leave the internal vNetwork Standard Switch used by the ESXi host and CVM for internal communication in place, including security settings.

NET-016 CONNECT AT LEAST ONE 10 GBE OR FASTER NIC TO EACH TOP-OF-RACK SWITCH.

Justification: Maintains high availability in the event of the loss of one switch.

Implication: Requires at least two ToR switches, which will impact cost and rack space. Increases total bandwidth to each host to 20 Gbps.

NET-017 USE A SINGLE VSWITCH0 WITH AT LEAST TWO OF THE FASTEST UPLINKS OF THE SAME SPEED.

Impact: All traffic is sent over a single virtual switch to a single pair of top-of-rack switches.

Justification: Simplifies the design, requiring fewer physical switches and switch ports, while relying on the high bandwidth of 10 Gbps and faster adapters.
NET-018 USE ROUTE BASED ON PHYSICAL NIC LOAD UPLINK LOAD BALANCING.

Impact: Requires the vSphere Distributed Switch to spread VM NICs among uplinks to reduce load on the most utilized physical network adapter.

Justification: Achieves load balancing and use of both adapters with no switch-side configuration required.

NET-019 USE STANDARD 1,500 BYTE MTU AND DO NOT USE JUMBO FRAMES.

Impact: All network frames are limited to the default 1,500 byte maximum size for interoperability, potentially creating more network overhead for high throughput write workloads.

Justification: Simplifies the design and does not need any additional configuration. Reduces risk and complexity from non-standard configuration that only increases performance in certain use cases.
4.2.5 COMPUTE AND STORAGE DESIGN
Nutanix HCI is a converged storage and compute solution which leverages local
hardware components (CPU, memory, and storage) and creates a distributed platform
for running workloads.

Each node runs an industry-standard hypervisor and the Nutanix CVM. The Nutanix
CVM provides the software intelligence for the platform. It is responsible for serving all
IO to VMs that run on the platform.

The following logical diagram illustrates the relationship of the components of a Nutanix node:
The CVM controls the storage devices directly and creates a logical construct called a
storage pool of all disks from all nodes in the cluster. For AHV it uses PCI passthrough
and for ESXi it uses VMDirectPathIO.

Compute Design

AHV CPU and Memory Planning


AHV lets you configure vCPUs (similar to sockets) and cores per vCPU (similar to CPU cores) for every VM. Except for the memory used by the CVM and the AHV software, all physical memory can be allocated for use by VMs. AHV guarantees the memory allocated to each VM and doesn't overcommit or swap, which could otherwise impact performance.

The amount of memory required by an AHV host varies based on a number of factors.
The host uses from 2GB to 10GB of memory on hosts with 512GB memory.
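
As a rough illustration of that overhead, the following Python sketch estimates how much host memory remains for guest VMs. The inputs are assumptions for a worked example: a 512 GB host, a 32 GB CVM (see the CVM sizing table later in this section), and the top of the 2-10 GB AHV overhead range quoted above.

# Rough estimate of memory left for guest VMs on an AHV host.
# AHV guarantees VM memory and does not overcommit, so this is a hard budget.
# Inputs are example assumptions: 512 GB host, 32 GB CVM, 10 GB AHV overhead.

def vm_allocatable_memory_gb(host_gb: float, cvm_gb: float, ahv_overhead_gb: float) -> float:
    """Host memory minus the CVM memory and the hypervisor overhead."""
    return host_gb - cvm_gb - ahv_overhead_gb

print(vm_allocatable_memory_gb(host_gb=512, cvm_gb=32, ahv_overhead_gb=10))  # 470.0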

NUMA and VMs


Processor sockets have integrated memory controllers. A VM has faster access to
memory attached to the physical CPU where it is running versus memory attached to
another CPU in the same host. This is called Non-Uniform Memory Access (NUMA).
As a result, the best practice is to create VMs that fit within the memory available to
the CPU where the VM is running whenever possible.

CMP-001 IF RUNNING A NON-NUMA-AWARE APPLICATION ON A VM, CONFIGURE THE VM'S MEMORY AND VCPU TO FIT WITHIN A NUMA NODE ON AN AHV HOST.

Justification: This prevents the VM's vCPU and memory access from crossing NUMA boundaries.

Implication: Avoiding crossing NUMA boundaries results in better, more predictable performance.
For example, if a host has 2 CPU sockets with 8 cores each and 256GB memory, the host has 2 NUMA nodes, each with:

• 8 CPU cores
• 128GB memory

A VM that has 4 vCPUs and 64GB memory will fit within a single NUMA node on the host and achieve the best performance.
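
The fit check above reduces to two comparisons. The following Python sketch is illustrative only, using the example host topology (8 cores and 128 GB per NUMA node).

# Simple NUMA fit check for the example host above:
# 2 sockets x 8 cores and 256 GB total => 8 cores and 128 GB per NUMA node.

def fits_in_numa_node(vm_vcpus: int, vm_mem_gb: float,
                      cores_per_node: int, mem_per_node_gb: float) -> bool:
    """True when both the VM's vCPUs and memory fit within one NUMA node."""
    return vm_vcpus <= cores_per_node and vm_mem_gb <= mem_per_node_gb

print(fits_in_numa_node(4, 64, cores_per_node=8, mem_per_node_gb=128))    # True
print(fits_in_numa_node(12, 192, cores_per_node=8, mem_per_node_gb=128))  # False (wide VM)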

Virtual Non Uniform Memory Access (vNUMA) and User VMs
The primary purpose of vNUMA is to give large virtual machines or wide VMs (VMs
requiring more CPU or memory capacity than is available on a single NUMA node) the
best possible performance. vNUMA helps wide VMs create multiple vNUMA nodes.
Nutanix supports vNUMA with both AHV and ESXi. vNUMA requires NUMA aware
applications.

Each vNUMA node has virtual CPUs and virtual RAM. Pinning a vNUMA node to a
physical NUMA node ensures that virtual CPUs accessing virtual memory see the
expected NUMA behavior. Low-latency memory access in virtual hardware (within
vNUMA) matches low-latency access in physical hardware (within physical NUMA),
and high-latency accesses in virtual hardware (across vNUMA boundaries) match
high-latency accesses on physical hardware (across physical NUMA boundaries).

In AHV, administrators can configure vNUMA for a VM via the aCLI (Acropolis CLI) or REST API; this configuration is VM-specific and defines the number of vNUMA nodes. Memory and compute are divided in equal parts across each vNUMA node. Refer to the AHV Administration Guide for configuration steps.

In ESXi, vNUMA is automatically enabled for a VM with >8 vCPU.

Building on the previous example, if a host with 16 CPU cores and 256GB memory has a VM with 12 vCPUs and 192GB memory that is not configured for vNUMA, the vCPU and memory assignment will span NUMA boundaries.

To ensure the best performance for wide VMs like this, vNUMA must be configured.

vNUMA VM configurations require strict fit. With strict fit, for each VM virtual node,
memory must fit inside a physical NUMA node. Each physical NUMA node can provide
memory for any number of vNUMA nodes. If there is not enough memory within a
NUMA node, the VM does not power on. Strict fit is not required for CPU. To determine
how many vNUMA nodes to use for a user VM, follow application-specific configuration
recommendations provided by Nutanix.
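
Because memory and compute are divided in equal parts across vNUMA nodes and each vNUMA node's memory must fit inside a physical NUMA node, the strict-fit rule can be checked as below. This Python sketch is illustrative only and reuses the 12 vCPU / 192 GB wide-VM example with 128 GB physical NUMA nodes.

# Split a wide VM across vNUMA nodes and check the memory strict-fit rule.
# Illustrative only: 12 vCPU / 192 GB VM on a host with 128 GB per NUMA node.

def vnuma_layout(vm_vcpus: int, vm_mem_gb: float, vnuma_nodes: int,
                 phys_numa_mem_gb: float) -> dict:
    """Equal split of vCPU and memory per vNUMA node, plus the strict-fit result."""
    mem_per_node = vm_mem_gb / vnuma_nodes
    return {
        "vcpus_per_vnuma_node": vm_vcpus / vnuma_nodes,
        "mem_gb_per_vnuma_node": mem_per_node,
        # Strict fit applies to memory only: if a vNUMA node's memory does not
        # fit inside a physical NUMA node, the VM will not power on.
        "strict_fit_ok": mem_per_node <= phys_numa_mem_gb,
    }

print(vnuma_layout(12, 192, vnuma_nodes=1, phys_numa_mem_gb=128))  # strict_fit_ok: False
print(vnuma_layout(12, 192, vnuma_nodes=2, phys_numa_mem_gb=128))  # strict_fit_ok: True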

CVM CPU and Memory Considerations
Optimal CVM vCPU values depend on the type of system and how many cores per
NUMA node are present in the system. The CVM is pinned to the first physical CPU of
the node. The following table gives an overview of minimum vCPU and memory values
for the CVM (as of AOS 5.15):

PLATFORM STORAGE OR TYPE | MINIMUM MEMORY (GB) | MINIMUM VCPU
Platforms up to 80TB per node | 32 | 8
Platforms up to 120TB per node | 36 | 12
Platforms up to 120TB per node deploying Nutanix Objects | 36 | 12

For specific recommendations on CVM sizing relating to specific workloads (e.g., Microsoft, Oracle, etc.), please refer to the Nutanix Solution Briefs and Best Practice Guides for those workloads.

RECOMMENDATIONS:
Use Nutanix Foundation to set up a new cluster or node and configure the CVM(s).
Nutanix Foundation software automatically identifies the platform and model type.
Based on that information, Nutanix Foundation configures the appropriate default values
for CVM memory and vCPU.

Storage Design

Nutanix HCI combines highly dense storage and compute into a single platform with a scale-out, shared-nothing architecture and no single points of failure. The AOS Distributed Storage Fabric (DSF) appears to the hypervisor like any centralized storage array; however, all I/O is handled locally to provide the highest performance. Storage planning for this design requires that you understand some of the high-level concepts of DSF:

Storage Pool
A group of physical storage devices within the cluster. This can include HDDs and both
NAND and NVMe SSDs. The storage pool spans multiple Nutanix nodes and expands
as the cluster expands. A default storage pool is created when a cluster is created.

Container
Logical segmentation of the Storage Pool is provided by containers, which contain a
group of VMs or files (vDisks). Configuration options and data management services are
configured at the container level. When a cluster is created, a default storage container
is configured for the cluster.

NOTE: When VMware ESXi is used, containers correspond to NFS datastores.

Container Guidelines:
A key design question is how many containers to configure in a cluster:

• Nutanix typically recommends minimizing the number of containers created.


For most use cases only 1 container is enough.
• Exceptions can be made if two or more applications running on the same cluster
require different data reduction and replication factor values. In this case, additional
containers can be created for those workloads.

Data reduction settings (compression, deduplication, and erasure coding) and the replication factor (RF) are configured on a per-container basis, so multiple containers need to be created when different values are desired.

All Nutanix containers are thin provisioned by default. Thin provisioning is a widely
accepted technology that has been proven over time by multiple storage vendors,
including VMware. As the DSF presents containers to VMware vSphere hosts as NFS
datastores by default, all VMs are also thin provisioned by default. vDisks created within
AHV VMs are thinly provisioned by default. Nutanix also supports thick provisioned
vdisks for VMs in ESXi by using space reservations. Using thick provisioned vdisks might
impact data reduction values on containers since Nutanix reserves the thick provisioned
space and it cannot be oversubscribed.

STR-01 WHEN CREATING VDISKS IN ESXI, ALWAYS USE THIN-PROVISIONED VDISKS.*

Justification: There is no performance difference between thin-provisioned and thick-provisioned disks. Thin provisioning is more space efficient.

Implication:

*Multi-writer VMDK requires thick-provisioned vDisks.
vDisk
All storage management is VM-centric, and I/O is optimized at the vDisk level.
The software solution runs on nodes from a variety of manufacturers. A vDisk is any
file over 512KB stored in DSF, including vmdks and VM hard disks. vDisks are logically
composed of vBlocks.

RECOMMENDATIONS:
• Use multiple vDisks for an application versus a single large vDisk.

Using multiple vDisks allows for better overall performance due to the ability to leverage
multiple OS threads to do more parallel processing versus a single vDisk configuration.
Refer to the appropriate application solution guide for application-specific configuration
recommendations for how many vDisks to configure.

Nutanix Volumes
In addition to providing storage through the hypervisor, Nutanix also allows supported operating systems to access DSF storage capabilities directly using Nutanix Volumes.
Nutanix designed Volumes as a scale-out storage solution where every CVM in a cluster
can present storage volumes via iSCSI. This solution allows an individual application to
access the entire cluster, if needed, to scale out for performance.

Volumes automatically manages high availability to ensure upgrades or failures are non-
disruptive to applications. Storage allocation and assignment for Volumes is done with
volume groups (VGs). A VG is a collection of one or more disks (vDisks) in a Nutanix
storage container. These Volumes disks inherit the properties (replication factor,
compression, erasure coding, and so on) of the container they reside in. With Volumes,
vdisks in a VG are load balanced across all CVMs in the cluster by default.

In addition to connecting volume groups through iSCSI, AHV also supports direct attachment of VGs to VMs. In this case, the vDisks are presented to the guest OS over the virtual SCSI controller. The virtual SCSI controller leverages AHV Turbo and iSCSI under the covers to connect to the Nutanix DSF. By default, the vDisks in a VG directly attached to a VM are hosted by the local CVM. Load balancing of vDisks on directly attached VGs can be enabled via acli.

Load balancing of vDisks in a VG enables IO-intensive VMs to use the storage


capabilities of multiple CVMs. If load balancing is enabled on a VG, AHV communicates
directly with each CVM hosting a vDisk.

Each vDisk is served by a single CVM. Therefore, to use the storage capabilities of
multiple CVMs, create more than one vDisk for a file system and use OS-level striped
volumes to spread the workload. This improves performance and prevents storage
bottlenecks, but can impact your network bandwidth.

GUIDELINE: Core use cases for Nutanix Volumes:
• Shared disks
  • Oracle RAC, Microsoft Failover Clustering, etc.
• Where execution contexts are ephemeral and data is critical
  • Containers, OpenStack, etc.
• Guest-initiated iSCSI, which is required for MS Exchange on vSphere (for Microsoft support)
• Bare-metal consumers
  • Standalone or clustered physical workloads

Hybrid and All-Flash Nodes


A Nutanix cluster can consist of hybrid nodes, all-flash nodes, or a combination.
Irrespective of the node type, each node has an OpLog that serves as a persistent write
buffer for bursty, random write traffic. The OpLog is similar to a filesystem journal.
An extent store provides persistent bulk storage for DSF. A hybrid node has a
combination of SSDs and HDDs with OpLog stored on the SSDs. All-Flash nodes are
composed of SSDs. If there is a combination of SATA and NVMe SSDs, OpLog is stored
on the fastest medium.

DSF automatically tiers data across the cluster to different classes of storage devices
using intelligent lifecycle management (ILM). For best performance, ILM makes sure the
most frequently used data is available in memory or in flash on the node local to the VM.
The default threshold where the algorithm will move data from the hot tier to the cold
tier is 75%. Data for down-migration is chosen based on last access time. In All-Flash
node configurations, the Extent Store consists only of SSDs and no tiering occurs.

RECOMMENDATIONS:
For hybrid configurations, Nutanix recommends (see the sizing sketch after this list):

• Size the SSD tier so that the active data set of the application fits in 75% of the usable SSD capacity.
  • Nutanix has a collector tool that can be run to determine the approximate active data set size.
  • Alternatively, use application-specific tools such as the MAP toolkit for MSSQL, or AWR reports and Nutanix scripts for Oracle.
  • A general rule is to check how much data has been backed up during a month. Assuming a 70%/30% R/W pattern (if the R/W patterns are unknown), multiply the data you get from backups by 4, which gives an approximate value for hot data. That amount should fit within 75% of the usable cluster capacity.
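
A minimal sketch of the sizing heuristic above, assuming the unknown-pattern multiply-by-four rule and the 75% usable-SSD target from the first bullet; the backup figure below is a placeholder, not guidance for any specific workload.

# Rough SSD tier sizing per the heuristic above: estimate hot data from the
# monthly backup delta (x4 when the R/W pattern is unknown) and make sure it
# fits within 75% of usable SSD capacity. The input is an example placeholder.

def required_usable_ssd_gb(monthly_backup_delta_gb: float,
                           hot_data_multiplier: float = 4.0,
                           ssd_fill_target: float = 0.75) -> float:
    """Usable SSD capacity needed so the estimated hot data set fits in 75%."""
    estimated_hot_data_gb = monthly_backup_delta_gb * hot_data_multiplier
    return estimated_hot_data_gb / ssd_fill_target

print(round(required_usable_ssd_gb(1_000)))  # ~5333 GB usable SSD for 1 TB of monthly change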

For All-Flash configurations there is no ILM, so there is no equivalent recommendation.

STR-02 WHEN SIZING A HYBRID CLUSTER, MAKE SURE TO HAVE ENOUGH USABLE SSD CAPACITY TO CONTAIN THE ACTIVE DATA SET OF RUNNING APPLICATIONS.

Justification: Keeping active data on SSD ensures the application achieves the best performance. ILM moves data between tiers; the best-case scenario is active data in the SSD tier all the time.

Implication: More SSD capacity may need to be sized to keep active data on the SSD tier.

RECOMMENDATIONS:
For All-Flash configurations, Nutanix recommends:

• Use All-Flash node configurations for business-critical applications for consistent performance. As mentioned before, All-Flash configurations can consist of all SATA SSDs or a combination of SATA and NVMe.
• If the application requires very high IOPS with consistently low latency, there is an option to use RDMA with NVMe. RDMA requires RoCEv2 to be configured on the network fabric; Nutanix makes this setup very easy through Prism.

STR-03 DO NOT MIX NODE TYPES FROM DIFFERENT VENDORS IN THE SAME CLUSTER.

Justification: Mixing node types from different vendors is unsupported.

Implication: Nodes from different vendors can be set up in separate clusters and all managed centrally from Prism Central.

STR-04 DO NOT MIX NODES THAT CONTAIN NVME SSDS IN THE SAME CLUSTER WITH HYBRID SSD/HDD NODES.

Justification: The performance of NVMe SSD nodes is significantly higher compared to hybrid nodes, and mixing them can potentially create performance issues.

Implication: NVMe nodes can be set up in their own clusters or added to clusters with all-SSD nodes and managed centrally from Prism Central.
NODE SELECTION RECOMMENDATIONS:
• If your high-level design decision is to use hybrid nodes, you need to decide the number of HDDs and SSDs per node.
  • For applications that require high performance, like databases and Tier-1 applications, more drives per node is better.
  • OpLog is spread across up to 8 SSDs, so more SSDs result in better and more consistent performance overall. If NVMe devices are present, OpLog is placed on them (up to 8 NVMe SSDs).
  • More SSDs provide more usable space for tiering data.
• For All-Flash nodes, more SSDs provide better performance by making read and write access more parallel, especially for non-optimal workload patterns.
• Use Nutanix Sizer to size usable capacity and obtain node recommendations based on your workload use case.
• Also refer to specific application solution guides and/or best practice guides.

STR-05 A MINIMUM 2:1 HDD TO SSD RATIO IS REQUIRED FOR HYBRID CLUSTERS.*

Justification: This ensures there is enough bandwidth in the slower storage tier to absorb down-migrations triggered by ILM.

Implication: If this rule is not followed, it may result in performance degradation during down-migrations.

*4 drive slot and 10 drive slot systems from Dell, Lenovo, and Fujitsu can have 2+2 and 4+6 SSD+HDD configurations, respectively.

STR-06 SIZE FOR N+1 NODE REDUNDANCY FOR STORAGE AND COMPUTE. FOR MISSION-CRITICAL WORKLOADS THAT NEED HIGHER SLAS, USE N+2 NODE REDUNDANCY.

Justification: Having the storage and compute capacity of an additional node (or two) ensures there is enough storage and compute capacity available for VMs to restart when a node failure occurs.

Implication: Additional storage and compute capacity is required for N+1 or N+2 redundancy.
Availability Domains and Fault Tolerance
Availability domains are used to determine component and data placement. Nutanix availability domains are:
• Disk Awareness (always).
• Node Awareness (always).
• Block Awareness (optional).
  • Requires a minimum of 3 blocks for FT1 and 5 for FT2, where a block contains either 1, 2, or 4 nodes. With Erasure Coding, the minimums are 4 blocks for FT1 and 6 blocks for FT2.
• Rack Awareness (optional).
  • Requires a minimum of 3 racks for FT1 and 5 for FT2, and you must define what constitutes a rack and where your blocks are placed. With Erasure Coding, the minimums are 4 racks for FT1 and 6 racks for FT2.

Closely tied to the concepts of awareness and RF is cluster resilience, defined by


the redundancy factor or Fault Tolerance (FT). FT is measured for different entities,
including:
• Data
• Metadata
• Configuration data
• Free space

FT=1: A cluster can lose any one component and operate normally.
FT=2: A cluster can lose any two components and operate normally.

Depending on the defined awareness level and FT value, data and metadata are
replicated to appropriate locations within a cluster to maintain availability.

The following table gives an overview of the awareness types and FT levels supported.

AWARENESS TYPE | DESIRED FT LEVEL | MIN. UNITS* | SIMULTANEOUS FAILURE TOLERANCE
Node | 1 | 3 nodes | 1 node
Node | 2 | 5 nodes | 2 nodes
Block | 1 | 3 blocks | 1 block
Block | 2 | 5 blocks | 2 blocks
Rack | 1 | 3 racks | 1 rack
Rack | 2 | 4 racks | 2 racks

*The minimum units increase by 1 if EC-X is enabled.

FT GUIDELINES (summarized in the sketch after this list):
• FT=2 implies RF=3 for data and RF=5 for metadata by default.
  • Use this setting when you need to withstand two simultaneous failures and still have cluster and VM data available.
• FT=1 implies RF=2 for data and RF=3 for metadata by default.
  • Use this setting when the application can tolerate having only 2 copies of data. This keeps the cluster and VM data available after one failure.
• Another option is to set FT=2 for a cluster with specific containers set to RF=2.
  • With this configuration, data is RF=2, but metadata is RF=5.
  • For containers that are RF=2, if there are 2 simultaneous failures, there is a possibility that VM data becomes unavailable, but the cluster remains up.
  • This is different from the FT=1, RF=2 case, where two simultaneous failures will result in the cluster being down.
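
The FT-to-RF relationships above can be summarized in a small lookup table; a minimal sketch, with the default values and minimum node counts taken from the guidelines and the awareness table above.

# Default replication factors and minimum cluster sizes per FT level,
# as described in the guidelines and the node-awareness table above.

FT_PROFILES = {
    1: {"data_rf": 2, "metadata_rf": 3, "min_nodes": 3},
    2: {"data_rf": 3, "metadata_rf": 5, "min_nodes": 5},
}

def copies_required(ft_level: int) -> dict:
    """Return the default data/metadata copies and minimum node count for an FT level."""
    return FT_PROFILES[ft_level]

print(copies_required(2))  # {'data_rf': 3, 'metadata_rf': 5, 'min_nodes': 5}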

STR-07 USE FT=2 AND RF=3 FOR WORKLOADS AND CLUSTERS THAT NEED HIGHER SLAS OR FOR CLUSTER SIZES >32 NODES.

Justification: Using FT=2 and RF=3 provides additional resilience and the capability to survive 2 simultaneous failures.

Implication: Additional storage space is required for FT=2 and RF=3, reducing usable capacity.
Capacity Optimization
Nutanix provides different ways to optimize storage capacity that are intelligent and
adaptive to workloads characteristics. All optimizations are performed at the container
level, so different containers can use different settings.

Erasure Coding (EC-X)


To provide a balance between availability and the amount of storage required, DSF
provides the ability to encode data using erasure codes (EC). Like RAID (levels 4, 5, 6,
etc.) where parity is calculated, EC encodes a strip of data blocks across different nodes
and calculates parity. In the event of a host and/or disk failure, the parity data is used to
calculate any missing data blocks (decoding). In the case of DSF, the data block must
be on a different node and belong to a different vDisk. EC is a post-process operation
and is done on write cold data (Data that hasn’t been overwritten in more than 7 days).
The number of data and parity blocks in a strip is chosen by the system based on
number of nodes and configured failures to tolerate.

The following table summarizes the expected EC-X overhead compared with the standard RF2 (2X) and RF3 (3X) storage overhead:

# OF NODES | FT1 (RF2 EQUIVALENT) EC STRIP SIZE (DATA/PARITY) | FT1 EC OVERHEAD | FT2 (RF3 EQUIVALENT) EC STRIP SIZE (DATA/PARITY) | FT2 EC OVERHEAD
4 | 2/1 | 1.5X | N/A | N/A
5 | 3/1 | 1.33X | N/A | N/A
6 | 4/1 | 1.25X | 2/2 | 2X
7 | 4/1 | 1.25X | 3/2 | 1.6X
8+ | 4/1 | 1.25X | 4/2 | 1.5X
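
The overhead column follows directly from the strip geometry: overhead = (data + parity) / data. The following Python sketch reproduces the table's values; the strip sizes are the ones shown above, and AOS chooses strips automatically rather than through any user-facing helper like this.

# EC-X storage overhead from strip geometry: (data + parity) / data blocks.
# Strip sizes mirror the table above; AOS selects the strip size automatically.

def ec_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Storage overhead multiplier for an erasure-coded strip."""
    return (data_blocks + parity_blocks) / data_blocks

print(ec_overhead(4, 1))  # 1.25x: FT1 strip on clusters of 6 or more nodes
print(ec_overhead(2, 1))  # 1.5x:  FT1 strip on a 4-node cluster
print(ec_overhead(4, 2))  # 1.5x:  FT2 strip on clusters of 8 or more nodes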

EC-X GUIDELINES:
EC-X provides the same level of data protection while increasing usable storage
capacity.

• Turn on EC-X for non-mission-critical workloads and workloads that have a significant
amount of write cold data, since erasure coding works on write cold data and provides
more usable storage.
• For more information refer to specific application guides.

Compression
DSF provides both inline and post-process data compression. Regardless of whether inline or post-process compression is selected, write data coming into the OpLog that is larger than 4K and compresses well is written to the OpLog in compressed form.

• Inline: Compresses sequential streams of data or large I/Os (>64K) when writing to the Extent Store.
• Post-process: Compresses data after it is drained from the OpLog to the Extent Store, once the compression delay has been met, during the next Curator scan.

Nutanix uses the LZ4 and LZ4HC compression algorithms. LZ4 compression is used to
compress normal data, providing a balance between compression ratio and
performance. LZ4HC compression is used to compress cold data to improve the
compression ratio. Cold data is characterized as:

• Regular data: No access for 3 days.


• Immutable data: No access for 1 day.

COMPRESSION GUIDELINES:
Compression provides on-disk space savings for applications such as databases,
and results in a lower number of writes being written to storage.

• Turn ON inline compression for all containers.

STR-08 ENABLE INLINE COMPRESSION FOR ALL CONTAINERS.

Justification: Since inline compression operates only on sequential data when writing to the Extent Store, enabling it provides immediate space efficiency. Write data coming into the OpLog is already compressed.

Implication: Slightly higher CPU consumption for compressing/decompressing data.

Deduplication
When enabled, DSF does capacity-tier and performance-tier deduplication. Data is
fingerprinted on ingest using a SHA-1 hash that is stored as metadata. When duplicate
data is detected based on multiple copies with the same fingerprint, a background
process removes the duplicates. When deduplicated data is read, it is placed in a unified
cache, and any subsequent requests for data with the same fingerprint are satisfied
directly from cache.

DEDUPLICATION GUIDELINES:
Deduplication is recommended for full clones, P2V migrations, and persistent desktops.

STR-09 ENABLE/DISABLE DEDUPLICATION*

Justification: Increases the effective capacity of the caching layer for highly deduplicable data such as VDI clones.

Implication: Slightly higher CPU consumption for running deduplication processes.

*Refer to specific application solution guides to determine whether or not to enable deduplication.

4.3. VIRTUALIZATION LAYER

This section describes technical design considerations for designs using either Nutanix AHV or VMware ESXi.

4.3.1 NUTANIX AHV

Introduction
The Nutanix hypervisor, AHV, is an attractive alternative hypervisor that streamlines operations and lowers overall costs. Built into Nutanix Enterprise Cloud and included at no additional cost in the AOS license, AHV delivers virtualization capabilities for the most demanding workloads. It provides an open platform for server virtualization, network virtualization, security, and application mobility. When combined with comprehensive operational insights and virtualization management from Nutanix Prism and Prism Central, AHV provides a complete datacenter solution.

Control Plane
The control plane for managing the Nutanix core infrastructure and AHV is provided by:

• Prism Element (PE)
  • Localized cluster manager responsible for local cluster management and operations. Every Nutanix cluster includes Prism Element built in.
  • 1-to-1 cluster manager.
• Prism Central (PC)
  • Multi-cluster manager responsible for managing multiple Nutanix clusters to provide a single, centralized management interface. Prism Central is an optional software appliance (VM).
  • 1-to-many cluster manager.

VRT-001 DEPLOY SCALE-OUT PRISM CENTRAL FOR ENHANCED CLUSTER MANAGEMENT.

Impact: Requires some additional cluster resources to run the necessary VMs.

Justification: Increases the total number of managed objects and increases management plane availability.
HA/ADS
Nutanix AHV has built-in VM high availability (VM-HA) and a resource contention
avoidance engine called Acropolis Dynamic Scheduling (ADS). ADS is always on and
does not require any manual tuning. VM-HA can be enabled and disabled via a simple
Prism checkbox. There are two main levels of VM-HA for AHV:

• Best effort. In this HA configuration, no node or memory reservations are required in


the cluster. In case of a failure, VMs are moved to other nodes based on the resources/
memory available on each node. This is not the preferred method of HA since, if no resources are available in the cluster after a failure, some VMs may not be powered on. Best effort is the default configuration.
• Most applicable to non-production environments.
• Guarantee. In this HA configuration, some memory is reserved on each node in the
cluster to enable failover of virtual machines from a failed node. The Acropolis service
in the cluster calculates the memory to be reserved based on virtual machine memory
configuration. All nodes are marked as schedulable with resources available for running
VMs. Recommended for production environments. Enable this mode by checking the
Enable HA Reservation box in Prism settings.

VM-HA considers memory when calculating available resources throughout the cluster for starting VMs. VM-HA respects Acropolis Dynamic Scheduling (ADS) VM-host affinity rules, but it may not honor VM-VM anti-affinity rules, depending on the scenario.

VRT-002 USE VM-HA GUARANTEE.

Impact: Reserves enough cluster resources such that an entire host's worth of VMs can be successfully restarted (RF=2), or two hosts (RF=3). RF=3 must be manually configured, since it is not the default.

Justification: Ensuring VM high availability is important in production environments. Guarantee should be used in production environments. "Best effort" (the default) is optimal in test/dev environments.

For additional information, see TN-2132: Virtual Machine High Availability (login required).

AHV CPU Generation Compatibility


Similar to VMware's Enhanced vMotion Capability (EVC), which allows VMs to move between different processor generations, AHV determines the lowest processor generation in the cluster and constrains all QEMU domains to that level. This allows mixing of processor generations within an AHV cluster and ensures the ability to live migrate VMs between hosts.

VMware vSphere

Introduction
VMware vSphere is fully supported by Nutanix with many points of integration. vSphere
major and minor releases go through extensive integration and performance testing,
ensuring that both products work well together in even the largest enterprises. Rest
assured that if you deploy vSphere, the solution is fully supported and performant.

Control Plane
When using vSphere as part of this design, the control plane consists of vCenter for managing the vSphere components and Prism Central for managing the Nutanix components.

VRT-003 DEPLOY AN HA VCENTER INSTANCE WITH EMBEDDED PSC TO MANAGE ESXI-BASED NUTANIX CLUSTERS.

Impact: Requires additional vCenter licenses and added installation time to fully configure each vCenter instance.

Justification: Nutanix upgrade automation features, such as 1-click upgrades and LCM, require the advanced control features that vCenter provides. In addition, vCenter Server is required to create and manage the VMware vSphere cluster responsible for DRS and HA.

EVC
VMware’s Enhanced vMotion Capability (EVC) allows VMs to move between different
processor generations within a cluster. EVC mode is a manually configured option, unlike
the corresponding feature in AHV. EVC should be enabled at cluster installation time
to negate the possible requirement to reboot VMs in the future when it is enabled.

VRT-004 ENABLE EVC MODE AND SET IT TO THE HIGHEST COMPATIBILITY LEVEL THE PROCESSORS IN THE CLUSTER WILL SUPPORT.

Impact: Requires manual configuration to determine the highest compatible EVC mode.

Justification: Even if a cluster is homogeneous at the outset, it is likely new nodes will be added over time. Enabling EVC mode when a cluster is built helps ensure future node additions are seamless.
HA/DRS
VMware HA and DRS are core features that should be utilized as part of this design. Nutanix best practices dictate that a few HA/DRS configuration settings be changed from the default. These changes are outlined in the design decisions below.

VRT-005 ENABLE HA.

Impact: Requires additional cluster resources to ensure VMs on the failed node(s) can be restarted elsewhere in the cluster.

Justification: In production environments VM availability is extremely important. Automatically restarting VMs in case of node failure helps ensure workload availability.

VRT-006 ENABLE DRS WITH DEFAULT AUTOMATION LEVEL.

Impact: Initiates vMotion for optimal VM performance in case of resource contention. Moving VMs may temporarily impact Nutanix data locality.

Justification: Hot spots can develop in a cluster. DRS moves VMs as needed to ensure optimal performance.

VRT-007 DISABLE VM PDL AND APD COMPONENT PROTECTION.

Impact: Permanent device loss (PDL) and all paths down (APD) monitoring is disabled, and vSphere will not monitor for these conditions.

Justification: These settings are designed for non-Nutanix storage. They should not be enabled in Nutanix clusters because they can cause storage unavailability. APD monitoring is managed via a Nutanix component.
VRT-008 CONFIGURE DAS.IGNOREINSUFFICIENTHBDATASTORE IF ONE NUTANIX CONTAINER IS PRESENTED TO THE ESXI HOSTS.

Impact: Eliminates false positive warnings when a cluster uses a single datastore.

Justification: In many Nutanix clusters a single datastore is used. The warning message is not applicable in Nutanix environments.

VRT-009 DISABLE THE AUTOMATION LEVEL FOR ALL CVMS.

Impact: DRS will not try to move a CVM to another host, nor will CVM resources be taken into account for HA calculations.

Justification: CVMs are stored on local storage and cannot be vMotioned to another host. HA does not need to reserve resources for CVMs since they are bound to a single node and cannot be restarted elsewhere in the cluster.

VRT-010 SET HOST FAILURES CLUSTER TOLERATES TO 1 FOR RF2 AND TO 2 FOR RF3.

Impact: Automatically sets the proper amount of reserved resources in the cluster.

Justification: Starting in vSphere 6.5, the cluster resource percentage is tied to "host failures cluster tolerates" and automatically adjusts based on the number of nodes in the cluster.
VRT-011 SET HOST ISOLATION RESPONSE TO "POWER OFF AND RESTART VMS".

Impact: If a host becomes isolated on the network, the VMs on that host will be powered off and restarted on another node in the cluster.

Justification: In an HCI environment all communications, including storage, are via the Ethernet network. If a host becomes isolated on the network, VMs need to be moved to a healthy host to continue to function.

VRT-012 SET HOST ISOLATION RESPONSE TO "LEAVE POWERED ON" FOR CVMS.

Impact: If a host becomes isolated on the network, the CVMs will remain on.

Justification: If there is a transient network disruption, you do not want the CVMs to be powered off.

VRT-013 DISABLE HA/DRS FOR EACH NUTANIX CVM.

Impact: vSphere will not try to restart the CVM on another host, which is impossible since the CVM uses local storage.

Justification: The CVM uses local storage, so no attempt to restart it on other hosts should be performed.
SIOC

VRT-014 DISABLE SIOC.

Impact: Enhanced storage statistics are not available.

Justification: If SIOC is enabled, it can have negative side effects. These include storage unavailability, creating unnecessary lock files, and complications with Metro Availability.

4.4. MANAGEMENT LAYER

4.4.1 CONTROL PLANE


This design provides a primary control plane where the majority of daily operational tasks are performed. In this design:

• The primary control plane is Prism Central.


• VMware vCenter is also required if you are using VMware ESXi.

Each of these has a maximum size that limits the total number of VMs, nodes,
and clusters that can be managed by an instance.

Prism Central
Prism Central (PC) is a global control plane for Nutanix, which includes managing VMs,
replication, monitoring, and value-add products such as:

• Nutanix Calm for application-level orchestration.


• Nutanix Flow for microsegmentation of east-west traffic.
• Prism Pro for advanced task automation and capacity planning.

These products are installed from Prism Central and managed centrally for
all Prism Element clusters associated to the Prism Central instance.

Prism Central plays an important part in a deployment regardless of the scale. There are
two deployment architectures for Prism Central that can be utilized and scaled
depending on the size and goals of the design.

Single-VM Prism Central Deployment
The single VM deployment option for Prism Central offers a reduced footprint option for
designs that do not require high availability for the control plane beyond that provided
by the hypervisor. The single Prism Central VM option can be used with a small or large
VM sizing. The size directly correlates to the size of environment it can manage.

Scale-Out Prism Central Deployment


A scale-out PC cluster comprises three VMs. All of the members of the cluster run all of the Nutanix services and products in a distributed architecture. This distributed architecture allows a scaled-out Prism Central to manage a larger environment as well as handle more incoming requests from users and the API. Additionally, it provides a higher level of availability: if one of the VMs is down or performing a maintenance operation, the other members remain available. You have the option of deploying the PC VMs as small- or large-sized VMs.

The scale-out Prism Central architecture deploys 3 VMs on the same cluster and is
superior in terms of availability and capacity. If scale-out is not the best option from
the beginning, it is fairly simple to go from a single PC VM deployment to a scale-out
architecture at a later time.

Image Templates
Within any environment operating at scale there is a need to keep approved template
images available and in sync between clusters. For AHV clusters, image placement
policies should be utilized for this. Image policies are configured to determine the
clusters each image can or cannot be utilized in. This makes the initial roll out of new
and updated image versions easy.

Prism Central Recommendations

RECOMMENDATION AREA | DESCRIPTION
Prism Central instances | Deploy Prism Central in scale-out mode (a cluster of 3 VMs). The benefits: simplifies capacity planning, platform life cycle management, and management of virtual networking; reduces management overhead.
Workload Domains | Deploy management workloads, such as Prism Central, on management clusters.
Access Control | Integrate with Active Directory. Leverage AD groups to optimize access management.

4.4.2 VMWARE VCENTER SERVER


For designs utilizing ESXi as the hypervisor, vCenter is a critical part of the infrastructure
for hypervisor cluster management and VM operations. It may be accessed often in
environments that are not highly automated. In automated environments, vCenter is
needed to perform operations that are sent to it from different orchestration layers.

The vCenter appliance can be deployed in many different sizes. This ultimately
determines how large an environment you can support in terms of VMs, hosts, and
clusters. Refer to the official VMware documentation for the version you are deploying
to determine the proper sizing for your design.

Beyond the size of the environment, the vCenter appliance can be deployed as a single VM or in a vCenter High Availability (vCenter HA) configuration for additional availability. The option to deploy vCenter HA does not increase the size of the environment it can manage, as only a single virtual appliance is active at any time. The size of the environment that can be managed is based on the size of the VMs deployed.

To ensure that the control plane is as highly available as possible, the clustered deployment option is preferred as long as it is compatible with the other layers of the solution.

vCenter Server Recommendations

RECOMMENDATION AREA | DESCRIPTION
vCenter instances | Deploy one vCenter in an HA configuration for all workloads. Configure with an embedded PSC.
Workload domains | Deploy management workloads on management clusters.
Access Control | Integrate with Active Directory. Leverage AD groups to optimize access management.

4.4.3 DEPENDENT INFRASTRUCTURE

There are a variety of other infrastructure services that are necessary for a successful Nutanix deployment, such as NTP, DNS, and AD. You may already have this infrastructure deployed and available for use when you deploy your Nutanix environment. If it does not exist, you will probably need to deploy it as part of the new environment.

NTP
Network Time Protocol (NTP) is used to synchronize computer clock times across the network, storage, compute, and software layers. If clock times drift too far apart, some products may have trouble communicating across layers of the solution. Keeping time in sync is also beneficial when examining logs from different layers of the solution to determine the root cause of an event.

A minimum of three NTP servers should be configured and accessible to all solution layers, with the NTP standard recommendation being five to detect a rogue time source. These layers include AOS, AHV, and Prism Central, plus vCenter and ESXi if using VMware vSphere as the virtualization layer. Use the same NTP servers for all infrastructure components.

Do not use an Active Directory Domain Controller as an NTP source.

If you are at a dark site with no internet connectivity, consider using a switch or GPS time source.

DEP-001 A MINIMUM OF THREE NTP SERVERS FOR ALL INFRASTRUCTURE COMPONENTS SHOULD BE PROVIDED.

Impact: NTP servers must be reachable by every layer of the solution.

Justification: Consistent time across all layers keeps components communicating reliably and makes logs from different layers easy to correlate.

DNS
Domain Name System (DNS) is a directory service that translates domain names of the
form domainname.ext to IP addresses. DNS is important to ensure that all layers can
resolve names and communicate. A highly available DNS environment should be utilized
to support this design. At least two DNS servers should be configured and accessible at
all layers to ensure components can reliably resolve addresses at all times.

These layers include the following:


• AOS
• AHV
• Prism Central
• VMware ESXi
• VMware vCenter
• Network switches

DEP-002 A MINIMUM OF TWO DNS SERVERS SHOULD BE CONFIGURED AND ACCESSIBLE TO ALL INFRASTRUCTURE LAYERS.

Impact: At least two DNS servers must be deployed and reachable from every infrastructure layer.

Justification: Redundant DNS ensures that components can reliably resolve names at all times.

Active Directory
Active Directory (AD) is a directory service developed by Microsoft for Windows domain
networks. AD often serves as the authoritative directory for all applications and
infrastructure within an organization. For this design, all of the consoles and element
managers will utilize RBAC and use AD as the directory service for user and group
accounts. Where possible, AD groups should be utilized to assign privileges for easier
operations. User access can then be controlled by adding or removing a user from the
appropriate group.

The AD design should be highly available to ensure that directory servers are always
available when authentication requests occur.

Logging Infrastructure
Capturing logs at all layers of the infrastructure is very important. For example, if there is
a security incident, logs can be critical for forensics. An example of a robust log collector
is Splunk, but there are other options. Nutanix recommends that you deploy a robust
logging platform that meets your security requirements. All infrastructure logs should
be forwarded to the centralized log repository.

RECOMMENDATION: Store the logs in a different cluster or location from where they


are being collected. This protects the logs in case of a catastrophic cluster failure,
ensuring they can later be used for forensics.

4.5. SECURITY LAYER

Nutanix Enterprise Cloud can be used to build private or multi-tenant solutions, and
depending on the use case the security responsibility will vary. The security approach
includes multiple components:

• Physical
• Datacenter access
• Equipment access (firewalls, load balancers, nodes, network switches, racks,
routers)
• Virtual Infrastructure
• Clusters
• Nodes
• Management components
• Network switches
• Firewalls
• Load balancers
• Threat vectors
• Active
• Automated
• Internal
• External
• Workloads
• Applications
• Containers
• Virtual Machines

Nutanix Enterprise Cloud infrastructure is designed to deliver a high level of security
with less effort. Nutanix publishes custom security baseline documents for compliance,
based on the United States Department of Defense (DoD) RHEL 7 Security Technical
Implementation Guides (STIGs), that cover the entire infrastructure stack and prescribe
steps to secure deployment in the field. The STIGs use machine-readable code to
automate compliance against rigorous common standards.

Nutanix has implemented Security Configuration Management Automation (SCMA),
which is installed, configured, and enabled during the Nutanix deployment. Configuration
options exist, but one key differentiator is that Nutanix provides this functionality by
default. SCMA checks multiple security entities for both Nutanix storage and AHV,
automatically reports inconsistencies, and reverts them to the baseline.

With SCMA, you can schedule the STIG to run hourly, daily, weekly, or monthly. STIG has
the lowest system priority within the virtual storage controller, ensuring that security
checks do not interfere with platform performance.

In addition, Nutanix releases Nutanix Security Advisories which describe potential
security threats and their associated risks plus any identified mitigation(s) or patches.

For more information, refer to Building Secure Platforms And Services
With Nutanix Enterprise Cloud.

4.5.1 AUTHENTICATION
Maintain as few user and group management systems as possible. A centrally managed
authentication point is preferred to many separately managed systems. Based on that
general recommendation, you should at a minimum take advantage of the external LDAP
support provided by Nutanix components.

Prism Central also provides both LDAP and Security Assertion Markup Language
(SAML) support, making it possible for users to authenticate through a qualified Identity
Provider (IdP). If none of these options are available, Nutanix also provides local user
management capabilities.

SEC-001 USE ACTIVE DIRECTORY AUTHENTICATION.


THIS APPLIES FOR USER AND SERVICE ACCOUNTS.

Impact Requires a highly available active directory


infrastructure and additional initial configuration.

Justification User activity logged for auditing purposes and account


security is configured and maintained from a single
centralized solution.

SEC-002 USE SSL/TLS CONNECTION TO ACTIVE DIRECTORY.

Impact Might require additional configuration.

Justification Increases security by eliminating cleartext exchanges


on the network.
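Before pointing Prism at Active Directory over SSL/TLS, the LDAPS path can be validated from a tooling host. The following is a minimal sketch assuming the third-party ldap3 package; the domain controller, service account, CA bundle file, and base DN are placeholders for your environment.

import ssl
from ldap3 import Server, Connection, Tls, ALL

# Require certificate validation against the organization's trusted CA bundle.
tls = Tls(validate=ssl.CERT_REQUIRED, ca_certs_file="corp-root-ca.pem")
server = Server("dc01.example.com", port=636, use_ssl=True, tls=tls, get_info=ALL)

# The context manager opens and binds the connection, then unbinds on exit.
with Connection(server, user="svc-nutanix@example.com", password="********") as conn:
    conn.search("DC=example,DC=com", "(sAMAccountName=svc-nutanix)",
                attributes=["memberOf"])
    print(conn.entries)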

4.5.2 CERTIFICATES
All components facing consumers should be protected with certificate authority (CA)
signed certificates. Whether to use internally or externally signed certificates depends
on consumer classification and on what kind of service the specific component provides.

SEC-003 USE A CERTIFICATE AUTHORITY (CA) THAT IS
CONSIDERED A "TRUSTED CA" BY YOUR ORGANIZA-
TION FOR THE COMPONENTS WHERE CERTIFICATES
CAN BE REPLACED. THESE CAN BE EITHER INTERNALLY
OR EXTERNALLY SIGNED CERTIFICATES.

Impact Implementation effort and potentially an extra cost


if certificates need to be purchased.

Justification Certificates are required as part of the Transport Layer


Security (TLS) protocol and are exchanged during
session negotiation to establish a secure session.
They provide an extra layer of security and help
prevent man-in-the-middle attacks.
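A lightweight way to confirm that a consumer-facing endpoint presents a certificate chain your organization trusts is to open a TLS session against your CA bundle. The sketch below uses only the Python standard library; the hostname and bundle path are placeholders (Prism listens on TCP 9440 by default).

import socket
import ssl

HOST, PORT = "prismcentral.example.com", 9440

# Validate the presented chain against the organization's trusted CA bundle.
context = ssl.create_default_context(cafile="corp-root-ca.pem")
with socket.create_connection((HOST, PORT), timeout=5) as sock:
    with context.wrap_socket(sock, server_hostname=HOST) as tls:
        cert = tls.getpeercert()
        print("issuer:", dict(x[0] for x in cert["issuer"]))
        print("expires:", cert["notAfter"])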

4.5.3 CLUSTER LOCKDOWN


Nutanix cluster lockdown lets you enforce SSH access to the CVMs and hosts
using key pairs instead of passwords.

SEC-004 DO NOT USE NUTANIX CLUSTER LOCKDOWN.

Impact Users can access the CVM with username and


password.

Justification Nutanix recommends that access, including SSH,


directly to CVM and hypervisor should be restricted to
as few entities as possible:
• Operators. Keep service account passwords secure.
• Remote workstations. Use jump hosts, for example.
• Networks. Use network segmentation via firewall
rules or iptables rules.

If password-less communication is required, then enable Nutanix cluster lockdown.

4.5.4 VMWARE CLUSTER LOCKDOWN


Cluster lockdown can also be enabled at the VMware layer. Enabling either normal
or strict cluster lockdown mode can break Nutanix cluster functionality. Make the
necessary ESXi configurations to guarantee Nutanix functionality if vSphere cluster
lockdown mode is required.

SEC-005 DO NOT USE VSPHERE CLUSTER LOCKDOWN.

Impact Users can access vSphere cluster with username


and password.

Justification No requirements exist in this design that justify vSphere


cluster lockdown mode.

4.5.5 HARDENING
There are certain hardening configurations you can apply for the AHV host and
the CVM if required:
• AHV and CVM
  • AIDE: Advanced Intrusion Detection Environment.
  • Core: Enable stack traces for cluster issues.
  • High-strength passwords: Enforce a complex password policy (minimum length of 15
    characters, differing by at least 8 characters).
  • Banner: Display a specific login message via SSH.
• CVM only
  • Enforce SNMPv3 only.

SEC-006 ENABLE CVM AND HYPERVISOR AIDE

Impact Additional, very limited, CVM and hypervisor resources


will be required.

Justification With AIDE enabled, the environment will perform


checksum verification of all static binaries and libraries
for improved security.

SEC-007 CONFIGURE SCMA TO RUN HOURLY

Impact SCMA is lightweight and adds very limited additional


load to CVM and hypervisor. Security benefits outweigh
the added resources.

Justification To more quickly capture unacceptable configuration


drift. Default is to take actions daily.

SEC-008 STOP UNUSED ESXI SERVICES AND CLOSE
UNUSED FIREWALL PORTS.

Important: Make sure not to stop a service or close


a firewall port required by Nutanix, such as:
• SSH
• NFS

Impact Additional configuration required.

Justification To limit the attack surface.

We recommend that you follow the Hardening Guide for the CVM.

If you are using AHV, follow the Hardening Guide For AHV.

For updated Hardening Guides, search portal.nutanix.com for the most recent guide
that relates to your version of AOS and AHV.

4.5.6 INTERNET FACING SERVICES


Security for Internet-facing services is out of the scope of this design, but highly
important and mentioned here for awareness.

Internet-facing services are at constant risk of being attacked. Two common types of
attacks are denial of service (DoS) and distributed denial of service (DDoS). To help
mitigate the potential impact of a DoS or a DDoS attack a design can take advantage of:

• Multiple internet connections (active/backup).


• ISP provided DoS and DDoS filtering.

There are additional ways to implement protection against these types of attacks.

4.5.7 LOGGING
Logging is critical from an auditing and traceability perspective, so make sure the
virtual infrastructure components (AOS, Prism Central, AHV, ESXi) and any additional
software in the environment send their log files to a highly available logging infrastructure.

RECOMMENDATION: At least one of the individual targets making up the highly


available logging infrastructure should run outside of the virtual infrastructure itself,
so that if the virtual infrastructure is compromised, forensic investigators can still
access logs that might no longer be available on the cluster itself.

A single centralized activity logging solution for auditing purposes and account security
should be configured and maintained.

SEC-009 SEND LOG FILES TO A HIGHLY AVAILABLE


SYSLOG INFRASTRUCTURE.

Impact Requires initial configuration and a highly available


logging infrastructure.

Justification Logs are helpful in report creation, during


troubleshooting, and for investigating potential security
breaches.

SEC-010 INCLUDE ALL NUTANIX MODULES IN LOGGING.

Impact Data will be generated per module meaning the more


modules included, the more log data is generated.

Justification Ensures data from all modules is included and
searchable via the logging system. Refer to the Nutanix
Syslog documentation for additional information.
Modules can be excluded if needed.

SEC-011 USE ERROR LOG LEVEL FOR THE NUTANIX


COMPONENTS.

Impact Will not send all logging information to syslog


infrastructure. The following levels will be included:
• Error
• Critical
• Alert
• Emergency

Justification Will provide error-based information. Refer to Nutanix


Syslog documentation for additional information.

SEC-012 USE DEFAULT ESXI LOGGING LEVEL,
LOG ROTATION, AND LOG FILE SIZES.

Impact In rare situations, logs might rotate very fast.

Justification No reason to change size and rotation unless


a huge amount of logs are expected.

SEC-013 IF EXTRA SECURITY AND RELIABILITY ARE


REQUIRED, THEN USE TCP FOR LOG TRANSPORT.
OTHERWISE, USE THE DEFAULT SYSLOG
PROTOCOL, UDP.

Impact Requires the sender and receiver to establish


communication, which generates minimal additional
network traffic. May require configuration of the logging
infrastructure to accept TCP communication.

Justification Provides a reliable logging setup and ensures the logs


are being received by the logging infrastructure.

SEC-014 USE PORT 514 FOR LOGGING.

Impact Traffic on well-known ports can be easy to locate.

Justification Defined port in syslog RFC. No reason has been


identified to change the port.
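To illustrate SEC-013 and SEC-014 from the sender side, the sketch below forwards messages from a tooling or automation host to the central syslog target over TCP on port 514 using only the Python standard library; the syslog hostname is a placeholder. Nutanix components themselves are configured through their own remote-syslog settings rather than code like this.

import logging
import socket
from logging.handlers import SysLogHandler

# TCP transport to the central syslog target on the standard port (SEC-013, SEC-014).
handler = SysLogHandler(address=("syslog.example.com", 514),
                        socktype=socket.SOCK_STREAM)

logger = logging.getLogger("private-cloud-ops")
logger.setLevel(logging.ERROR)   # align with the Error level chosen in SEC-011
logger.addHandler(handler)

logger.error("test message: TCP syslog path from tooling host verified")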

4.5.8 NETWORK SEGMENTATION


To protect Nutanix CVM and hypervisor traffic, place them together in their own dedicated
VLAN, separate from other VM traffic. This applies to all hosts in a cluster.

Nutanix recommends configuring the CVM and hypervisor host VLAN as a native, or
untagged, VLAN on the connected switch ports. This native VLAN configuration allows for
easy node addition and cluster expansion. By default, new Nutanix nodes send and receive
untagged traffic. If you use a tagged VLAN for the CVM and hypervisor hosts instead, you
must configure that VLAN while provisioning the new node, before adding that node to the
Nutanix cluster.

Do not segment Nutanix storage and replication traffic, or iSCSI Volumes traffic, on separate
interfaces (VLAN or physical) unless additional segmentation is required by mandatory
security policy or the use of separate physical networks. The added complexity of
configuring and maintaining separate networks with additional interfaces cannot be
justified unless absolutely required.

SEC-015 USE VLAN FOR TRAFFIC SEPARATION OF
MANAGEMENT AND USER WORKLOADS.

Impact This provides clear separation between:


• Management and end user traffic.
• Different management traffic types.
• Different end user traffic types.

Justification VLAN is a well-known standard for traffic separation


when there are no requirements specifying physical
separation.

SEC-016 PLACE CVM AND HYPERVISOR ON THE SAME


VLAN AND SUBNET.

Impact Different components placed on the same network


segment. Alternate configurations are not supported.

Justification Required for Nutanix functionality. Both hypervisor


and CVM should be classified as core or management
service traffic delivering the same service and can
therefore be placed on the same VLAN.

SEC-017 PLACE OUT-OF-BAND MANAGEMENT ON A


SEPARATE VLAN OR PHYSICAL NETWORK.

Impact Requires management of an additional network or VLAN.

Justification Out-of-band management is not required for core


Nutanix functionality. To provide additional security,
the management interface should not be on the same
network segment as the CVM and hypervisor.

4.5.9 ROLE BASED ACCESS CONTROL (RBAC)


Nutanix has built-in RBAC in both Prism Central and Prism Element, with the option to
create custom roles in Prism Central. RBAC is used to limit access and control for various
individuals and groups of administrative users. Use a least-privilege and separation-of-
duties approach when assigning permissions to make sure each group or individual user
has just enough permissions to perform their duties. Use pre-defined roles or create new
roles as needed.

Configure RBAC at the Prism Central level, since it provides the overarching management
construct. Via Prism Central, consumers and administrators are directed to underlying
components, such as Prism Element, when needed. This ensures your least-privilege
configuration stays in place, avoiding common mistakes that occur when RBAC is
configured at multiple different levels.

The following table shows Prism Central default roles:

ROLE PURPOSE

Super Admin Highest-level admin with full infrastructure and tenant


access. Manages a Nutanix deployment and can set up,
configure, and make use of every feature in the
platform.

Self-Service Admin Cloud admin for a Nutanix tenant. Manages virtual


infrastructure, oversees self-service, and can delegate
end-user management.

Project Admin Team lead to whom cloud administration is delegated


in the context of a project. Manages end-users within
a project and has full access to their entities.

Prism Viewer View-only admin. Has access to all infrastructure and


platform features but cannot make any changes.

Prism Admin Day-to-day admin of a Nutanix deployment. Manages


the infrastructure and platform but cannot entitle other
users to be admins.

Operator Lifecycle manager for team applications. Works on


existing application deployments, exercises blueprint
actions.

Developer Application developer within a team. Authors


blueprints, tests deployments, and publishes
applications for other project members.

Consumer Owner of team applications at runtime. Launches


blueprints and controls their lifecycle and actions.

When creating custom roles in Prism Central there are multiple entities available,
each with their own set of permission definitions:
• App
• VM
• Blueprint
• Marketplace Item
• Report
• Cluster
• Subnet
• Image

The following table describes Prism Element default roles:

ROLE PURPOSE

User Admin Able to view information, perform any administrative


task, and create or modify user accounts.

Cluster Admin Able to view information and perform any


administrative task. Cannot create or modify user
accounts.

Viewer Able to view information only. No permission to


perform any administrative tasks. Useful for auditing
purposes.

In addition, there are pre-defined roles in VMware vCenter Server, with the option to
create custom roles.

SEC-018 USE A LEAST-PRIVILEGE ACCESS APPROACH WHEN


DECIDING WHO HAS ACCESS. ALIGN RBAC
STRUCTURE AND USAGE OF DEFAULT PLUS CUSTOM
ROLES ACCORDING TO COMPANY REQUIREMENTS.

Impact Access requirements need to be defined, created and


implemented for each access group.

Justification This approach helps protect the environment from


having users performing actions they are not
authorized to perform.

SEC-019 ALIGN RBAC STRUCTURE AND USAGE OF DEFAULT
PLUS CUSTOM ROLES ACCORDING TO THE
COMPANY REQUIREMENTS DEFINED VIA SEC-018.

Impact If no company structure exists, fall back on description


in SEC-018 and create a structure that meets your
needs.

Justification There are numerous ways of applying RBAC. This


design document can’t cover all possibilities.
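Periodic review of who holds access supports the least-privilege approach described above. The sketch below pulls the user list from Prism Central as input to such a review; it assumes the v3 users/list REST endpoint and response fields that may vary by Prism Central version, and the host, credentials, and CA bundle are placeholders.

import requests

PC = "https://prismcentral.example.com:9440"   # placeholder Prism Central address

resp = requests.post(f"{PC}/api/nutanix/v3/users/list",
                     json={"kind": "user", "length": 500},
                     auth=("admin", "********"),
                     verify="corp-root-ca.pem",
                     timeout=30)
resp.raise_for_status()

# Field names are assumptions; adjust to the response returned by your version.
for user in resp.json().get("entities", []):
    status = user.get("status", {})
    print(status.get("name"), status.get("resources", {}).get("user_type"))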

4.5.10 DATA AT REST ENCRYPTION


Data at Rest Encryption (DaRE) is a security measure that prevents stolen or misplaced
media from exposing data. DaRE offers protection against data exposure in the event of:

• Activities that require a full data copy that will be used outside the platform.
• Failed drives leaving the datacenter.
• Drive or node theft.

Keeping management traffic, including storage traffic, on a separate network is often


an adequate data security practice. Nutanix supports four different options for Data at
Rest Encryption (DaRE):

• Self Encrypting Drives (SEDs) with an External Key Manager (EKM).


• Software-based encryption with an EKM.
• Software-based encryption and SED with an EKM (aka Dual Encryption).
• Software-based encryption with the Nutanix Native Key Manager (KMS).

With all software-based encryption options, encryption is performed in the software


layer and data is encrypted over the wire, between CVMs. All or part of the user VM data
can be encrypted.

• AHV encrypts the entire Nutanix cluster.


• ESXi gives you the option to define encryption on a per-Nutanix-container basis
if required.

Self Encrypting Drives (SEDs) provide FIPS 140-2 Level 2 compliance and can be used
without any performance impact. Nutanix Software Based Encryption and Native Key
Manager are FIPS 140-2 Level 1 Evaluated.

You can use both software-based encryption and SEDs. This requires an external key
management server.

NOTE:
• All methods of DaRE are FIPS 140-2 compliant; however, if Level 2, 3, or 4 is required,
a hardware component is necessary.
• Software-based encryption with the Nutanix Native Key Manager (KMS) can encrypt
storage containers (ESXi or Hyper-V) or the entire Cluster (AHV).

SEC-020 DO NOT USE STORAGE ENCRYPTION.

Impact Data is never encrypted unless encrypted at the virtual


machine level or via the ESXi VM encryption feature.

Justification There are no technical requirements to justify


implementing storage encryption in this design.

4.5.11 KEY MANAGEMENT SERVER


A key management server is required when using Nutanix storage encryption.
When using an external key management service you should have a minimum of two
key management servers running. At least one server must not run on the infrastructure
it is protecting.

SEC-021 DO NOT USE A KEY MANAGEMENT SERVER.

Impact N/A

Justification No functionality will be used that requires a key


management server.

4.5.12 XI BEAM
During the environment lifecycle it’s important to ensure compliance is met and security
configuration meets required standards. Xi Beam is a cost and security optimization
SaaS offering that works across public cloud and on-premises environments.
Beam Security Compliance for Nutanix provides a centralized view of the security
posture of a Nutanix private datacenter and provides visibility into the security of your
on-prem environment based on known compliance standards.

Beam can help you to comply with regulatory and business-specific compliance policies
such as PCI-DSS, HIPAA, and NIST. (note: Xi Beam is not available for dark sites)

You gain deep insights into your on-premises Nutanix deployments based on over
300 audit checks and security best practices, including:
• Audit security checks for access, data, networking, and virtual machines.
• Compliance checks against PCI-DSS v3.2.1 for AHV, AOS, Flow, and Prism Central.

4.6. AUTOMATION LAYER

Nutanix supports intelligent IT operations and advanced automation that enable you to
streamline operations, enable IT as a Service, and support the needs of developers and
business teams. This section covers automation and orchestration of virtual
infrastructure, focusing on provisioning and maintenance which are important aspects
of the overall solution. Nutanix tools reduce the time required to perform initial setup
plus software and firmware upgrades.

4.6.1 UPGRADES
Upgrades can occur across a variety of components. The process to upgrade many of
these components is primarily executed using LCM, which is able to understand
dependencies without operator intervention.

It’s important to understand the impact of upgrading certain software components, as


their upgrade may affect the outside world.

• AOS. When upgrading, each CVM is individually upgraded in a rolling fashion. While a
CVM is rebooting, the host it is running on is redirected to a remote CVM to deliver
storage IO. This is invisible to the VM. However, it does result in a loss of locality, which
can potentially impact IO throughput and latency if the load on the system is high.

• Hypervisor. When upgrading the hypervisor on a node, each VM must be migrated off
the host for the update to be performed so that a reboot of the host can occur. For
vSphere, this requires vCenter integration to be configured to allow for a host to be put
into maintenance mode.

AHV live migration and vSphere vMotion are normally non-disruptive; however, certain
applications that utilize polling-style device drivers or that have near-real-time kernel
operations cannot tolerate being migrated during the final cutover step. Hypervisor
updates will require downtime for these applications if they don't offer native failover
functionality.

Firmware
Firmware can be updated for a variety of devices including:

1. Board Management Controller (BMC)


2. Motherboard (BIOS)
3. Host Bus Adapter (HBA)
4. Host Boot Device (SATA DOM)
5. SSD or HDD
6. NIC

Software
Software updates can be performed for multiple components, including:

1. Acropolis Operating System (Core AOS in CVM)


2. Hypervisor (vSphere or AHV)
3. Nutanix Cluster Check (NCC)
4. Life Cycle Manager (LCM)
5. Acropolis Operating System (AOS in Prism Central)
6. MicroServices Platform (MSP in Prism Central)
7. Services Functionality
• Nutanix Files (Prism Element)
• Calm/Epsilon (Prism Central)
• Objects Manager (Prism Central)
• Karbon (Prism Central)
• Era (independent)
• Move (independent)
• X-Ray (independent)

4.6.2 LIFE CYCLE MANAGER (LCM)


Nutanix LCM consists of a framework and a set of modules for inventory and update
management. Its modules are independent of AOS. LCM tracks software and firmware
versions of all entities in a cluster and provides a single-pane-of-glass interface to
perform any software or firmware maintenance operation. While some of these updates
can be performed online, many require a reboot of the physical host. LCM will perform
the DRS/ADS migration triggers for maintenance mode, similar to a hypervisor
upgrade.

LCM supports environments with internet connectivity as well as dark site deployments.

4.6.3 FOUNDATION
Foundation manages the initial setup and configuration of a cluster. Nutanix nodes may
come pre-installed with AHV and the Controller Virtual Machine (CVM), and you can:
• Add the nodes to an existing Nutanix cluster.
• Create a new Nutanix cluster.
• Re-image the nodes with a different AHV and/or AOS version, or a different hypervisor,
and create a Nutanix cluster.

There are five ways of invoking a Foundation process:


1. Connect to a factory-deployed Nutanix node's CVM via http://CVM_IP:8000/
2. Use portable Foundation (Mac or Windows executable).
3. Use the standalone Foundation VM.
4. Add a node to an existing Nutanix cluster.
5. Call the Foundation API via a third-party solution, which provides a way to orchestrate
the deployment.

NOTE: The Foundation process may vary for different hardware vendors.

Nutanix provides a Foundation pre-configuration service, which is accessible via
install.nutanix.com. Via the service, you can define and download the Nutanix cluster
configuration to be used during the Foundation process. The downloaded file contains
all the configuration required to perform the Foundation operation and comes in JSON
format. This makes it easy to keep track of configurations, document them, and share
them with your peers.
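Because the downloaded configuration is plain JSON, it is easy to version, diff, and sanity-check before use. The sketch below is a minimal illustration; the top-level keys it looks for are assumptions and should be verified against the file produced by your Foundation version.

import json

with open("foundation-config.json") as f:
    config = json.load(f)

# Hypothetical top-level sections used only to illustrate a basic sanity check.
for key in ("blocks", "clusters"):
    if key not in config:
        print(f"warning: expected section '{key}' not found")

print(f"loaded configuration with {len(config)} top-level sections")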

The following table provides an overview of the functions available for the different
Foundation software options.

                        CVM FOUNDATION       PORTABLE FOUNDATION      STANDALONE
                                             (WINDOWS, MAC)           FOUNDATION VM

Function                Factory-imaged       Factory-imaged nodes.    Factory-imaged nodes.
                        nodes.                                        Bare-metal nodes.
                        Bare-metal nodes.

Hardware                Any                  Nutanix (NX)             Any
                                             Dell (XC)
                                             HPE (DX)

If IPv6 is disabled     Cannot image         IPMI IPv4 required       IPMI IPv4 required
                        nodes.               on the nodes.            on the nodes.

VLAN Support            No                   No                       Yes

LACP Support            Yes                  Yes                      Yes

Multi-homing Support    N/A                  Yes                      Yes

RDMA Support            Yes                  Yes                      Yes

NOTE: When available, Foundation Central will provide capabilities to perform


Foundation operations via Prism Central.

4.7. BUSINESS CONTINUITY
4.7.1. BACKUP AND RECOVERY
Nutanix provides native support for snapshots and clones. Both snapshots and clones
leverage a redirect-on-write algorithm which is effective and efficient. Offloaded
snapshots can be leveraged via VAAI, ncli, ODX, REST APIs and Prism.

Nutanix also provides asynchronous and near-synchronous (NearSync) replication to


copy local snapshots to a remote cluster.

There are two different forms of snapshots to support different modes of replication:

• Full snapshots for asynchronous replication (with RPO of 60 minutes or greater).


• Lightweight snapshots (LWS) for NearSync replication (when the RPO is between
15 minutes and 1 minute).

Full snapshots are efficient at keeping system resource usage low when you are creating
many snapshots over an extended period of time. LWS reduces metadata management
overhead and increases storage performance by decreasing the high number of storage
I/O operations that long snapshot chains can cause.

Nutanix provides APIs that backup vendors such as HYCU, Veeam, and Commvault
leverage to take native snapshots and backups on Nutanix.

Local Backup
Nutanix native snapshots provide data protection at the VM level, and our crash-
consistent snapshot implementation is the same across hypervisors. The implementation
varies for application-consistent snapshots due to differences in the hypervisor layer.
Nutanix can create local backups and recover data instantly to meet a wide range of
data protection requirements. These local snapshots should not take the place of a
comprehensive traditional backup and disaster recovery solution.

VM snapshots are by default crash-consistent, which means that the vDisks and volume
groups captured are consistent to a single point in time and represent the on-disk data.
Application-consistent snapshots capture the same data as crash-consistent snapshots,
plus all data in memory and all transactions in process. The Nutanix application-
consistent snapshot uses Nutanix Volume Shadow Copy Service (VSS) to quiesce the file
system prior to taking a snapshot for both ESXi and AHV.

Protection Domain and Consistency Groups


Protection Domains. A Protection Domain is a group of VMs or volume groups that can
be either snapshotted locally or replicated to one or more remote clusters. Prism
Element uses Protection Domains when replicating between remote sites.

RECOMMENDATIONS: Protection Domains
• No more than 200 VMs per PD.
• No more than 10 VMs per PD with NearSync.
• Group VMs with similar RPO requirements.

Consistency groups. Administrators can create a consistency group for VMs and volume
groups that are part of a protection domain where you want to snapshot all members of
the group in a crash-consistent manner.

RECOMMENDATIONS: Consistency Groups


• Keep consistency groups as small as possible. Try to limit consistency groups to fewer
than 10 VMs for efficiency.
• A consistency group using application-consistent snapshots can contain only 1 VM.
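These limits can be checked programmatically as part of a periodic review. The sketch below assumes the Prism Element v2 protection_domains REST endpoint and its vms field, which may differ by AOS version; the cluster address, credentials, and CA bundle are placeholders.

import requests

PE = "https://cluster01.example.com:9440"   # placeholder Prism Element address

resp = requests.get(f"{PE}/PrismGateway/services/rest/v2.0/protection_domains",
                    auth=("admin", "********"),
                    verify="corp-root-ca.pem",
                    timeout=30)
resp.raise_for_status()

# Flag protection domains approaching or exceeding the recommended 200-VM limit.
for pd in resp.json().get("entities", []):
    vm_count = len(pd.get("vms", []))
    if vm_count > 200:
        print(f"{pd.get('name')}: {vm_count} VMs exceeds the recommended limit")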

BCN-001 PLACE NEARSYNC VMS IN THEIR OWN


PROTECTION DOMAIN.

Justification NearSync supports only one schedule per Protection


Domain, so place NearSync VMs in their own PD.

Implication Multiple PDs will need to be created for different


NearSync VMs depending on their schedule
requirements.

Snapshot Schedule and Retention Policy

Full Snapshots and Async Replication


Your RPO determines how much data you will lose in the event of a failure. The snapshot
interval should be equal to your desired RPO. You can create multiple schedules for a
Protection Domain using full snapshots at various frequencies with different retention
policies.

BCN-002 CONFIGURE SNAPSHOT SCHEDULES TO RETAIN


THE LOWEST NUMBER OF SNAPSHOTS WHILE
STILL MEETING THE RETENTION POLICY.

Justification Metadata management on a cluster is more efficient


with a lower number of snapshots.

Implication Multiple schedules need to be created for the same


Protection Domain at different levels rather than a
simple daily schedule.

LWS and NearSync Replication
Nutanix offers NearSync with a telescopic schedule (time-based retention). When the
RPO is set to be ≤ 15 minutes and ≥ one minute, you have the option to save your
snapshots for a specified number of weeks or months. Multiple schedules cannot be
created with NearSync.

The following table represents the schedule to save recovery points for 1 month:

TYPE FREQUENCY RETENTION

Minute Every minute 15 minutes

Hourly Every hour 6 hours

Daily Every 24 hours 7 days

Weekly Every week 4 weeks

Monthly Every month 1 month

In addition to the above, refer to the general guidelines and limitations for


Async and NearSync DR.

4.7.2. ADDITIONAL BUSINESS CONTINUITY OPTIONS


Nutanix offers a number of additional options for ensuring business continuity that are
beyond the scope of this design:

• Metro Availability. Full synchronous replication across metropolitan distances.


• Xi Leap. Cloud-based disaster recovery service.
• Nutanix Mine. A backup solution using Nutanix infrastructure for secondary storage.
• Works in conjunction with third-party backup vendors such as HYCU and Veeam.

4.8. OPERATIONS
4.8.1. CAPACITY & RESOURCE PLANNING
After initial installation and migration of workloads to the platform, enable long-term
capacity planning to avoid running out of resources. Many of the features discussed
here are integrated into Prism Pro.

OPS-001 DEPLOY PRISM PRO FOR ENHANCED CLUSTER


MANAGEMENT

Impact Requires an additional license, which can increase cost.

Justification Analytics, Capacity planning, Custom Dashboards,


and Playbooks require the presence of Prism Pro.

Cluster capacity
Native runway calculations built into Prism Central automatically calculate the
remaining capacity of the system as soon as the cluster (Prism Element) is brought
under management by Prism Central.

These runway calculations should be configured to run as part of periodic reports
and reviewed on a regular basis to ensure sufficient capacity exists. This is especially
important for organizations that have a significant lag between the date they commit
to purchasing additional gear and the date it is online and available to use.
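As a complement to the built-in runway view, a rough storage runway can also be derived from cluster statistics, for example in a report consumed outside Prism. The sketch below assumes the Prism Element v2 cluster endpoint and its usage_stats keys, which may differ by AOS version; the daily growth figure is an input you would take from your own trend data, and the address and credentials are placeholders.

import requests

PE = "https://cluster01.example.com:9440"

resp = requests.get(f"{PE}/PrismGateway/services/rest/v2.0/cluster",
                    auth=("admin", "********"),
                    verify="corp-root-ca.pem", timeout=30)
resp.raise_for_status()

stats = resp.json().get("usage_stats", {})
used = int(stats.get("storage.usage_bytes", 0))
capacity = int(stats.get("storage.capacity_bytes", 1))

daily_growth_bytes = 50 * 1024**3   # assumed 50 GiB/day; replace with a real trend
runway_days = (capacity - used) / daily_growth_bytes
print(f"used {used / capacity:.1%} of capacity, ~{runway_days:.0f} days of runway")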

Expansion planning
When a new workload is identified to be onboarded, planning needs to occur if the new
workload size or requirements are outside of established patterns. Prism Central offers
a scenario simulation function that shows how the available capacity runway would
change if this new workload was accommodated. Utilizing this planning functionality
helps avoid unplanned capacity constraints.

OPS-002 REVIEW MONTHLY CAPACITY PLANNING

Impact Requires an additional license, which can impact cost.

Justification Analytics, capacity planning, custom dashboards,


and playbooks require Prism Pro.

Right-Sizing VMs
While general system capacity planning is useful, accurate and efficient system level
planning requires accurate sizing for individual workloads. Machine learning in Prism
Central provides anomaly detection for VMs when the workload crosses learned
thresholds.

In addition, the system uses a number of thresholds to categorize VMs based on
their behavior. These categories include:

Bully. A VM with the potential to degrade overall cluster performance by impacting
the capability of the node it resides on.
Constrained. A VM that is experiencing very high CPU or RAM utilization, is potentially
unable to meet the needs of the application it's running, and is likely a candidate for
additional resources.
Over-provisioned. A VM that is significantly underutilizing the resources provided
to it and is likely a candidate to reduce in size.
Inactive. A VM that hasn't powered on recently (dead) or is virtually idle (zombie)
and is a candidate for reclamation.

In addition, custom alert policies can be created that match VMs by specific conditions.

4.8.2. UPGRADE METHODOLOGY


This section describes the design decisions associated with upgrading the Nutanix
infrastructure. Whether to perform upgrades during business hours or outside business
hours comes down to a number of possible factors, including:

• Is the environment sized to tolerate failures during upgrades?


• Will performance be adequate during upgrades?
• Past experiences.
• Size of Nutanix cluster.

OPS-003 PERFORM UPDATES AFTER HOURS FOR


PERFORMANCE- OR MIGRATION-SENSITIVE APPLICATIONS

Impact Potential staffing impact.

Justification Reduces load on the system to be handled during


rolling CVM reboot from code upgrade.

When upgrading, AOS offers the choice of release trains to apply. Each of these is
denoted with a major.minor.maintenance.patch numbering scheme, for example:
5.10.7.1 or 5.11.1.1.

A release train is based on the major and minor components. There are two types
of release trains:

• Short Term Support (STS) releases which include new features and provide a regular
and frequent upgrade path. These releases are maintained for shorter durations.

• Long Term Support (LTS) releases which provide bug fixes for features that have been
generally available for a longer period of time. After features have been generally
available in an STS for some time, they are included in an LTS, which is maintained for
a longer duration.

Knowledge base article 5505 on the Nutanix Support Portal covers the differences
in greater detail.

OPS-004 UTILIZE THE CURRENT LTS BRANCH

Impact Lack of access to features present in or requiring


new branch.

Justification Unless a new feature is required for a design, enterprise


customers typically prefer a less frequent major version
change of software components.

Updates to an existing train are released on a regular basis and should be applied
on a standard cadence.

OPS-005 UPDATE TO THE NEXT MAINTENANCE VERSION


4 WEEKS AFTER RELEASE.
UPDATE TO THE CURRENT PATCH VERSION
2 WEEKS AFTER RELEASE.

Impact Frequent smaller updates keep the amount of changes


per update relatively small, minimizing the amount to
troubleshoot in the event something goes wrong.

Justification Unless a new feature is required for a design, enterprise


customers typically prefer a less frequent major version
change of software components.
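The cadence in OPS-005 can be encoded in tooling that flags when a release becomes eligible for rollout. The following is a simple sketch of that policy against the major.minor.maintenance.patch scheme; the release dates and version strings are illustrative inputs you would take from the Nutanix Support Portal.

from datetime import date, timedelta

def ready_to_apply(current, candidate, released, today=None):
    """Return True when the candidate version may be applied per OPS-005."""
    today = today or date.today()
    cur = [int(x) for x in current.split(".")]
    cand = [int(x) for x in candidate.split(".")]
    if cand[:2] != cur[:2]:
        return False   # different release train; out of scope for this helper
    # New maintenance version: wait 4 weeks. Patch on same maintenance: wait 2 weeks.
    wait = timedelta(weeks=4) if cand[2] != cur[2] else timedelta(weeks=2)
    return cand > cur and today >= released + wait

# Example: a new maintenance release published on 15 January 2020.
print(ready_to_apply("5.10.7.1", "5.10.8", date(2020, 1, 15)))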

4.8.3. TESTING
Environments which seek to achieve high overall uptime greater than 99.9% should
build a pre-production environment to mimic production so that configuration changes
can be tested before being pushed into production.

OPS-006 MAINTAIN A PRE-PRODUCTION ENVIRONMENT
FOR TESTING ANY CHANGES NEEDED (FIRMWARE,
SOFTWARE, HARDWARE) PRIOR TO EXECUTING
THE CHANGE IN PRODUCTION.

Impact A pre-production environment will increase the overall


cost of the solution.

Justification Such environments typically have stringent uptime


requirements due to the financial penalties or losses
that occur from unavailable services. The cost of an
outage of realistic duration often outweighs the
pre-production environment cost.

4.8.4. MONITORING
Nutanix includes a variety of built-in, system-level monitoring functions. The relevant
metrics for built-in monitoring are automatically gathered and stored without user
intervention required.

Native cluster alerts can be sent from either the individual cluster or Prism Central.

OPS-007 CONFIGURE ALERTS AND ALERT POLICIES IN


PRISM CENTRAL, NOT PRISM ELEMENT.

Impact Requires Prism Central to receive alerts.

Justification Creates consistency across multiple clusters and


reduces effort when making multiple changes.
Also allows for anomaly detection.

When alerts are generated, in addition to raising the alert in Prism, the system can
generate an outbound message. The two options available for sending alerts are
SNMP and SMTP.

OPS-008 UTILIZE SMTP FOR ALERT TRANSMISSION

Impact Requires access to SMTP systems.

Justification Creates consistency across multiple clusters and


reduces effort to make multiple changes. Also allows
for anomaly detection. Offers more options to
customize delivery.
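Before enabling SMTP alerting, the relay path Prism will use can be exercised from a tooling host with a test message. The sketch below uses only the Python standard library; the SMTP server, sender, and recipient addresses are placeholders.

import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Nutanix alert path test"
msg["From"] = "nutanix-alerts@example.com"
msg["To"] = "ops-team@example.com"
msg.set_content("Test message confirming the alert SMTP relay is reachable.")

# Plain SMTP on port 25 as an example; use the relay and port defined in your design.
with smtplib.SMTP("smtp.example.com", 25, timeout=10) as server:
    server.send_message(msg)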

The following screen shows how to configure Prism to send alerts to a specified email
account via SMTP.

5. Conclusion
This document is intended to demonstrate valuable methods and practices that
organizations, both large and small, can use to implement Nutanix Solutions to solve their
IT and business problems. There is no one-size-fits-all solution for Nutanix
Hyperconverged Infrastructure, and the contents of this document are informational only
and intended to provide customers with best practices suggestions from which they can
elaborate and evolve their private, hybrid, and multi-cloud solutions.

For more information on any details of this document, please visit our website at
www.nutanix.com or reach out to our sales team. You may also contact one of our
many global support phone numbers listed on our website.

Appendix — Table of
Design Decisions
The following table summarizes all the design decisions described in this document.

REFERENCE SECTION DECISION NAME STATUS

VRT-001 Design Choose Nutanix AHV or Customer


Considerations VMware ESXi as the Input
hypervisor for your
deployment.

PFM-001 Design Management Cluster Customer


Considerations Architecture: Deploy a Input
separate Management
Cluster or share a cluster
with other workloads. When
choosing a separate
management cluster,
consider a redundant
configuration.

PFM-002 Design Mixed or dedicated Customer


Considerations workload per cluster. Input

PFM-003 Design Select physical node vendor. Customer


Considerations Input

PFM-004 Design Select node model(s) per Customer


Considerations use case. Input

REFERENCE SECTION DECISION NAME STATUS

PFM-005 Platform Number of clusters. Customer


Considerations Input

PFM-006 Platform A single workload domain Complete


Considerations will span multiple racks.

NET-001 Networking Use a large buffer Complete


datacenter switch at 10Gbps
or faster.

NET-002 Networking Use a leaf-spine network Complete


topology for new
environments.

NET-003 Networking Populate each rack with two Complete


10GbE or faster ToR
switches.

NET-004 Networking Avoid switch stacking to Complete


ensure network availability
during individual device
failure.

NET-005 Networking Ensure that there are no Complete


more than three switches
between any two Nutanix
nodes in the same cluster.

NET-006 Networking Reduce network Complete


oversubscription to achieve
as close to a 1:1 ratio as
possible.

NET-007 Networking Configure the CVM and Complete


hypervisor VLAN as native,
or untagged on server
facing switch ports.

NET-008 Networking Use tagged VLANs on the Complete


switch ports for all guest
workloads.

NET-009 Networking Use a Layer 2 network Complete


design.

NET-010 Networking Connect at least one 10 GbE Complete


or faster NIC to each
top-of-rack switch.

NET-011 Networking Use a single br0 bridge with Complete
at least two of the fastest
uplinks of the same speed.

NET-012 Networking Use VLANs to separate Complete


logical networks.

NET-013 Networking Use active-backup uplink Complete


load balancing.

NET-014 Networking Use standard 1,500 byte Complete


MTU and do not use jumbo
frames.

NET-015 Networking Use virtual distributed Complete


switch (vDS).

NET-016 Networking Connect at least one 10 GbE Complete


NIC to each top-of-rack
switch.

NET-017 Networking Use a single vSwitch0 with Complete


at least two of the fastest
uplinks of the same speed.

NET-018 Networking Use Route Based on Complete


Physical NIC Load uplink
load balancing.

NET-019 Networking Use standard 1,500 byte Complete


MTU and do not use jumbo
frames.

CMP-001 Compute If running a non NUMA Complete


aware application on a VM,
configure the VM’s memory
and vCPU to be within a
NUMA node on AHV host.

STR-01 Storage When creating vDisks in


ESXi, always use thin-
provisioned vDisks.

STR-02 Storage When sizing a hybrid cluster, Complete


make sure to have enough
usable SSD capacity to
meet the active data set of
the application.

STR-03 Storage Do not mix node types from Complete


different vendors in the
same cluster.

STR-04 Storage Do not mix nodes that Complete
contain NVMe SSDs in same
cluster with hybrid SSD/
HDD nodes.

STR-05 Storage Minimum 2:1 HDD to SSD Complete


ratio required for Hybrid
clusters.

STR-06 Storage Size for N+1 node Complete


redundancy for storage and
compute when sizing. For
mission critical workloads
that need higher SLAs, use
N+2 node redundancy.

STR-07 Storage Use FT=2 and RF=3 for Complete


workloads and clusters that
need higher SLAs or for
cluster sizes >32.

STR-08 Storage Enable Inline Compression. Complete

STR-09 Storage Enable Deduplication. Customer


Input

VRT-001 Virtualization Deploy scale-out Prism Complete


Central for enhanced cluster
management.

VRT-002 Virtualization Use VM-HA Guarantee. Complete

VRT-003 Virtualization Deploy HA vCenter instance Complete


with embedded PSC to
manage all ESXi based
Nutanix clusters.

VRT-004 Virtualization Enable EVC mode and set Complete


to the highest compatibility
level the processors in the
cluster will support.

VRT-005 Virtualization Enable HA. Complete

VRT-006 Virtualization Enable DRS with default Complete


automation level.

VRT-007 Virtualization Disable VM PDL and APD Complete


component protection.

VRT-008 Virtualization Configure Complete
das.ignoreInsufficientHbDatastore
if one Nutanix container is
presented to the ESXi hosts.

VRT-009 Virtualization Disable Automation level Complete


for all CVMs.

VRT-010 Virtualization Admission control use Complete


percentage based policy.

VRT-011 Virtualization Set host isolation response Complete


to “Power off and restart
VMs”.

VRT-012 Virtualization Set host isolation response Complete


to “Leave Powered on” for
CVMs.

VRT-013 Virtualization Disable HA/DRS for each Complete


Nutanix CVM.

VRT-014 Virtualization Disable SIOC. Complete

DEP-001 Dependent Infra A minimum of three NTP Complete


servers for all infrastructure
components should be
provided.

DEP-002 Dependent Infra A minimum of two DNS Complete


Servers should be
configured accessible to all
infrastructure layers.

SEC-001 Security Use active directory Complete


authentication. This applies
for user and service
accounts.

SEC-002 Security Use SSL/TLS connection Complete


to Active Directory.

SEC-003 Security Use signed Certificate Complete


Authority (CA) certificates
for the components where
certificates can be replaced.
This can be either internal or
external signed certificates.

SEC-004 Security Do not use Nutanix Cluster Complete


lockdown.

SEC-005 Security Do not use vSphere Cluster Complete
lockdown.

SEC-006 Security Enable CVM and hypervisor Complete


AIDE.

SEC-007 Security Configure SCMA to run Complete


hourly.

SEC-008 Security Stop unused ESXi services Complete


and close unused firewall
ports.

SEC-009 Security Send log files to a highly Complete


available syslog
infrastructure.

SEC-010 Security Include all Nutanix modules Complete


in the logging.

SEC-011 Security Use Error log level for the Complete


Nutanix components.

SEC-012 Security Use default ESXi logging Complete


level, log rotation, and log
file sizes.

SEC-013 Security Use TCP for log transport if Complete


extra security and reliability
are required; otherwise, use
the default syslog protocol,
UDP.

SEC-014 Security Use port 514 for logging. Complete

SEC-015 Security Use VLAN for traffic Complete


separation of management
and user workloads.

SEC-016 Security Place CVM and hypervisor Complete


on the same VLAN and
subnet.

SEC-017 Security Place out-of-band Complete


management on a separate
VLAN or physical network.

SEC-018 Security Use a least-privilege access Complete


approach when deciding
who has access. Align RBAC
structure and usage of
default plus custom roles
according to company
requirements.

SEC-019 Security Align RBAC structure and Complete


usage of default plus
custom roles according to
the company requirements
defined via SEC-018.

SEC-020 Security Do not use storage Complete


encryption.

SEC-021 Security Do not use a key Complete


management server.

BCN-001 Business Continuity Place NearSync VMs in their Complete


own Protection Domain.

BCN-002 Business Continuity Configure snapshot Complete


schedules to retain the
lowest number of snapshots
while still meeting the
retention policy.

OPS-001 Operations Deploy Prism Pro for Complete


enhanced cluster
management.

OPS-002 Operations Review monthly capacity Complete


planning.

OPS-003 Operations Perform updates after hours Complete


for performance or
migration sensitive
applications.

OPS-004 Operations Utilize the current LTS Complete


branch.

OPS-005 Operations Update to the next Complete


maintenance version 4
weeks after release.
Update to the current patch
version 2 weeks after
release.

OPS-006 Operations Maintain a pre-production Complete
environment for testing any
changes needed (firmware,
software, hardware) prior to
executing the change in
production.

OPS-007 Operations Configure alerts and alert Complete


policies in Prism Central,
not Prism Element.

OPS-008 Operations Utilize SMTP for alert Complete


transmission.

Nutanix makes infrastructure invisible, elevating IT to focus on
the applications and services that power their business. The
Nutanix Enterprise Cloud OS leverages web-scale engineering
and consumer-grade design to natively converge compute,
virtualization, and storage into a resilient, software-defined
solution with rich machine intelligence. The result is predictable
performance, cloud-like infrastructure consumption, robust
security, and seamless application mobility for a broad range
of enterprise applications. Learn more at www.nutanix.com
or follow us on Twitter @nutanix.

[email protected] | www.nutanix.com | @nutanix
