Solution Architecture

The document outlines the architecture and design considerations for a high-availability AI solution using Nutanix and Mellanox technologies. Key requirements include availability, recoverability, manageability, performance, security, and network architecture, with specific configurations and features detailed for Nutanix and Mellanox components. It emphasizes the importance of replication factors, block awareness, and load balancing to enhance system resilience and performance.

Uploaded by

Bhushan Rane


Solution Architecture
When architecting this solution, we took into consideration the following AI-specific
requirements:

Availability: The solution must have high availability and maintain availability during upgrades
and failure scenarios.

Recoverability: The solution must have a strategy for recovering AI workloads and restoring
data in case of a disaster, while also minimizing recovery point objectives (RPOs) and
recovery time objectives (RTOs).

Manageability: The solution must reduce administrative effort for day-one and day-two
operations.

Performance and scalability: The solution must increase resilience, performance, and
capacity by scaling without impacting performance.

Security: The solution must implement security policies across the full stack.

Storage: The solution must reduce administrative effort and maximize storage performance
regardless of the application workload.

Network: The solution must provide a network architecture that maximizes bandwidth and
decreases latency.

Availability
In this section, we discuss the common failure scenarios in the solution as well as how the
various hardware and software components increase infrastructure availability.

Table. Summary of Availability Design

Configuration Item                       Parameter
Nutanix CVM design                       1 per host (4 total); 12 vCPUs, 64 GB of RAM
Nutanix file server VM design            3 VMs (initial configuration); 4 vCPUs, 12 GB of RAM
Nutanix file server export protocol      NFS v4
Nutanix file server export type          Sharded directories
Nutanix file server DNS settings         Records automatically created in AD during provisioning (round robin)
Cluster redundancy factor                2
Cluster high availability reservation    Enabled
Cluster virtual IP address               Set
Cluster iSCSI data services IP           Set

Nutanix Availability Features

Nutanix can operate as either a single node or a cluster of three or more nodes that share
resources and distribute data, which increases application and storage availability. As
represented in the following figure, as AOS ingests data from the NVIDIA DGX-1 system or
application, it creates a local copy on the home node and distributes a secondary copy to
another node in the cluster. The system then sends an acknowledgment back to the ingesting
application that the write operation is complete. Consequently, as the application writes
data, the system always stores a secondary copy on another node. The number of copies the
system keeps is the replication factor, which is set to 2 by default. An administrator can
set the replication factor to 3, which requires a minimum of five Nutanix nodes but
dramatically increases availability.

Note: We selected replication factor 2 because it provides an acceptable level of availability for this architecture.
However, if customers are evaluating larger clusters, it may be useful to increase the replication factor to 3.
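
The write path described above can be illustrated with a small simulation. This is a minimal sketch, not the AOS implementation: the Node class, the write and copies_available helpers, and the "least-loaded peer" placement rule are all hypothetical names and assumptions chosen for illustration. It shows only the ordering the text describes: a local copy on the home node, a secondary copy on another node, and an acknowledgment after both exist.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    extents: dict = field(default_factory=dict)  # key -> stored data


def write(cluster, home, key, data, replication_factor=2):
    """Store data on the home node plus (replication_factor - 1) peers, then acknowledge."""
    if replication_factor > len(cluster):
        raise ValueError("not enough nodes for the requested replication factor")
    peers = [n for n in cluster if n is not home]
    # Assumed placement policy (illustrative only): prefer peers holding the least data.
    replicas = sorted(peers, key=lambda n: len(n.extents))[: replication_factor - 1]
    home.extents[key] = data
    for node in replicas:
        node.extents[key] = data
    # The acknowledgment is returned only after every copy is in place.
    return [home.name] + [n.name for n in replicas]


def copies_available(cluster, key, failed):
    """Count the copies of key that survive if the nodes named in failed go down."""
    return sum(1 for n in cluster if key in n.extents and n.name not in failed)


# Four nodes, matching the cluster size in the availability design table.
cluster = [Node(f"node-{i}") for i in range(1, 5)]
placed = write(cluster, cluster[0], "block-0", b"training-batch")
survivors = copies_available(cluster, "block-0", failed={"node-1"})
```

With replication factor 2, losing the home node still leaves one readable copy; losing both nodes that hold the data does not, which is the trade-off the note above weighs against replication factor 3.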

Figure. Nutanix Data Availability

Block Awareness

A block is a rack-mountable enclosure that contains one to four Nutanix nodes. In multinode
blocks, the power supplies and the fans are the only components shared by nodes in a block.
When certain conditions are met, Nutanix cloud clusters are block aware, which means that
redundant copies of any data needed to serve I/O are placed on nodes that aren’t in the
same block, which maximizes the solution’s availability. When you scale your AI infrastructure,
it’s important to note that block awareness is applied automatically when all the following
conditions are met:

The cluster is three or more blocks (unless the cluster was created with replication factor 3, in
which case the cluster is five or more blocks).

Every storage tier in the cluster contains at least one drive on each block.

Every container in the cluster has a replication factor of at least 2.

The storage tiers on each block in the cluster are of comparable size.
The size of cluster SSD tiers with replication factor 2 isn’t more than 33 percent different
across blocks.

The size of cluster SSD tiers with replication factor 3 isn’t more than 25 percent different
across blocks.
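
The conditions above can be expressed as a simple pre-scaling check. The following is a sketch under stated assumptions rather than Nutanix tooling: the block_aware function and its input layout are hypothetical, and "percent different" is interpreted here as (largest - smallest) / largest, which the source does not define.

```python
def block_aware(blocks, container_rfs, cluster_rf=2):
    """Return True if the listed block awareness conditions hold.

    blocks: mapping of block name -> {tier name: (drive_count, capacity_gb)}
    container_rfs: replication factor of every storage container
    """
    # Three or more blocks, or five or more when the cluster uses replication factor 3.
    min_blocks = 5 if cluster_rf == 3 else 3
    if len(blocks) < min_blocks:
        return False
    # Every container needs a replication factor of at least 2.
    if any(rf < 2 for rf in container_rfs):
        return False
    # Every storage tier must have at least one drive on each block.
    tiers = set().union(*(b.keys() for b in blocks.values()))
    for tier in tiers:
        if any(b.get(tier, (0, 0))[0] < 1 for b in blocks.values()):
            return False
    # SSD tier sizes must be comparable across blocks (33% for RF2, 25% for RF3).
    ssd_sizes = [b["ssd"][1] for b in blocks.values() if "ssd" in b]
    if ssd_sizes and max(ssd_sizes) > 0:
        limit = 0.25 if cluster_rf == 3 else 0.33
        if (max(ssd_sizes) - min(ssd_sizes)) / max(ssd_sizes) > limit:
            return False
    return True


balanced = {f"block-{i}": {"ssd": (4, 7680), "hdd": (8, 32000)} for i in range(3)}
skewed = dict(balanced, **{"block-2": {"ssd": (2, 3840), "hdd": (8, 32000)}})
```

In this example, balanced passes, while skewed fails because one block has half the SSD capacity of the others.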

Figure. Nutanix Data Availability with Block Awareness


Tip: When you scale your AI infrastructure with Nutanix blocks, consider physically distributing blocks across racks to
further increase availability.

Node Availability

Because all Nutanix cloud clusters have at least three nodes, we used an additional node in
our solution to meet the requirement for n + 1 availability. This extra node allows us to handle
planned or unplanned events without operating in a degraded state. When you enable
Cluster High Availability in Prism, the system maintains this level of availability automatically.
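
As a rough illustration of the n + 1 reasoning, the check below asks whether a workload still fits after one node is lost. It is a back-of-the-envelope sketch, not how Prism calculates its reservation: the survives_node_loss function, the 256 GB of RAM per node, and the RAM-only view of capacity are assumptions. The 64 GB CVM and the three 12 GB file server VMs come from the availability design table.

```python
def survives_node_loss(node_count, node_ram_gb, cvm_ram_gb, workload_ram_gb, failures=1):
    """Return True if the workload still fits in RAM after the given number of node failures."""
    surviving = node_count - failures
    # The text above assumes a cluster keeps at least three nodes in service.
    if surviving < 3:
        return False
    # Each surviving node gives up the CVM's memory before serving workloads.
    usable_gb = surviving * (node_ram_gb - cvm_ram_gb)
    return workload_ram_gb <= usable_gb


fsvm_ram_gb = 3 * 12  # three file server VMs at 12 GB each
four_node_ok = survives_node_loss(4, 256, 64, fsvm_ram_gb)
three_node_ok = survives_node_loss(3, 256, 64, fsvm_ram_gb)
```

Under these assumptions the four-node design tolerates one node loss, while a three-node cluster would drop below the three-node minimum.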

Controller VM

Nutanix is a 100 percent software-defined solution that places a software storage controller
(the CVM) on each node in the cluster. The storage controller actively accepts I/O from
applications running locally on that node and participates in cluster-wide operations such as
replicating data, self-healing, and rebalancing data.

Nutanix Files Load Balancing

As a complement to the CVM, Nutanix Files also serves NFS and SMB requests from clients
and internal and external systems. Nutanix Files uses the CVM for reads and writes to
distributed storage, providing resilience (replication factor 2), data integrity, and scalability, as
detailed in the Nutanix Data Availability figure. Nutanix Files doesn’t need to be present on
every node; rather, it starts off with a minimum of three file server VMs (FSVMs) and
automatically scales out when needed. The following figure shows a high-level
representation of the relationship between the Nutanix CVM and FSVMs—specifically the
distribution of NFS exports and directories across multiple FSVMs.
Figure. High-Level Nutanix Files Architecture

Refer to the Nutanix Files tech note and the Nutanix Volumes best practices guide for more
detailed information on load balancing. For AI architects, a DGX system is equivalent to the
NFS client.

Note: In our testing, we implemented Nutanix Files with the Sharded Directory option and configured an iSCSI data
services IP.
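
To make the idea of spreading directories across FSVMs concrete, the sketch below assigns each top-level directory of an export to one file server VM. The source does not describe the placement algorithm Nutanix Files uses, so the hash-based rule here is a stand-in, and owner_fsvm and the directory and FSVM names are hypothetical.

```python
import hashlib


def owner_fsvm(top_level_dir, fsvms):
    """Map a top-level directory to one FSVM with a stable hash (illustrative rule only)."""
    digest = hashlib.sha256(top_level_dir.encode("utf-8")).digest()
    return fsvms[int.from_bytes(digest[:8], "big") % len(fsvms)]


fsvms = ["fsvm-1", "fsvm-2", "fsvm-3"]  # minimum of three FSVMs, per the text
directories = ["datasets", "checkpoints", "logs", "models"]
placement = {d: owner_fsvm(d, fsvms) for d in directories}
```

The point of the sketch is that a client such as a DGX system mounting the export can have requests for different directories served by different FSVMs, rather than funnelling all traffic through one VM.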

Mellanox Availability Features

Mellanox SN2100 series switches are designed for high availability from both a software and
hardware perspective. Key high availability features include:

Color-coded PSUs and fans.

Up to 64x 10 or 25 GbE ports, 32x 50 GbE ports, or 16x 100 GbE ports.

MLAG for active-active L2 multipathing.

64-way equal-cost multipath (ECMP) routing for load balancing and redundancy.

1 + 1 power supplies.
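
ECMP load balancing is commonly implemented by hashing a flow's header fields and using the result to pick one of the equal-cost next hops. The sketch below shows that general technique only; the hash function, the field selection, and the pick_next_hop name are assumptions, not the switch's actual hardware behavior.

```python
import zlib

MAX_PATHS = 64  # the 64-way ECMP limit listed above


def pick_next_hop(flow, next_hops):
    """Choose a next hop for a flow so that every packet of the flow takes the same path."""
    if not 1 <= len(next_hops) <= MAX_PATHS:
        raise ValueError("ECMP group must contain between 1 and 64 next hops")
    # flow is (src_ip, dst_ip, protocol, src_port, dst_port)
    key = "|".join(str(part) for part in flow).encode("utf-8")
    return next_hops[zlib.crc32(key) % len(next_hops)]


spines = ["spine-1", "spine-2"]
flow = ("10.0.0.11", "10.0.0.21", 6, 40000, 2049)
chosen = pick_next_hop(flow, spines)

# If a next hop fails, the flow is rehashed over the remaining paths.
remaining = [s for s in spines if s != chosen]
rerouted = pick_next_hop(flow, remaining)
```

Because the choice depends only on the flow's fields, packets within a flow stay in order, and removing a failed path from the group provides redundancy.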

The following table provides a summary of how Mellanox SN2100 switches maintain
availability during certain network failures.

Table. Network Failures Summary

Event: Leader down
Detection: Three continuous keepalives were lost and the leader isn’t visible on the management network.
Action: Subordinate role changed to standalone. Flush all MLAG MACs. Flush all IPL MACs.
Effect on Network: No traffic loss.

Event: Leader up
Detection: IPL up and received leader keepalive.
Action: Standalone role changed to subordinate.
Effect on Network: No traffic loss.

Event: Subordinate down
Detection: Three continuous keepalives are lost and the subordinate isn’t visible on the management network.
Action: Flush any MACs the subordinate has learned. Flush all IPL MACs.
Effect on Network: No traffic loss.

Event: Subordinate up
Detection: IPL up and received subordinate keepalive.
Action: Sync subordinate with leader tables.
Effect on Network: No traffic loss.
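
The failure handling in the table can be read as a small state machine. The sketch below encodes the four events as a lookup table. It is an illustration, not switch firmware: the TRANSITIONS table, the handle and peer_is_down helpers, and the assumption that the surviving switch performs the listed actions are all hypothetical.

```python
KEEPALIVE_LOSS_THRESHOLD = 3  # "three continuous keepalives" from the table

# (role of the switch handling the event, event) -> (new role, actions)
TRANSITIONS = {
    ("subordinate", "leader_down"): ("standalone", ["flush MLAG MACs", "flush IPL MACs"]),
    ("standalone", "leader_up"): ("subordinate", []),
    ("leader", "subordinate_down"): (
        "leader",
        ["flush MACs learned by subordinate", "flush IPL MACs"],
    ),
    ("leader", "subordinate_up"): ("leader", ["sync subordinate with leader tables"]),
}


def peer_is_down(missed_keepalives, visible_on_mgmt_network):
    """A peer counts as down only if keepalives are lost and it is absent from the management network."""
    return missed_keepalives >= KEEPALIVE_LOSS_THRESHOLD and not visible_on_mgmt_network


def handle(role, event):
    """Return the new role and the actions to run; unknown combinations leave the role unchanged."""
    new_role, actions = TRANSITIONS.get((role, event), (role, []))
    return new_role, list(actions)


role, actions = handle("subordinate", "leader_down")
recovered_role, _ = handle(role, "leader_up")
```

Requiring both conditions in peer_is_down reflects the detection column: lost keepalives alone do not trigger a role change if the peer is still reachable on the management network.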
