Cisco AI Infrastructure Pods for Inferencing
Data Sheet
Contents
Product specifications
System requirements
Ordering information
Warranty information
Product sustainability
Cisco and partner services
Cisco and partner services for AI-ready Infrastructure
Cisco Capital
For more information
Unlock the Future of AI with Cisco AI Pods
Product overview
Cisco's AI-ready Infrastructure Stacks provide a comprehensive suite of solutions tailored for various AI and ML
use cases. These validated designs simplify and automate AI infrastructure, enabling organizations to deploy AI
models efficiently across different environments—from edge inferencing to large-scale data centers. Cisco’s
infrastructure stacks cater to a range of requirements:
● MLOps with Red Hat OpenShift AI to streamline machine learning operations and model deployment.
● Gen AI modeling utilizing Intel Gaudi 2 processors and Cisco Nexus 9000 switches, optimized for high-performance AI tasks.
● Data Modeling platforms powered by Cisco’s robust networking and compute capabilities.
● Digital Twin frameworks, supported by NVIDIA AI Enterprise and integrated data systems.
Each solution is built on Cisco’s validated designs, ensuring seamless integration and reliable performance for
AI applications. Organizations can choose from converged infrastructure like FlashStack and FlexPod or extend
their deployments with hyperconverged solutions like Nutanix.
Customers across different segments, from small businesses to large enterprises, will find immense value in
Cisco AI Infrastructure. Small businesses can utilize cost-effective pods for development and basic workloads,
while larger enterprises can harness advanced pods for intensive AI and data processing tasks. The primary
business benefits include improved productivity, reduced costs, and enhanced scalability, while the primary
technical value lies in the pods' ability to deliver high performance and reliability, ensuring seamless integration
and operation within existing network infrastructures.
Table 1. Features and benefits
Scalability on Demand: Dynamically scale resources to meet evolving AI/ML workloads with configurable compute, memory, and accelerators tailored to deployment sizes.
Future Ready: Modular, future-proof system supporting next-generation processors, GPUs, and storage solutions, ensuring long-term investment protection.
Customization: Tailor configurations to specific AI models and workloads for optimal performance and efficiency across edge, medium, large, and scale-out deployments.
Comprehensive Platform: Integrated with Cisco Intersight and Nexus Dashboard, the AI PODs simplify infrastructure management with centralized provisioning, real-time visibility, and automated scaling.
Simplified Management: Streamline management with Cisco Intersight, offering centralized provisioning, real-time visibility, and automation for Cisco UCS servers and GPUs, reducing operational complexity.
Flexible Configurations: Choose from multiple AI POD sizes, from small edge inferencing setups to large, multi-model deployments, to meet specific workload requirements.
Prominent feature
Transforming AI Architecture Design with Cisco’s Full Stack Solution
This solution combines the best of hardware and software technologies to create a robust, scalable, and
efficient AI-ready infrastructure tailored to diverse needs.
1. NVIDIA AI Enterprise and NVIDIA Inference Microservices (NIM): These components
provide a powerful foundation for managing AI workloads. They offer optimized performance,
streamlined operations, and advanced management capabilities, enabling users to focus on
developing and deploying AI models without worrying about underlying infrastructure complexities.
2. OpenShift by Red Hat: This Kubernetes-based container platform simplifies the orchestration and
deployment of containerized AI applications. It provides flexibility and scalability, allowing users to
easily manage and scale their AI projects as they grow (a minimal deployment sketch follows this list).
3. Cisco Unified Computing System (UCS): Cisco UCS serves as the hardware backbone, delivering
high performance and reliability with its multiple server configurations. This ensures that the
compute power required for demanding AI workloads is readily available and scalable.
◦ FlashStack (Cisco UCS servers with Pure Storage) and FlexPod (Cisco UCS servers with NetApp
storage) both provide high-performance storage and compute capabilities essential for handling
large-scale AI and machine learning tasks. These solutions offer integrated, pre-validated
configurations that simplify deployment and management, enhancing operational efficiency and
reducing time to market.
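As one illustration of item 2 above, the sketch below uses the Kubernetes Python client (OpenShift exposes the standard Kubernetes API) to declare an inference deployment that requests an NVIDIA GPU through the GPU operator's nvidia.com/gpu resource. The image name, namespace, and labels are hypothetical placeholders, not part of any Cisco-validated design.

```python
# Minimal sketch: declare a GPU-backed inference Deployment on OpenShift/Kubernetes.
# Assumes a reachable cluster in ~/.kube/config and the NVIDIA GPU operator installed;
# the image, namespace, and names below are hypothetical placeholders.
from kubernetes import client, config

def create_inference_deployment() -> None:
    config.load_kube_config()  # use the current kubeconfig context
    container = client.V1Container(
        name="inference-server",
        image="registry.example.com/llm-inference:latest",  # placeholder image
        resources=client.V1ResourceRequirements(
            limits={"nvidia.com/gpu": "1", "memory": "64Gi", "cpu": "8"},
        ),
        ports=[client.V1ContainerPort(container_port=8000)],
    )
    deployment = client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name="llm-inference"),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels={"app": "llm-inference"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "llm-inference"}),
                spec=client.V1PodSpec(containers=[container]),
            ),
        ),
    )
    client.AppsV1Api().create_namespaced_deployment(
        namespace="ai-inference", body=deployment
    )

if __name__ == "__main__":
    create_inference_deployment()
```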
By leveraging Cisco's full stack solution, customers benefit from:
● Easy to Buy:
◦ Cisco AI PODs are available as pre-configured, validated bundles that are ready to order, ensuring
that customers can quickly select the right setup for their specific AI needs. This eliminates the
complexity of choosing individual components and reduces decision-making time.
◦ By offering orderable bundles this quarter, Cisco enables faster time-to-value for customers, allowing
them to accelerate their AI initiatives without unnecessary delays.
● Easy to Deploy and Integrate:
◦ Cisco AI PODs are designed for plug-and-play deployment, ensuring that organizations can rapidly
integrate them into existing data center or cloud environments with minimal effort. This seamless
deployment capability is supported by Cisco’s Intersight and Nexus Dashboard, providing centralized
management, real-time visibility, and automation to reduce operational complexity.
◦ The pre-configured bundles are pre-tested and validated, ensuring that all components work
harmoniously together right out of the box. This reduces the risk of deployment issues and speeds up
time-to-production for AI workloads.
◦ The pre-integrated and pre-tested nature of the solution reduces the complexity of designing AI
infrastructure. It provides a clear, replicable formula for optimal performance, eliminating guesswork.
◦ The solution supports a range of deployment sizes from small edge inferencing setups to large, high-
performance environments. This scalability ensures that the infrastructure can grow with the
organization’s needs.
◦ Integrated management tools from NVIDIA and Red Hat simplify the day-to-day operations of AI
infrastructure, making it easier for IT teams to support AI projects.
For individuals and organizations at the ideation stage, Cisco's full stack solution provides a clear, structured
path to developing a robust AI architecture. It combines the latest in AI management, container orchestration,
and high-performance hardware, making it an ideal choice for anyone looking to build and scale their AI
capabilities.
Scalability on Demand
Requirements change often, and you need a system that doesn’t lock you into one set of resources when you
find that you need another. Cisco AI-ready Infrastructure Stacks are growth ready by nature and scale
seamlessly with your generative AI inferencing needs because of the modular design of the Cisco UCS X-Series.
Using Cisco Intersight®, it is as simple as adding or removing servers, adjusting memory capacity, and
configuring resources in an automated manner as your models evolve and workloads grow (see the sketch after
the list below). You also have the flexibility to vary the CPU-to-GPU ratio or choose between Intel Xeon Scalable
and AMD EPYC processors within a single chassis, depending on your specific use case.
● Dynamic Resource Allocation: Easily add or reduce compute and storage resources.
● Flexible CPU/GPU Ratios: Customize the balance of compute power.
● Automated Scaling: Adjust resources automatically as workloads grow.
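As a hedged illustration of this automation surface, the sketch below queries the Cisco Intersight REST API for the UCS blades it manages. Here get_signed_session() is a hypothetical helper standing in for Intersight's API-key/HTTP-signature authentication (which the official Intersight Python SDK handles for you); the projection fields are an assumed example.

```python
# Minimal sketch: inventory compute blades through the Cisco Intersight REST API.
# get_signed_session() is a hypothetical helper that returns a requests.Session
# performing Intersight's API-key/HTTP-signature authentication; it is not shown.
import requests

INTERSIGHT = "https://intersight.com/api/v1"

def list_blades(session: requests.Session) -> list[dict]:
    """Return name, model, and power state for each managed UCS blade."""
    resp = session.get(
        f"{INTERSIGHT}/compute/Blades",
        params={"$select": "Name,Model,OperPowerState"},  # OData-style projection
    )
    resp.raise_for_status()
    return resp.json().get("Results", [])

# Usage (assuming get_signed_session is implemented):
#   for blade in list_blades(get_signed_session()):
#       print(blade["Name"], blade["Model"], blade["OperPowerState"])
```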
Future Ready
The Cisco AI-ready Infrastructure Stacks are built to be future ready, evolving with new technologies and
adapting to emerging business objectives. The UCS X-Series modular system is designed to support future
generations of processors, storage, nonvolatile memory, accelerators, and interconnects. It can be selectively
upgraded piece by piece without the need to purchase, configure, maintain, power, and cool discrete
management modules and servers. Cloud-based management is kept up to date automatically with a constant
stream of new capabilities delivered by the Intersight software-as-a-service model. The resultant extended
lifecycles for hardware will lower ongoing costs over time.
Comprehensive Platform
Operationalizing AI is a daunting task for any enterprise. One of the biggest reasons AI/ML efforts fail to move
from proof of concept to production is the complexity of streamlining and managing the data and machine
learning pipelines that deliver production-ready models. Instead of ad-hoc ML efforts that add technical debt
with each iteration, it is important to adopt processes, tools, and best practices that can continually deliver and
maintain models with speed and accuracy. Cisco AI-ready Infrastructure Stacks provide a purpose-built,
full-stack solution that accelerates AI/ML efforts by reducing complexity. The design uses the Cisco UCS
X-Series modular platform with the latest Cisco UCS M7 servers, Cisco UCS X440p PCIe nodes with NVIDIA
GPUs, all centrally managed from the cloud using Cisco Intersight. Building on this accelerated and high-
performance infrastructure, Red Hat OpenShift AI enhances the solution by integrating essential tools and
technologies to accelerate and operationalize AI consistently and efficiently. NVIDIA AI Enterprise software
further complements the solution, offering key capabilities such as virtual GPUs, a GPU operator, and a
comprehensive library of tools and frameworks optimized for AI. These features simplify the adoption and
scaling of AI solutions, making it easier for enterprises to operationalize AI technologies.
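As a small illustration of what the GPU operator exposes to this stack, the sketch below lists each cluster node's allocatable nvidia.com/gpu count with the Kubernetes Python client. This is a generic Kubernetes/OpenShift example, not a Cisco- or Red Hat-specific API.

```python
# Minimal sketch: list allocatable NVIDIA GPUs per node on an OpenShift/Kubernetes
# cluster where the NVIDIA GPU operator advertises the nvidia.com/gpu resource.
from kubernetes import client, config

def gpus_per_node() -> dict[str, int]:
    config.load_kube_config()  # use the current kubeconfig context
    nodes = client.CoreV1Api().list_node().items
    return {
        node.metadata.name: int(node.status.allocatable.get("nvidia.com/gpu", 0))
        for node in nodes
    }

if __name__ == "__main__":
    for name, count in gpus_per_node().items():
        print(f"{name}: {count} allocatable GPU(s)")
```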
Simplified Management
Optimize your infrastructure management with Cisco Intersight. Our platform offers centralized provisioning,
real-time visibility, and automation, simplifying the management of UCS servers and GPUs. This streamlines
operations and reduces manual intervention, enhancing efficiency and reducing complexity.
Table 2. Specifications for Cisco AI Data Center and Edge Inference Pod
Processors: Dual 5th Gen Intel Xeon Scalable 6548Y+ processors per compute node
Memory: 8x 64 GB DDR5 DIMMs at 4800 MT/s, for 512 GB per compute node (512 GB in total)
Internal Storage: 2x 1.6 TB Non-Volatile Memory Express (NVMe) 2.5-inch drives per compute node
PCIe Node: 1x Cisco UCS X440p PCIe Node per compute node
Management:
● Cisco Intersight software (SaaS, virtual appliance, and private virtual appliance)
● Cisco UCS Manager (UCSM) 4.3(2) or later
The Cisco AI RAG Augmented Inference Pod is purpose-built for demanding AI workloads, supporting larger
models such as Llama 2-13B and OPT 13B. Equipped with enhanced GPU and node capacity, it efficiently
manages complex tasks, delivering increased computational power and efficiency. This advanced pod can
accommodate optimizations like Retrieval-Augmented Generation (RAG), which leverages knowledge sources
to provide contextual relevance during query service, significantly reducing hallucinations and improving
response accuracy. This makes the Cisco RAG Augmented Inference Pod an ideal choice for advanced
applications requiring robust AI capabilities and exceptional performance.
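To make the RAG flow above concrete, the sketch below shows the basic retrieve-then-generate loop: retrieved passages are prepended to the user's question before the model is queried. The endpoint URL, model name, and retrieve() helper are hypothetical placeholders, not part of the pod's software stack; many inference servers expose an OpenAI-compatible chat-completions API like the one assumed here.

```python
# Minimal RAG sketch: ground a generation request in retrieved context.
# The endpoint URL, model name, and retrieve() are hypothetical placeholders.
import requests

ENDPOINT = "http://inference.example.com/v1/chat/completions"  # placeholder

def retrieve(question: str, k: int = 3) -> list[str]:
    """Placeholder: return the top-k passages from a knowledge source."""
    raise NotImplementedError  # e.g., a vector search (see the sketch below)

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    payload = {
        "model": "llama-2-13b-chat",  # placeholder model name
        "messages": [
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    }
    resp = requests.post(ENDPOINT, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```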
Table 3. Specifications for Cisco AI RAG Augmented Inference Pod
Processors: Dual 5th Gen Intel Xeon Scalable 6548Y+ processors per compute node (4 CPUs in total)
Memory: 16x 64 GB DDR5 DIMMs (4 per CPU) at 4800 MT/s, for 512 GB per compute node (1 TB in total)
Internal Storage: 2x 1.6 TB Non-Volatile Memory Express (NVMe) 2.5-inch drives per compute node
mLOM: 1x Cisco UCS VIC 15230 2x100G mLOM per compute node
PCIe Node: 1x Cisco UCS X440p PCIe Node per compute node
GPU: 2x NVIDIA L40S GPUs (dual slot) per compute node (4 GPUs in total)
Management: Cisco Intersight software (SaaS, virtual appliance, and private virtual appliance)
The Cisco AI Scale Up Inference Pod for High Performance is meticulously optimized to support large-scale
models like CodeLlama 34B and Falcon 40B. With its substantial GPU and node capacity, this pod is engineered
to deliver exceptional performance for the most complex AI tasks. It significantly enhances Retrieval-
Augmented Generation (RAG) capabilities by supporting larger vector databases and improving vector search
accuracy through advanced algorithms.
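The core of the vector search this paragraph describes reduces to a few lines. The sketch below performs brute-force top-k cosine-similarity search with NumPy, standing in for the approximate-nearest-neighbor indexes a production vector database would use; the embedding dimension and data are illustrative only.

```python
# Minimal sketch: brute-force top-k cosine-similarity search over an embedding
# matrix, the operation a vector database accelerates with ANN indexes.
import numpy as np

def top_k(query: np.ndarray, embeddings: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k embeddings most similar to the query vector."""
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = e @ q  # cosine similarity once both sides are normalized
    return np.argsort(scores)[::-1][:k]

# Illustrative usage with random 1024-dimensional embeddings:
docs = np.random.default_rng(0).standard_normal((10_000, 1024)).astype(np.float32)
print(top_k(docs[42], docs, k=3))  # index 42 should rank first
```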
Table 4. Specifications for Cisco AI Scale Up Inference Pod for High Performance
Processors: Dual 5th Gen Intel Xeon Scalable 6548Y+ processors per compute node (4 CPUs in total)
Memory: 16x 64 GB DDR5 DIMMs (4 per CPU) at 4800 MT/s, for 512 GB per compute node (1 TB in total)
Internal Storage: 2x 1.6 TB Non-Volatile Memory Express (NVMe) 2.5-inch drives per compute node
mLOM: 1x Cisco UCS VIC 15230 2x100G mLOM per compute node
PCIe Node: 1x Cisco UCS X440p PCIe Node per compute node
GPU: 2x NVIDIA H100 GPUs (dual slot) per compute node (4 GPUs in total)
Management:
● Cisco Intersight software (SaaS, virtual appliance, and private virtual appliance)
● Cisco UCS Manager (UCSM) 4.3(2) or later
Table 5. Specifications for Cisco AI Scale Out Inference Pod for Large Deployment
Processors: Dual 5th Gen Intel Xeon Scalable 6548Y+ processors per compute node (8 CPUs in total)
Memory: 64x 64 GB DDR5 DIMMs (8 per CPU) at 4800 MT/s, for 1 TB per compute node (4 TB in total)
Internal Storage: 2x 1.6 TB Non-Volatile Memory Express (NVMe) 2.5-inch drives per compute node
mLOM: 1x Cisco UCS VIC 15230 2x100G mLOM per compute node
PCIe Node: 1x Cisco UCS X440p PCIe Node per compute node
GPU: 2x NVIDIA L40S GPUs (dual slot) per compute node (8 GPUs in total)
Management: Cisco Intersight software (SaaS, virtual appliance, and private virtual appliance)
System requirements
Table 6. System requirements
Disk Space: Minimum 1 TB SSD per compute node for AI model storage and operations
Hardware:
● 1x X210c Compute Node, 1x X440p PCIe Node, 1x NVIDIA L40S for the Data Center and Edge Inference Pod
● 2x X210c Compute Nodes, 2x X440p PCIe Nodes, 4x NVIDIA L40S for the RAG Augmented Inference Pod
● 2x X210c Compute Nodes, 2x X440p PCIe Nodes, 4x NVIDIA H100 for the Scale Up Inference Pod
● 4x X210c Compute Nodes, 4x X440p PCIe Nodes, 8x NVIDIA L40S for the Scale Out Inference Pod
Memory: Minimum 128 GB RAM per compute node, expandable based on workload requirements
Network: High-speed network connectivity (minimum 10 GbE) for data transfer between nodes and storage
Power and Cooling: Adequate power supply and cooling to support high-performance compute nodes and GPUs
Operating System: Compatible with Red Hat OpenShift AI for MLOps and AI/ML workloads
GPU Support: NVIDIA GPU drivers and software (NVIDIA AI Enterprise) for optimal performance and management
Security: Secure boot, encryption, and access controls for data protection
Integration: Seamless integration with existing data center infrastructure and cloud services
Fabric Interconnect: Cisco UCS 6454, 64108, and 6536 Fabric Interconnects
Cisco Intersight: Intersight Managed Mode (minimum Essentials license per server)
Ordering information
Table 7. Ordering Guide for Cisco AI Data Center and Edge Inference Pod
Chassis: UCSX-9508-ND-U
Processors: UCSX-CPU-I6548Y+
Memory: UCSX-MRX64G2RE1
mLOM: UCSX-S9108-100G
GPU: UCSX-GPU-L40S
Fabric Interconnect
Table 8. Ordering Guide for Cisco AI RAG Augmented Inference Pod
Chassis: UCSX-9508-ND-U
Processors: UCSX-CPU-I6548Y+
Memory: UCSX-MRX64G2RE1
mLOM: UCSX-I9108-100GND
GPU: UCSX-GPU-L40S
Table 9. Ordering Guide for Cisco AI Scale Up Inference Pod for High Performance
Chassis: UCSX-9508-ND-U
Processors: UCSX-CPU-I6548Y+
Memory: UCSX-MRX64G2RE1
mLOM: UCSX-I9108-100GND
GPU: UCSX-GPU-H100-80
Table 10. Ordering Guide for Cisco AI Scale Out Inference Pod for Large Deployment
Chassis: UCSX-9508-ND-U
Processors: UCSX-CPU-I6548Y+
Memory: UCSX-MRX64G2RE1
mLOM: UCSX-I9108-100GND
GPU: UCSX-GPU-L40S
Warranty information
The Cisco UCS X210c Compute Node has a three-year Next-Business-Day (NBD) hardware warranty and a 90-
day software warranty.
Augmenting the Cisco Unified Computing System™ (Cisco UCS) warranty, Cisco Smart Net Total Care® and
Cisco Solution Support services are part of Cisco's technical services portfolio. Cisco Smart Net Total Care
combines Cisco's industry-leading and award-winning foundational technical services with an extra level of
actionable business intelligence that is delivered to you through the smart capabilities in the Cisco Smart Net
Total Care portal. For more information, please refer to
https://www.cisco.com/c/en/us/support/services/smart-net-total-care/index.html
Cisco Solution Support includes both Cisco® product support and solution-level support, resolving complex
issues in multivendor environments on average 43 percent more quickly than with product support alone. Cisco
Solution Support is a critical element in data center administration, helping rapidly resolve issues encountered
while maintaining performance, reliability, and return on investment.
This service centralizes support across your multivendor Cisco environment for both our products and solution
partner products that you have deployed in your ecosystem. Whether there is an issue with a Cisco product or
with a solution partner product, just call us. Our experts are the primary point of contact and own the case from
first call to resolution. For more information, please refer to
https://www.cisco.com/c/en/us/services/technical/solution-support.html.
Product sustainability
Information on electronic waste laws and regulations, including products, batteries, and packaging: WEEE Compliance
Information on product takeback and reuse program: Cisco Takeback and Reuse Program
Cisco Capital
Flexible payment solutions to help you achieve your objectives
Cisco Capital makes it easier to get the right technology to achieve your objectives, enable business
transformation and help you stay competitive. We can help you reduce the total cost of ownership, conserve
capital, and accelerate growth. In more than 100 countries, our flexible payment solutions can help you
acquire hardware, software, services and complementary third-party equipment in easy, predictable
payments. Learn more.