
A Comprehensive Guide to HPC in the Data Center


In this handbook:

• Hardware, software and best practices for data center HPC
• Top considerations for HPC infrastructure in the data center
• Compare Hadoop vs. Spark vs. Kafka for your big data strategy
• Which processing units for AI does your organization require?


Hardware, software and best practices for data center HPC
ALLYSON LARCOM, ASSOCIATE SITE EDITOR

High-performance computing enables organizations to use parallel processing to run advanced programs such as AI and data analytics. The technology combines the processing power of multiple computers to handle complex computing tasks.

When implemented correctly, high-performance computing (HPC) can help organizations handle big data. The technology requires specialized hardware, software frameworks and often large facilities to house it all. HPC also comes with a high price tag, which can act as a barrier to many organizations hoping to implement HPC in their data centers.

To determine whether your organization could benefit from HPC, consider the complexity of computing tasks your data center handles, the budget you can feasibly allocate toward obtaining and running numerous servers, the expertise and training requirements of your staff and the size of your data center facilities. Data centers that handle compute-intensive tasks, such as government resource tracking or financial risk modeling, and data centers that lean heavily on AI and machine learning can benefit greatly from HPC. But if your organization might struggle to keep an HPC cluster busy, it might not be worth the investment.

Once you decide to implement HPC, take stock of the various investments you must make. Understand which networking and processing hardware you require; what tools you intend to use to monitor, provision and manage your HPC cluster; which big data framework suits your data processing needs; and what you intend to use HPC for. Once you've considered all these areas, you can put plans in place to troubleshoot any complications and ensure the HPC cluster runs smoothly.


Top considerations for HPC infrastructure in the data center
STEPHEN BIGELOW, SENIOR TECHNOLOGY EDITOR

High-performance computing arose decades ago as an affordable and scalable method for tackling difficult math problems. Today, many organizations turn to HPC to approach complex computing tasks such as financial risk modeling, government resource tracking, spacecraft flight analysis and many other "big data" projects.

HPC combines hardware, software, systems management and data center facilities to support a large array of interconnected computers working cooperatively to perform a shared task too complex for a single computer to complete alone. Some businesses might seek to lease or purchase their HPC, and other businesses might opt to build an HPC infrastructure within their own data centers.

By understanding the major requirements and limiting factors for an HPC infrastructure, you can determine whether HPC is right for your business and how best to implement it.


WHAT IS HPC?

Generally speaking, HPC is the use of large and powerful computers designed to efficiently handle mathematically intensive tasks. Although HPC "supercomputers" exist, such systems often elude the reach of all but the largest enterprises.

Instead, most businesses implement HPC as a group of relatively inexpensive, tightly integrated computers or nodes configured to operate in a cluster. Such clusters use a distributed processing software framework -- such as Hadoop and MapReduce -- to tackle complex computing problems by dividing and distributing computing tasks across several networked computers. Each computer within the cluster works on its own part of the problem or data set, and the software framework then reintegrates the partial results to provide a complete solution.
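To make that divide-and-distribute pattern concrete, here is a minimal, single-machine sketch in Python. It is not Hadoop -- just a toy word count in which worker processes each run the map step on their own chunk of input and a reduce step merges the partial counts, the same shape of computation a cluster framework spreads across many nodes.

```python
# A toy "divide, distribute, reintegrate" word count. Each worker process runs
# the map step on its own chunk of the data; the reduce step merges the partial
# counts into one result, mirroring what a cluster framework does across nodes.
from collections import Counter
from multiprocessing import Pool

def map_chunk(lines):
    """Map step: count words in one chunk of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(partials):
    """Reduce step: reintegrate the partial results into a complete solution."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    data = [
        "hpc divides complex problems into smaller tasks",
        "each node works on its own part of the problem",
        "the framework reintegrates the partial results",
    ]
    chunks = [data[i::3] for i in range(3)]   # split the data set across 3 workers
    with Pool(processes=3) as pool:
        partial_results = pool.map(map_chunk, chunks)
    print(reduce_counts(partial_results).most_common(5))
```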




Distributed HPC architecture poses some tradeoffs for organizations. The most direct benefits include scalability and cost management. Frameworks like Hadoop can function on just a single server, but an organization can also scale them out to thousands of servers. This enables businesses to build an HPC infrastructure that meets their current and future needs using readily available and less expensive off-the-shelf computers. Hadoop is also fault-tolerant: it can detect and separate failed systems from the cluster, redirecting failed jobs to available systems.

Building an HPC cluster is technically straightforward, but HPC deployments can present business challenges. Even with the ability to manage, scale and add nodes over time, the cost of procuring, deploying, operating and maintaining dozens, hundreds or even thousands of servers -- along with the networking infrastructure to support them -- can become a substantial financial investment. Many businesses also have limited HPC needs and can struggle to keep an HPC cluster busy, and the money and training a business invests in HPC require that the deployment be put to work on real business tasks to make it worthwhile.

Only a thorough understanding of use cases, utilization and return on investment metrics leads to successful HPC projects.


WHAT DOES IT TAKE TO IMPLEMENT HPC?

The three principal requirements for implementing an HPC cluster in a business data center are computing hardware, a software layer and the facilities to house it all. More exact requirements depend on the scale of the HPC deployment.

Compute requirements. Building an HPC cluster requires servers, storage and a dedicated network that does not share the LAN carrying everyday business traffic. In theory, you can implement HPC software such as Hadoop on a single server, which can help staff learn and gain experience with HPC software and job scheduling. However, a typical HPC cluster based on Hadoop uses a minimum of three servers: a primary node, a worker node and a client node.

You can scale up that simple model with multiple primary nodes that each support many worker nodes, which means the typical HPC deployment consists of multiple servers -- usually virtualized to multiply the number of effective servers available to the cluster. The dedicated cluster network also requires high-bandwidth TCP/IP network gear such as Gigabit Ethernet NICs and switches. The number of servers and switches depends on the size of the cluster, as well as the capability of each server.

A business new to HPC often starts with a limited hardware deployment scaled to just a few racks, and then scales out the cluster later. You can limit your number of servers and switches by investing in high-end servers with ample processors and storage, which puts more compute capacity in each server.
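The article doesn't prescribe a particular tool for that single-server learning stage. As one hedged illustration, the same style of job can be written with the open source mrjob library and run first on a lone machine, then submitted to a Hadoop cluster once one exists; the file name below is arbitrary.

```python
# word_count.py -- a minimal MapReduce job written with the mrjob library
# (an illustrative choice, not a requirement of any particular HPC stack).
from mrjob.job import MRJob

class MRWordCount(MRJob):
    def mapper(self, _, line):
        # Map step: emit (word, 1) for every word in the input line.
        for word in line.split():
            yield word, 1

    def reducer(self, word, counts):
        # Reduce step: sum the counts collected for each word.
        yield word, sum(counts)

if __name__ == "__main__":
    MRWordCount.run()
```

Run locally with `python word_count.py input.txt` to learn the job model on one server; the same script can later be pointed at a Hadoop cluster through mrjob's Hadoop runner (`-r hadoop`).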
Software requirements. Mature software stacks must readily support the full suite of HPC cluster management functions. Stacks such as Bright Cluster Manager and OpenHPC typically include an assortment of tools for cluster management, including:

• Provisioning tools
• Monitoring tools
• Systems management tools
• Resource management tools
• MPI libraries (illustrated in the sketch below)
• Math libraries
• Compilers
• Debuggers
• File systems
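Most of these items are infrastructure an administrator installs rather than code, but the MPI libraries entry deserves a concrete illustration. The sketch below is hedged on the mpi4py binding (any MPI implementation exposes the same pattern): every rank runs the same script, rank 0 scatters a chunk of work to each process and gathers the partial results back.

```python
# Launch with an MPI runtime, e.g.: mpirun -n 4 python mpi_sketch.py
# (assumes mpi4py and an MPI implementation such as Open MPI are installed)
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Rank 0 prepares one chunk of work per rank; the other ranks contribute None.
chunks = [list(range(r * 100, (r + 1) * 100)) for r in range(size)] if rank == 0 else None

chunk = comm.scatter(chunks, root=0)     # each rank receives its own chunk
partial = sum(x * x for x in chunk)      # local computation on this node
totals = comm.gather(partial, root=0)    # partial results return to rank 0

if rank == 0:
    print("sum of squares:", sum(totals))
```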

Some organizations might adopt an HPC framework such as Hadoop to manage their HPC. Hadoop includes components such as the HDFS file system, Hadoop Common, MapReduce and YARN, which offer many of the same functions listed above.

HPC projects require an output, which can take the form of visualization, modeling or other reporting software to deliver computing results to administrators. Tools like Hunk, Platfora and Datameer visualize Hadoop data. Open source tools such as Jaspersoft, Pentaho and BIRT; business intelligence tools such as Cognos, MicroStrategy and QlikView; and charting libraries, including R Shiny, D3.js and Highcharts, can visualize output for non-Hadoop frameworks.
Facilities requirements. Facilities can often become the most limiting factor in HPC. To implement HPC, you require the physical floor space and weight support to hold racks of additional servers, power to operate them and adequate cooling capacity to manage the heat they generate. Some businesses simply might not have the space and cooling infrastructure to support a substantial number of additional servers.

Hyper-converged infrastructure systems can minimize physical computing footprints, but HCI carries high power densities that can result in rack "hot spots" and other cooling challenges. A full compute rack intended for HPC deployment can include up to 72 blade-style servers and five top-of-rack switches, weighing in total up to 1,800 pounds and demanding up to 43 kW of power.

HPC deployments require a careful assessment of data center facilities and detailed evaluations of system power and cooling requirements versus capacity. If the facilities are inadequate for an HPC deployment, you must seek alternatives to in-house HPC.
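As a rough, hedged illustration of why those figures matter, the arithmetic below converts the cited 43 kW per-rack draw into the heat load the cooling plant must remove; the conversion factors are standard, and the rack count is a placeholder.

```python
# Back-of-the-envelope cooling load for HPC racks at the densities cited above.
racks = 4                                   # placeholder: planned number of HPC racks
power_per_rack_kw = 43.0                    # per-rack draw cited in the text

total_kw = racks * power_per_rack_kw
btu_per_hr = total_kw * 3412                # 1 kW of IT load = ~3,412 BTU/hr of heat
tons_of_cooling = btu_per_hr / 12000        # 1 ton of cooling = 12,000 BTU/hr

print(f"{total_kw:.0f} kW -> {btu_per_hr:,.0f} BTU/hr -> {tons_of_cooling:.1f} tons of cooling")
```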


HANDLING HPC IMPLEMENTATION CHALLENGES

Compute challenges. HPC hardware is familiar and readily available, and you can address compute limitations with modular high-density servers. A modular design makes servers easy to expand and replace. You can achieve the best performance using dedicated high-performance servers with a dedicated high-speed LAN, which enables you to update HPC programs over time through regular technology refresh cycles and additional investment.

Software challenges. The principal HPC software challenges lie in managing software component versions and interoperability, i.e., ensuring that patching or updating one component does not adversely impact the stability or performance of other software components. Make testing and validation a central part of your HPC software update process.

Facilities challenges. The available physical data center space, power and cooling required to handle additional racks filled with servers and network gear limit many organizations hoping to implement HPC. Server upgrades can help: by deploying larger and more capable servers to support additional VMs, you can effectively add more HPC "nodes" without adding more physical servers. In addition, grouping VMs within the same physical server can ease networking issues because VMs can communicate within the server without passing traffic through the LAN.
You can look to third-party options such as colocation for additional space. Colocation enables your organization to rent space in a provider's data centers and use that provider's power and cooling. However, colocation often demands a costly long-term contractual obligation that can span years.

Power costs also affect the long-term costs of an HPC deployment, so evaluate the availability and cost of local power. Consider balanced three-phase electrical distribution infrastructure and advanced power distribution gear -- such as smart PDUs and switched PDUs -- to increase power efficiency. Uninterruptible power supply units support orderly server shutdowns of HPC clusters to minimize data loss.

Adding racks of high-density servers can add a considerable cooling load to a data center's air handling system. When additional cooling isn't available, evaluate colocation or cloud options, or consider advanced cooling technologies such as immersion cooling for HPC racks.


USING THE CLOUD FOR HPC?

Several public cloud providers, including AWS, Google Cloud Platform and Microsoft Azure, offer HPC services for businesses daunted by the challenges of building and operating HPC. Public clouds overcome scale and cost challenges for individual businesses, which often makes them ideal for HPC tasks. Clouds can provide:

• almost unlimited scale through globally available data centers;
• a variety of dedicated CPU, GPU, field-programmable gate array and fast interconnect hardware capabilities to optimize job performance for tasks like machine learning, visualization and rendering;
• mature and readily available HPC services such as Azure CycleCloud and Apache Hadoop on Amazon EMR (see the sketch at the end of this section), which lessen the learning curve and support burden on local IT staff; and
• pay-as-you-go cost models that enable a business to pay for HPC only when it actually uses those cloud services and resources.

Businesses with frequent and modest HPC tasks can choose to build and maintain a limited HPC cluster for the convenience and security of local data processing projects and still turn to the public cloud for occasional, more demanding HPC projects that they cannot support in-house.
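As a hedged sketch of the pay-as-you-go model, the snippet below uses the boto3 SDK to launch a small, transient Amazon EMR cluster with Hadoop and Spark installed and let it shut down when its work finishes. The release label, instance types and IAM role names are illustrative defaults, not recommendations.

```python
import boto3

# Assumes AWS credentials are configured; all names below are illustrative.
emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="occasional-hpc-job",
    ReleaseLabel="emr-6.15.0",
    Applications=[{"Name": "Hadoop"}, {"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"Name": "Primary", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "Workers", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 4},
        ],
        # Let the cluster terminate -- and stop billing -- once its steps finish.
        "KeepJobFlowAliveWhenNoSteps": False,
        "TerminationProtected": False,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Cluster requested:", response["JobFlowId"])
```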




Compare Hadoop vs. Spark vs. Kafka for your big data strategy
DANIEL ROBINSON

Big data became popular about a decade ago. The falling cost of storage led many enterprises to retain much of the data they ingested or generated so they could mine it for key business insights.

Analyzing all that data has driven the development of a variety of big data frameworks capable of sifting through masses of data, starting with Hadoop. Big data frameworks were initially used for data at rest in a data warehouse or data lake, but a more recent trend is to process data in real time as it streams in from multiple sources.

WHAT IS A BIG DATA FRAMEWORK?

A big data framework is a collection of software components that can be used to build a distributed system for the processing of large data sets comprising structured, semistructured or unstructured data. These data sets can be from multiple sources and range in size from terabytes to petabytes to exabytes.


Such frameworks often play a part in high-performance computing (HPC), a technology that can address difficult problems in fields as diverse as materials science, engineering and financial modeling. Finding answers to these problems often lies in sifting through as much relevant data as possible.

The most well-known big data framework is Apache Hadoop. Other big data frameworks include Spark, Kafka, Storm and Flink, which are all -- along with Hadoop -- open source projects developed by the Apache Software Foundation. Apache Hive, originally developed by Facebook, is also a big data framework.

WHAT ARE THE ADVANTAGES OF SPARK OVER HADOOP?

The chief components of Apache Hadoop are the Hadoop Distributed File System (HDFS) and a data processing engine that implements the MapReduce programming model to filter and sort data. Also included is YARN, a resource manager for the Hadoop cluster.

Apache Spark can also run on HDFS or an alternative distributed file system. It was developed to perform faster than MapReduce by processing and retaining data in memory for subsequent steps, rather than writing results straight back to storage. This can make Spark up to 100 times faster than Hadoop for smaller workloads.

However, Hadoop MapReduce can work with much larger data sets than Spark, especially those where the size of the entire data set exceeds available memory. If an organization has a very large volume of data and processing is not time-sensitive, Hadoop may be the better choice.

Spark is better for applications where an organization needs answers quickly, such as those involving iterative or graph processing. Also known as network analysis, graph processing analyzes relations among entities such as customers and products.
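To make the in-memory difference concrete, here is a hedged PySpark sketch: the data set is read once, cached in memory and reused across several aggregations rather than being written back to disk between stages as a MapReduce pipeline would. The HDFS path and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iterative-demo").getOrCreate()

# Read once and keep in memory for repeated passes (path and columns are hypothetical).
events = spark.read.json("hdfs:///data/events")
events.cache()

daily_counts = events.groupBy("event_date").count()
top_users = events.groupBy("user_id").count().orderBy("count", ascending=False)

daily_counts.show(5)
top_users.show(5)

spark.stop()
```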

WHAT IS THE DIFFERENCE BETWEEN HADOOP AND KAFKA?

Apache Kafka is a distributed event streaming platform designed to process real-time data feeds. This means data is processed as it passes through the system.

Like Hadoop, Kafka runs on a cluster of server nodes, making it scalable. Some server nodes form a storage layer, called brokers, while others handle the continuous import and export of data streams.

Strictly speaking, Kafka is not a rival platform to Hadoop. Organizations can use it alongside Hadoop as part of an overall application architecture, where it handles incoming data streams and feeds them into a data lake for a framework such as Hadoop to process.



Because of its ability to handle thousands of messages per second, Kafka is useful for
applications such as website activity tracking or telemetry data collection in large-scale IoT
deployments.
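From the application side, feeding such a stream into Kafka is a few lines of code. This hedged sketch uses the kafka-python client to publish a telemetry reading to a topic; the broker address and topic name are hypothetical.

```python
import json
import time

from kafka import KafkaProducer

# Hypothetical broker and topic; downstream consumers or frameworks read from the topic.
producer = KafkaProducer(
    bootstrap_servers="broker1:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

producer.send("device-telemetry", {"device_id": 42, "temp_c": 61.5, "ts": time.time()})
producer.flush()   # make sure the record actually reaches the brokers before exiting
```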



WHAT IS THE DIFFERENCE BETWEEN KAFKA AND SPARK?

Apache Spark is a general processing engine developed to perform both batch processing -- similar to MapReduce -- and workloads such as streaming, interactive queries and machine learning (ML).

Kafka's architecture is that of a distributed messaging system, storing streams of records in categories called topics. It is not intended for large-scale analytics jobs but for efficient stream processing. It is designed to be integrated into the business logic of an application rather than used for batch analytics jobs.

Kafka was originally developed at social network LinkedIn to analyze the connections among its millions of users. It is perhaps best viewed as a framework capable of capturing data in real time from numerous sources and sorting it into topics to be analyzed for insights into the data.

That analysis is likely to be performed using a tool such as Spark, which is a cluster computing framework that can execute code developed in languages such as Java, Python or Scala. Spark also includes Spark SQL, which provides support for querying structured and semistructured data; and Spark MLlib, a machine learning library for building and operating ML pipelines.
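A hedged sketch of that division of labor: Kafka transports the records while Spark's Structured Streaming API consumes and aggregates them as they arrive. It assumes the Spark-Kafka connector package is available on the cluster, and the broker and topic names are hypothetical.

```python
from pyspark.sql import SparkSession

# Requires the spark-sql-kafka connector package on the cluster.
spark = SparkSession.builder.appName("kafka-to-spark").getOrCreate()

stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "device-telemetry")
          .load())

# Count records per message key as they stream in, printing results to the console.
counts = stream.groupBy("key").count()

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```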


OTHER BIG DATA FRAMEWORKS

Here are some other big data frameworks that might be of interest.

Apache Hive enables SQL developers to use Hive Query Language (HQL) statements that are similar to the standard SQL employed for data query and analysis. Hive can run on HDFS and is best suited for data warehousing tasks, such as extract, transform and load (ETL), reporting and data analysis.
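As a hedged illustration of how an HQL statement reads, the sketch below submits a SQL-like query to a Hive server from Python using the PyHive client; the host, database, table and columns are all hypothetical.

```python
from pyhive import hive

# Hypothetical HiveServer2 endpoint and schema.
conn = hive.Connection(host="hive-server", port=10000, database="sales")
cursor = conn.cursor()

cursor.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM orders GROUP BY region ORDER BY total DESC LIMIT 10"
)
for region, total in cursor.fetchall():
    print(region, total)
```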




Apache Flink combines stateful stream processing with the ability to handle ETL and batch processing jobs. This makes it a good fit for event-driven workloads, such as user interactions on websites or online purchase orders. Like Hive, Flink can run on HDFS or other data storage layers.

Apache Storm is a distributed real-time processing framework that can be compared to Hadoop with MapReduce, except it processes event data in real time while MapReduce operates in discrete batches. Storm is designed for scalability and a high level of fault tolerance. It is also useful for applications requiring a rapid response, such as detecting security breaches.


Which processing units for AI does your organization require?
DANIEL ROBINSON

If you're looking to deploy AI in your data center, carefully consider what hardware and infrastructure to invest in first.

AI covers a range of techniques, such as machine learning and deep learning. And AI includes a broad range of business applications, from analytics capable of predicting future performance to recommendation systems and image recognition.

As more large businesses adopt artificial intelligence as part of digital transformation efforts, AI continues to expand and develop as a technology. Understanding why your business requires AI can also help you decide which infrastructure to adopt in order to support it.

SERVERS WITH GPUS

Equipping servers with GPUs has become one of the most common infrastructure approaches for AI. You can use the massively parallel architecture of a GPU chip to accelerate the bulk floating-point operations involved in processing AI models.


GPUs also tend to have broad and mature software ecosystems. For example, Nvidia developed the CUDA toolkit so developers can use GPUs for a variety of purposes, including deep learning and analytics. However, although GPUs support certain deep learning tasks, they do not necessarily support all AI workloads.

"There are models within the context of AI and machine learning that don't fall into this neat category of deep learning and have been underexplored because the GPU is very good at neural network type stuff, but it isn't necessarily good at some of these other interesting flavors of algorithms that people are starting to do interesting things with," said Jack Vernon, analyst at IDC.

Before deploying AI in the data center, you should start by considering your motives for adopting the technology to decide whether GPUs suit your requirements. Then, seek a specialist's advice on the kind of model that best fits your organization's requirements to understand what other infrastructure you require.
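As a hedged sketch of that offload -- using PyTorch as the example framework, since the article names only the CUDA toolkit itself -- the same bulk matrix math runs on the CPU, or on a CUDA-capable GPU when one is detected.

```python
import torch

# Fall back to the CPU when no CUDA-capable GPU is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Running on:", device)

a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)
c = a @ b   # a large matrix multiply: the bulk floating-point work GPUs parallelize
print(c.shape, c.device)
```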

OTHER HARDWARE ACCELERATORS

Field-programmable gate arrays (FPGAs) are essentially chips crammed with logic blocks that you can configure and reconfigure as required to perform different functions. ASICs have logic functions built into the silicon during manufacturing. Both accelerate hardware performance. ASICs make more sense for organizations with a large volume of well-defined workloads, whereas FPGAs require more complex programming.

Google offers its TPU -- an ASIC designed specifically for deep learning -- to customers through its Google Cloud Platform. Graphcore designed its IPUs specifically for AI workloads, and Cambricon offers processor chips designed around an instruction set optimized for deep learning. Intel's acquisition Habana Labs makes programmable accelerators as separate chips for the training and inference parts of deep learning, known as Gaudi and Goya, respectively.

Although GPUs and similar types of hardware accelerators get the most attention when it comes to AI, CPUs remain relevant for many areas of AI and machine learning. For example, Intel has added features to its server CPUs to help accelerate AI workloads. The latest Xeon Scalable family features Intel Deep Learning Boost, which adds new instructions to accelerate the kind of calculations involved in inferencing. This means that these CPUs can accelerate certain AI workloads with no additional hardware required.

STORAGE FOR AI

Organizations should not overlook storage when it comes to infrastructure to support AI. Training a machine learning model requires a huge volume of sample data, and systems must be fed data as fast as they can take it to keep performance up.

"Storage is a really big thing, and the training process itself often involves feedback loops. So, you need to essentially save the model in one stage, run some processing on top of that, to update it, and then sort of continuously recall it," Vernon said. "Most organizations that are building out training and inferencing infrastructure often quickly have a massive requirement for additional storage."

Organizations with existing HPC infrastructure often already have a fast flash storage layer back-ended by a much larger capacity layer. For most organizations, this means implementing NVMe SSDs with as low latency as possible, backed by less costly storage to deliver the capacity.
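One hedged sketch of the keep-it-fed problem, again using PyTorch as the example framework: several worker processes read and prepare batches in parallel -- ideally from that low-latency NVMe tier -- while the accelerator trains on the previous batch. The data set here is synthetic and the sizes are placeholders.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

if __name__ == "__main__":
    # Synthetic stand-in for a real training set read from fast storage.
    dataset = TensorDataset(torch.randn(256, 3, 224, 224), torch.randint(0, 10, (256,)))

    loader = DataLoader(
        dataset,
        batch_size=64,
        shuffle=True,
        num_workers=4,     # parallel readers pulling samples from storage
        pin_memory=True,   # speeds up host-to-GPU copies
    )

    for images, labels in loader:
        pass  # the training step for each batch would run here
```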
SPECIALIZED AI SYSTEMS

Several specialized systems offer higher performance for AI workloads. Nvidia bases its DGX
servers around its GPUs, with an architecture optimized to keep those GPUs fed with data.
Storage vendors have also partnered with Nvidia to provide validated reference
architectures that pair high-performance storage arrays with Nvidia DGX systems. For
example, DDN optimized its Accelerated, Any-Scale AI portfolio for all types of access
patterns and data layouts used in training AI models, and vendors such as NetApp and Pure
Storage offer similar storage architectures.



Intel offers its OpenVINO toolkit as an inferencing engine designed to optimize and run pretrained models. This has a plugin architecture that enables it to execute models on a range of hardware, such as CPUs, GPUs, FPGAs or a mixture of all three, which gives organizations greater deployment flexibility.
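A hedged sketch of that plugin model using OpenVINO's Python API: the model file and input shape are hypothetical, and the device string is the only thing you change to retarget the same model at a different processor.

```python
import numpy as np
from openvino.runtime import Core

core = Core()
print(core.available_devices)                 # e.g. ['CPU', 'GPU'], depending on the host

model = core.read_model("my_model.xml")       # hypothetical pretrained model in IR format
compiled = core.compile_model(model, "CPU")   # pass "GPU", "AUTO", etc. to retarget

dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = compiled([dummy_input])              # run inference on the chosen device
```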
You might also elect to build and train your AI models in the cloud, using on-demand resources you can discontinue once training is finished.