How to Select the Best Processor and HPC System for Your Ansys Workloads
Ansys use cases require compute, memory, and input/output (I/O) to varying degrees, depending on their application.
To reap optimal benefit from Ansys applications, manufacturers need a modern HPC system that can balance
performance needs with other considerations such as power consumption and Ansys software licensing expenses.
These environmental demands and others will compel manufacturers to seek a combination of both hardware
(processors and servers) that is performant but cost-, space-, and energy-efficient, and software that is optimized to
take advantage of hardware technologies. To assist manufacturing engineers in the selection process, Intel, HPE,
and Ansys have teamed up to help them realize greater efficiencies through optimized performance resulting in
more innovation time.
With more than 30 3rd Gen Intel® Xeon® Scalable processors available, choosing the most appropriate processor for a
particular Ansys workload may seem a bit daunting. Through close collaboration and co-innovation, Intel, HPE, and Ansys
are helping manufacturing engineers to optimize computing environments and performance through careful processor
selection to:
• Reduce time to results by solving larger and more complex problems with greater accuracy.
• Deliver more performance in a smaller footprint to reduce data center floor space requirements and lower energy costs.
• Minimize system downtime and unplanned outages through improved system reliability.
• Identify performance bottlenecks and reduce costs of Ansys performance testing on select Intel CPUs.
Results
A sampling of performance improvements across Ansys workloads demonstrates how the collaboration is guiding
customers toward positive performance outcomes:
• Running a dozen standard benchmarks in Ansys Fluent resulted in a 13–18 percent improvement in jobs per day, with
less wall-clock time using the latest Intel processor.
• Five standard Ansys CFX benchmarks ran better on a 3rd Gen Intel Xeon Scalable processor, with jobs-per-day
improvements of 19–29 percent and wall-clock times up to 24 percent faster.
• Three types of crash simulations run in Ansys LS-DYNA resulted in up to a 41 percent improvement in jobs per day on
the 3rd Gen Intel Xeon Scalable processor, with consistently lower wall-clock times.
/ CHOOSING THE RIGHT PROCESSOR FOR THE JOB
Intel’s latest processors for HPC workloads are 3rd Generation Intel Xeon Scalable processors offering manufacturers a
variety of SKUs to choose from. This ensures performance is optimized for the right resource constraints. Compared to
previous-generation Intel Xeon Scalable processors, the 3rd Gen processors offer improvements in three areas: compute,
memory, and I/O.
Compute:
• Up to 40 cores in a standard socket
• Enhanced per-core performance, including a 20 percent boost in instructions per clock (IPC)
• Wide range of frequency, feature, and power levels
• Intel Speed Select Technology (Intel SST), which provides fine-grain control over CPU performance that can help to
optimize total cost of ownership (TCO)
• Built-in HPC and artificial intelligence (AI) acceleration with Intel Advanced Vector Extensions 512 (Intel AVX-512) and
Intel Deep Learning Boost (Intel DL Boost)
Memory:
• Increased memory capacity with up to eight channels — up to 6 TB of system memory per processor
• Enhanced memory performance with support for up to 3200 MT/s DIMMs (two DIMMs per channel)
• Increased L1 and L2 cache
• Faster internode connections with three Intel Ultra Path Interconnect links at 11.2 GT/s
• Support for Intel® Optane™ persistent memory 200 series
I/O:
• Support for PCI Express (PCIe) Gen4 and up to 64 lanes (per socket) at 16 GT/s
• Intel Optane solid state drives (SSDs), with consistently high performance and up to 100 drive writes per day (DWPD)
• Support for a wide range of network fabrics
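The memory and I/O figures listed above can be turned into rough theoretical peak bandwidths. The sketch below assumes a standard 64-bit (8-byte) data bus per DDR4 channel and the 128b/130b line encoding used by PCIe Gen4; neither detail is stated in this paper, the function names are illustrative, and sustained application bandwidth will be lower than these peaks.

```python
def ddr4_peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int,
                            bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth per socket in GB/s.

    Assumes each DDR4 channel moves 8 bytes (64 bits) per transfer.
    """
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


def pcie_gen4_bandwidth_gbs(lanes: int, giga_transfers_per_s: float = 16.0) -> float:
    """Theoretical peak PCIe bandwidth per direction in GB/s.

    Assumes 128b/130b encoding: 128 payload bits per 130 bits on the wire.
    """
    encoding_efficiency = 128 / 130
    return lanes * giga_transfers_per_s * encoding_efficiency / 8


if __name__ == "__main__":
    # Values taken from the lists above: 8 channels at 3200 MT/s, 64 lanes at 16 GT/s.
    print(f"Memory: {ddr4_peak_bandwidth_gbs(8, 3200):.1f} GB/s per socket")
    print(f"PCIe:   {pcie_gen4_bandwidth_gbs(64):.1f} GB/s per direction per socket")
```

Under these assumptions, eight channels at 3200 MT/s give roughly 204.8 GB/s of peak memory bandwidth per socket, which is the figure that matters most for bandwidth-bound workloads.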
Ansys Computational Fluid Dynamics (CFD)
Ansys CFD applications are used for simulating the flow of air, fluids, heat, and viscous material, and have many uses in
a wide variety of industries — including aeronautic science, drag simulation in car shape design, and jet and thermal
flow in engine design. CFD-related workloads are typically analyses of complex, often unstructured meshes with tens
to hundreds of millions of cells. The data involved usually requires preprocessing, which can affect runtimes and the
quality of results.
To achieve optimum performance, Ansys CFD applications need plenty of memory capacity and memory bandwidth
(see Table 3). Although a processor with a high clock speed is usually ideal for this purpose, it is less crucial for Ansys
CFD workloads running on a large cluster, where communication throughput matters more than compute speed.
With the exception of transient models, I/O performance is generally not a critical performance factor.
Four 3rd Gen Intel Xeon Scalable processor SKUs are well-suited for running CFD workloads (see Table 2 for number of
cores, frequencies, and other details):
• 6346. This processor is an excellent entry-level CAE SKU for performance per core, with strong core density and
performance.
• 6336Y. This processor also has strong core density and performance and is a good choice for a general-purpose CAE
cluster. It offers configuration flexibility with three performance profiles, which enables manufacturers to choose a
configuration with low thermal design power (TDP) for use in power-constrained environments.
• 8358. This processor is a good choice for Ansys CFX and Ansys Fluent, both of which benefit from higher core counts.
• 8360Y. Like the 6336Y, this processor offers three performance profiles and is a good choice for a general-purpose CAE
cluster. With more cores than the 6336Y, the 8360Y can be used for codes that demand more compute.
Ansys structural analysis concerns stress analysis on components and assemblies. Structural applications generally use
finite element analysis (FEA) in two main numerical simulation approaches:
• Implicit FEA — Implicit analysis using sparse direct solver type models is used for longer duration, relatively static problems
in which time dependency of the solution is not an important factor, such as analysis of forces on structures. As the
problem size increases, implicit analysis usually uses much longer time steps but can require more computational
resources.
• Explicit FEA — Explicit analysis using iterative direct solver type models is used for high-impact and short-duration
simulations where each step takes into account forces like mass and inertia from the previous step. Examples of this type
of analysis include crash, impact, and blast simulations. Such “nonlinear” events are modeled to predict cascading
damage to structural and component integrity. Major users of crash testing include automotive manufacturers, who use it
to save money by reducing the need for real-world crash tests.
Unlike Ansys CFD workloads, Ansys FEA workloads don’t necessarily need high memory capacity, and memory
bandwidth is of less importance. Instead, FEA workloads (see Table 4) require high clock speeds (3.4 GHz or more) and
an appropriate core count (at least 32 cores on a two-socket system). These workloads are quite sensitive to network
latency and fabric latency but are not I/O-constrained.
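The two numeric thresholds in this guidance (3.4 GHz or more, and at least 32 cores on a two-socket system) can be expressed as a simple screening check. This is a minimal sketch: the function name and the sample values are hypothetical, and the paper does not say whether the 3.4 GHz figure refers to base or turbo frequency, so the caller has to decide which clock to pass in.

```python
FEA_MIN_CLOCK_GHZ = 3.4   # threshold quoted in the text
FEA_MIN_TOTAL_CORES = 32  # threshold quoted in the text, for a two-socket system


def meets_fea_guidance(clock_ghz: float, cores_per_socket: int, sockets: int = 2) -> bool:
    """Return True if a configuration meets both FEA thresholds from the text."""
    total_cores = cores_per_socket * sockets
    return clock_ghz >= FEA_MIN_CLOCK_GHZ and total_cores >= FEA_MIN_TOTAL_CORES


if __name__ == "__main__":
    # Hypothetical configurations, not measured or quoted SKU data.
    print(meets_fea_guidance(clock_ghz=3.5, cores_per_socket=16))  # True
    print(meets_fea_guidance(clock_ghz=3.0, cores_per_socket=32))  # False: clock too low
    print(meets_fea_guidance(clock_ghz=3.6, cores_per_socket=8))   # False: too few cores
```

A check like this only screens candidates; network and fabric latency, which the text also flags as important, still have to be evaluated separately.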
Four 3rd Gen Intel Xeon Scalable processor SKUs are well-suited for running Ansys FEA workloads (see Table 2 for
number of cores, frequencies, and other details):
• 8352Y and 8360Y. With 32 cores and 36 cores respectively, these processors are tuned for a general-purpose CAE
cluster but have the higher number of cores required by Ansys FEA workloads. They offer configuration flexibility with
three performance profiles each, which enables manufacturers to choose a configuration with a TDP that suits their
environment.
• 8358 and 8362. Also offering 32 cores but different frequencies, these processors provide good performance, especially
for LS-DYNA.
Noise, vibration, and harshness (NVH) applications simulate acoustics to pinpoint quality indicators such as squeak and
rattle, vibration issues, and external and internal noise levels heard within a vehicle. NVH solvers can be static or
dynamic; dynamic models involve millions of finite elements (1–40 million degrees of freedom) with thousands of
components and properties.
All NVH workloads can benefit from large memory capacity, but higher core counts provide negligible benefits (see Table 5).
In fact, Ansys Mechanical should use lower core counts. The other performance factors are harder to generalize for NVH than
for other CAE disciplines because they have varying performance requirements.
The following 3rd Gen Intel Xeon Scalable processor SKUs are well-suited for running NVH workloads (see Table 2 for number
of cores, frequencies, and other details):
• 6334. This processor is an excellent entry-level CAE SKU for performance per core.
• 6342. Depending on a manufacturer’s core density and core frequency requirements, the 6342 is best suited for
Ansys Mechanical.
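The SKU recommendations for CFD, FEA, and NVH can be collected into a small lookup table, which makes it easier to see which processors suit more than one workload on a shared cluster. The sketch below only restates the SKU lists from the text; the workload keys and helper names are illustrative assumptions.

```python
# SKU shortlists as given in the text for each workload class.
RECOMMENDED_SKUS = {
    "cfd": ["6346", "6336Y", "8358", "8360Y"],
    "fea": ["8352Y", "8360Y", "8358", "8362"],
    "nvh": ["6334", "6342"],
}


def shortlist(workload: str) -> list:
    """Return the recommended SKUs for a workload class (case-insensitive)."""
    key = workload.strip().lower()
    if key not in RECOMMENDED_SKUS:
        raise KeyError(f"unknown workload {workload!r}; expected one of {sorted(RECOMMENDED_SKUS)}")
    return list(RECOMMENDED_SKUS[key])


def shared_skus(*workloads: str) -> list:
    """Return SKUs recommended for every one of the given workload classes."""
    if not workloads:
        return []
    candidate_sets = [set(shortlist(w)) for w in workloads]
    return sorted(set.intersection(*candidate_sets))


if __name__ == "__main__":
    print(shortlist("NVH"))
    print(shared_skus("cfd", "fea"))
```

For a mixed CFD and FEA cluster, the overlap is the 8358 and the 8360Y; none of the SKUs listed for NVH appear in the other two lists.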
/ DEPLOYING A SCALE-OUT PLATFORM FOR ANSYS
The previous discussion centered on choosing the appropriate processor for a specific type of Ansys workload, which is
important, but it is only part of the picture. A best-fit processor alone cannot solve all the challenges facing CAE data
centers, such as keeping server footprint to a minimum, lowering TCO, and reducing the cost to the environment (TCE)
by shrinking energy consumption.
The HPE Apollo 2000 Gen10 Plus System is a dense, multi-server platform that packs substantial performance and
workload flexibility into a small data center space, while delivering the efficiencies of a shared infrastructure. It is
designed to provide a path to scale-out architecture for traditional data centers, so enterprise customers can achieve
the space-saving value of density-optimized infrastructure in a cost-effective and nondisruptive manner. The HPE
Apollo 2000 Gen10 Plus System also offers twice the density of traditional 1U rack mount systems, helping data centers
to maximize the use of valuable data center space. For these reasons, the HPE Apollo 2000 Gen10 Plus System is a good
choice for data center modernization as Ansys workloads climb.
/ RESULTS: RUN MORE JOBS PER DAY TO IMPROVE DESIGN AND SPEED TIME TO MARKET
Collaboration between HPE, Intel, and Ansys is producing impressive performance improvements across a wide variety
of Ansys workloads. These improvements enable product design engineers to run more jobs and iterations per day
using higher fidelity models. The results showcased here are standard Ansys benchmarks that represent real-world
workloads. These performance improvements illustrate the power of hardware and software vendors working together
to provide the best possible outcome for customers — improved workload performance resulting in faster time-to-
market and more time for innovation.
Overall, the benchmark tests demonstrated that the majority of Ansys workloads — CFD, Mechanical, or LS-DYNA — can
support more jobs per day and reduced wall-clock times using the latest Intel Xeon Scalable processor. These results
are directly related to the newer processor’s additional CPU cores, its IPC improvements, and its additional memory
capacity and bandwidth. Running more jobs per day with lower wall-clock time translates directly to accelerated design
and testing, which in turn can lead to product improvements and faster time to market.
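The two metrics used throughout these results are closely linked. The sketch below assumes that jobs per day is defined as the number of seconds in a day divided by the elapsed wall-clock time of one job, a common benchmark convention that this paper does not state explicitly. The timings in the example are hypothetical and are not taken from the benchmark charts.

```python
SECONDS_PER_DAY = 86_400


def jobs_per_day(wall_clock_seconds: float) -> float:
    """Jobs per day, assuming jobs run back to back with no idle time."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall-clock time must be positive")
    return SECONDS_PER_DAY / wall_clock_seconds


def percent_faster(baseline_seconds: float, new_seconds: float) -> float:
    """Percentage reduction in wall-clock time relative to the baseline."""
    return (1 - new_seconds / baseline_seconds) * 100


def percent_more_jobs(baseline_seconds: float, new_seconds: float) -> float:
    """Percentage increase in jobs per day relative to the baseline."""
    return (jobs_per_day(new_seconds) / jobs_per_day(baseline_seconds) - 1) * 100


if __name__ == "__main__":
    baseline, new = 1000.0, 800.0  # hypothetical wall-clock times in seconds
    print(f"{percent_faster(baseline, new):.1f}% faster wall-clock time")
    print(f"{percent_more_jobs(baseline, new):.1f}% more jobs per day")
```

Under this definition the two percentages are not interchangeable: a job that finishes in 20 percent less time yields 25 percent more jobs per day, so it is worth checking which metric a quoted improvement refers to.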
Ansys Fluent
Fluent uses a cell-centered code capable of handling polyhedral mesh and cut-cell meshes. It offers both pressure-
based and density-based options generally used for combustion, multiphase or chemically-reacting flows.
Over a dozen standard Fluent benchmarks were run — ranging from a gasoline direct injection model to a landing
gear analysis to external flows over an aircraft wing and various types of cars. Figures 1 and 2 show a sampling of the
results, illustrating a 13–18 percent improvement in jobs per day with consistently less wall-clock time using the latest
Intel processor. The test also used Fluent’s “-platform=intel” parameter, which provides optimizations specific to Intel
architecture. With the 2021 R2 version of Fluent released in July 2021, Ansys and Intel worked together to further
improve these optimizations.
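As an illustration of where the “-platform=intel” parameter fits, the sketch below assembles a batch-mode Fluent command line. Only “-platform=intel” comes from this paper. The launcher name `fluent`, the `3ddp` solver mode, and the `-g`, `-t`, and `-i` options are assumptions based on common Fluent batch usage, and the journal file name is hypothetical, so check all of them against the Fluent documentation for your release before use.

```python
import shlex


def fluent_command(cores: int, journal: str, solver: str = "3ddp") -> list:
    """Build an argument list for a batch Fluent run with Intel optimizations.

    Everything except "-platform=intel" is an assumption about the Fluent
    launcher, not something specified in this paper.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return [
        "fluent",            # launcher name (assumed to be on PATH)
        solver,              # solver mode, e.g. 3D double precision (assumed)
        "-g",                # run without the GUI (assumed)
        f"-t{cores}",        # number of parallel processes (assumed)
        "-platform=intel",   # Intel-specific optimizations, as named in the text
        "-i", journal,       # journal file to execute (assumed)
    ]


if __name__ == "__main__":
    cmd = fluent_command(cores=36, journal="run.jou")  # hypothetical journal file
    print(shlex.join(cmd))  # print only; pass cmd to subprocess.run to execute
```

Building the command as a list rather than a single string keeps it safe to hand to `subprocess.run` without shell quoting issues.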
Figure 1. Ansys Fluent jobs-per-day (JPD) benchmark results (higher is better). The chart compares the Intel 6354 (3 GHz,
18 cores, 39 MB L3, 205 W) with the Intel 6246R (3.4 GHz, 16 cores, 35.75 MB L3, 205 W) on one to four nodes (36 versus
32 cores per node) for three models: Boeing landing gear analysis (landing_gear_15M), external flow over aircraft wing
(aircraft_wing_14M), and flow through combustor (lm6000_16M). Labeled gains range from 13 to 18 percent more JPD.
Figure 2. Ansys Fluent elapsed wall-clock time benchmark results (lower is better). The chart compares the Intel 6354 with
the Intel 6246R on one to four nodes (36 versus 32 cores per node). Panels: Air Foil 100M, Air Foil 10M, Air Foil 50M.
Labeled gains range from 16 to 24 percent faster.
Ansys CFX
CFX is a fully implicit solver that requires quite a bit of storage. It uses a cell-vertex code, is pressure-based, and handles
traditional tetra and hexa mesh topologies. This software is often used in turbomachinery analyses.
During analysis, five standard CFX benchmarks were run: a LeMans race car, a pump, and three air foils. All benchmarks
ran better on the 3rd Gen Intel Xeon Scalable processor. Figures 3 and 4 show a sampling of the results, illustrating
jobs-per-day improvements ranging from 19 percent to 29 percent, and wall-clock times up to 24 percent faster.
Figure 3. Ansys CFX jobs-per-day (JPD) benchmark results (higher is better). The chart compares the Intel 6354 with the
Intel 6246R on one to four nodes (36 versus 32 cores per node). Panels: Air Foil 100M, Air Foil 10M, Air Foil 50M.
Labeled gains range from 19 to 29 percent more JPD.
Figure 4. Ansys CFX elapsed wall-clock time benchmark results (lower is better). The chart compares the Intel 6354 with
the Intel 6246R on one to four nodes (36 versus 32 cores per node). Panels: Air Foil 100M, Air Foil 10M, Air Foil 50M.
Labeled gains range from 16 to 24 percent faster.
Ansys LS-DYNA
LS-DYNA is a multiphysics simulation application that is used for drop tests, impact and penetration, smashes and
crashes, occupant safety, and more.
During analysis, three types of crash simulations were run. The benchmarks all showed marked improvement in jobs
per day on the 3rd Gen Intel Xeon Scalable processor — up to 41 percent. Wall-clock times were also consistently lower.
Figures 5 and 6 show a sampling of the results.
Figure 5. Ansys LS-DYNA jobs-per-day (JPD) benchmark results (higher is better). The chart compares the Intel 6354 with
the Intel 6246R on one to four nodes (36 versus 32 cores per node). Panels: 3cars, car2car, Neon. Labeled gains range
from 13 to 41 percent more JPD.
Figure 6. Ansys LS-DYNA wall-clock time benchmark results (lower is better). The chart compares the Intel 6354 with the
Intel 6246R on one to four nodes (36 versus 32 cores per node). Panels: 3cars, car2car, Neon. Labeled gains range from
11 to 29 percent faster.
/ CONCLUSION
HPE, Intel, and Ansys are collaborating to provide an optimal platform for product design. Memory enhancements in
the latest generation of Intel Xeon Scalable processors, along with acceleration technology like Intel AVX-512, provide
the raw horsepower needed for Ansys CFD, Mechanical, and LS-DYNA workloads. Ansys and Intel engineers also work
closely to optimize code so that it can take advantage of Intel architecture. And by deploying Ansys applications on
the HPE Apollo 2000 Gen10 Plus system, manufacturers can minimize server footprint, lower overall costs, and reduce
energy consumption.
Ansys standard benchmarks show that this workload-optimized combination of hardware and software can
substantially improve Ansys workload performance, as measured by number of jobs per day and elapsed wall-clock
time. These performance improvements can be a significant differentiator for manufacturers because they can help
them bring higher quality products to market more quickly.
/ ADDITIONAL RESOURCES
Intel AVX-512
https://www.intel.com/content/www/us/en/architecture-and-technology/avx-512-overview.html
Ansys HPC
http://www.ansys.com/hpc
/ OUR PARTNERS
ANSYS, Inc.
Southpointe
2600 Ansys Drive
Canonsburg, PA 15317
U.S.A.
724.746.3304
[email protected]

If you’ve ever seen a rocket launch, flown on an airplane, driven a car, used a computer, touched a mobile device,
crossed a bridge or put on wearable technology, chances are you’ve used a product where Ansys software played a
critical role in its creation. Ansys is the global leader in engineering simulation. We help the world’s most innovative
companies deliver radically better products to their customers. By offering the best and broadest portfolio of
engineering simulation software, we help them solve the most complex design challenges and engineer products
limited only by imagination.
Visit www.ansys.com for more information.
Any and all ANSYS, Inc. brand, product, service and feature names, logos and slogans are
registered trademarks or trademarks of ANSYS, Inc. or its subsidiaries in the United States or
other countries. All other brand, product, service and feature names or trademarks are the
property of their respective owners.