NVIDIA Tesla P100 Tech Overview
applications spend more time moving data from memory than processing it. To solve this problem, the Tesla P100 tightly integrates compute and data on the same package by adding Chip on Wafer on Substrate (CoWoS) with HBM2 technology. Using a 4096-bit-wide interface with HBM2, the Tesla P100 delivers 720 GB/s, which is 3X the memory bandwidth of Tesla K40 and M40 GPUs¹.

HBM2 memory has native support for error correcting code (ECC) functionality, while GDDR5 does not. GDDR5 lacks support for internal ECC protection of memory content and is limited to error detection on the GDDR5 bus only. Therefore, Tesla K40 and K80 offered ECC protection by allocating 6.25% of the overall GDDR5 memory capacity for ECC bits. In addition, ECC reduces memory bandwidth. The Tesla P100 with HBM2 has no ECC overhead, in either memory capacity or bandwidth.

Figure 2: The Tesla P100 with HBM2 significantly exceeds the memory bandwidth of past GPU generations.
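To make the ECC overhead concrete, here is a brief back-of-the-envelope sketch. The 6.25% reservation and the 16 GB P100 capacity come from the text above; the 12 GB Tesla K40 capacity is an assumption based on that board's published spec.

```python
# ECC overhead on GDDR5 vs. HBM2 (illustrative arithmetic only).
# GDDR5 boards such as the Tesla K40 reserve 6.25% of capacity for ECC bits;
# HBM2 on the Tesla P100 supports ECC natively, with no capacity penalty.

def usable_capacity_gb(total_gb, ecc_overhead=0.0):
    """Memory left for applications after reserving a fraction for ECC bits."""
    return total_gb * (1.0 - ecc_overhead)

# Tesla K40: 12 GB GDDR5 (assumed spec), 6.25% reserved when ECC is enabled.
k40_usable = usable_capacity_gb(12, ecc_overhead=0.0625)

# Tesla P100: 16 GB HBM2, native ECC, no reservation.
p100_usable = usable_capacity_gb(16, ecc_overhead=0.0)

print(f"Tesla K40 usable with ECC:  {k40_usable:.2f} GB")   # 11.25 GB
print(f"Tesla P100 usable with ECC: {p100_usable:.2f} GB")  # 16.00 GB
```

The same function makes it easy to see that the GDDR5 reservation costs the K40 0.75 GB of its nominal capacity, while the P100 keeps all 16 GB.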
Another key benefit of HBM2 memory is its small footprint. Even with 16 GB of memory capacity, the Tesla P100 board is approximately one-third the size of the Tesla K40 because the memory stacks are co-located with the GPU in a single package. The smaller module design enables a new class of highly dense server designs.
¹ Comparison with ECC turned on.
This figure shows an 8-GPU server design based on NVLink and the Hybrid Cube Mesh topology. Four GPUs are directly connected, and four cross-links connect the two quads to each other.
NVLink is the world’s first high-speed interconnect for NVIDIA GPUs and solves the interconnect problem. With four NVLink connections per GPU, each delivering 40 GB/s of bidirectional interconnect bandwidth, the Tesla P100 delivers 160 GB/s of bidirectional bandwidth in total. This is over 5X the bandwidth of PCI Express Gen3. The PCIe interface remains available for communication with x86 CPU or NIC interfaces.

NVIDIA TESLA P100 PERFORMANCE
The following chart shows the performance for various workloads, demonstrating the performance scalability a server can achieve with eight Tesla P100 GPUs connected via NVLink. Applications can scale almost linearly to deliver the highest absolute performance in a node.
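The aggregate-bandwidth figure can be checked with simple arithmetic. The four links at 40 GB/s each are from the text above; the 32 GB/s total for a PCIe Gen3 x16 slot is an assumption based on its nominal ~16 GB/s per direction.

```python
# Aggregate NVLink bandwidth per Tesla P100 GPU (illustrative arithmetic).
links_per_gpu = 4
gb_per_link_bidir = 40  # GB/s, bidirectional, per NVLink connection

total_bidir = links_per_gpu * gb_per_link_bidir
print(f"Total NVLink bandwidth: {total_bidir} GB/s")  # 160 GB/s

# Comparison with PCIe Gen3 x16 (assumed ~16 GB/s each way, 32 GB/s bidirectional).
pcie_gen3_x16_bidir = 32
print(f"Advantage over PCIe Gen3: {total_bidir / pcie_gen3_x16_bidir:.0f}X")  # 5X
```

This matches the 160 GB/s total and the "over 5X PCIe Gen3" claim in the text above.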
[Chart: Application speed-up (0x to 50x scale) over a dual-socket Haswell CPU for AlexNet with Caffe, VASP, HOOMD-blue, COSMO, MILC, Amber, and HACC.]
The new NVIDIA Tesla P100 accelerator, built on the Pascal architecture,
combines breakthrough technologies to enable science and deep learning
workloads that demand unlimited computing resources. Incorporating
innovations in architectural efficiency, memory bandwidth, capacity,
connectivity, and power efficiency, the NVIDIA Tesla P100 delivers the
highest absolute performance for next-generation HPC and AI systems.
JOIN US ONLINE
blogs.nvidia.com
@GPUComputing
linkedin.com/company/nvidia
Google.com/+NVIDIA
© 2016 NVIDIA Corporation. All rights reserved. NVIDIA, the NVIDIA logo, NVIDIA Pascal, NVLink, CUDA, and Tesla are
trademarks and/or registered trademarks of NVIDIA Corporation. All company and product names are trademarks or
registered trademarks of the respective owners with which they are associated.