The Barcelona Supercomputing Center (BSC) and the State University of New York (Stony Brook and Buffalo campuses) recently benchmarked NVIDIA's CG100 "Grace" Superchip against competing processors across a range of HPC and AI workloads. NVIDIA's marketing has centered on the overall GH200 "Grace Hopper" package, so it is interesting to see research institutions focus on the company's "first true" server processor, which is Arm-based, rather than on the more popular GPU side. The Next Platform's article summarizes the chip's internals, highlighting its high core count, low thermal footprint, and low-power DDR5 (LPDDR5) memory with error correction for server-class use.

The benchmark results were presented at the HPC Asia 2024 conference in Nagoya, Japan, and have also been published in the ACM Digital Library. BSC's MareNostrum 5 system includes an experimental cluster partition built from NVIDIA Grace-Grace and Grace-Hopper superchips. The Grace-Grace configuration pairs two Grace CPUs over an NVLink chip-to-chip interconnect, keeps memory coherent across the LPDDR5 banks, and draws around 500 watts. It provides a total of 144 Arm Neoverse V2 ("Demeter") cores implementing the Armv9 architecture, 1 TB of physical memory, and 1.1 TB/sec of peak theoretical memory bandwidth. However, only 960 GB of memory capacity and 1 TB/sec of memory bandwidth are currently available due to limitations in LPDDR5 memory yield.
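To put those headline figures in per-core terms, the short Python sketch below divides the quoted totals by the core count. It is only back-of-the-envelope arithmetic on the numbers stated above: it assumes 1 TB/sec equals 1000 GB/sec, treats "around 500 watts" as exactly 500, and the variable names are illustrative rather than taken from the paper.

```python
# Figures quoted in the text for one Grace-Grace superchip.
CPUS = 2
TOTAL_CORES = 144
USABLE_MEMORY_GB = 960         # capacity currently available
USABLE_BANDWIDTH_GB_S = 1000   # 1 TB/sec, assuming 1 TB = 1000 GB
POWER_WATTS = 500              # "around 500 watts", treated as exact here

# Simple per-CPU and per-core ratios derived from the totals above.
cores_per_cpu = TOTAL_CORES // CPUS
memory_per_core_gb = USABLE_MEMORY_GB / TOTAL_CORES
bandwidth_per_core_gb_s = USABLE_BANDWIDTH_GB_S / TOTAL_CORES
watts_per_core = POWER_WATTS / TOTAL_CORES

print(f"Cores per Grace CPU:   {cores_per_cpu}")
print(f"Memory per core:       {memory_per_core_gb:.2f} GB")
print(f"Bandwidth per core:    {bandwidth_per_core_gb_s:.2f} GB/sec")
print(f"Package power per core: {watts_per_core:.2f} W")
```

Under these assumptions each core has roughly 6.7 GB of memory and just under 7 GB/sec of bandwidth. The power figure covers the whole package, memory included, so the per-core wattage is an upper bound rather than a measurement of the cores themselves.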

BSC's older MareNostrum 4 supercomputer, built from nodes with two 24-core "Skylake" Xeon Platinum 8160 processors, was outperformed by the NVIDIA-equipped MareNostrum 5 partition. Even in its worst result, MareNostrum 5 was still 67% faster than MareNostrum 4, and in its best result it held a 4.49x advantage. The State University of New York teams also compared their NVIDIA hardware, in both Grace-Grace (CPU-CPU pair) and Grace-Hopper (CPU-GPU pair) configurations, against rival platforms including Intel Sapphire Rapids and Ice Lake, AMD Milan, and the Arm-based Amazon Graviton 3 and Fujitsu A64FX processors. According to the comparison data, the Grace Superchip outperformed all rivals except the Sapphire Rapids server with two 48-core Xeon Max 9468 processors.
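The two MareNostrum figures are quoted in different units, one as "percent faster" and one as a speedup multiple. A minimal Python sketch of the conversion puts them on the same scale. The helper names are hypothetical, and it assumes "67% faster" means a throughput ratio of 1.67 relative to MareNostrum 4.

```python
def percent_faster_to_speedup(percent_faster: float) -> float:
    """Convert 'N% faster' into a speedup multiple (e.g. 100% faster -> 2.0x)."""
    return 1.0 + percent_faster / 100.0


def speedup_to_percent_faster(speedup: float) -> float:
    """Convert a speedup multiple into 'N% faster' (e.g. 2.0x -> 100% faster)."""
    return (speedup - 1.0) * 100.0


# Figures quoted in the text for MareNostrum 5 versus MareNostrum 4.
worst_case_speedup = percent_faster_to_speedup(67.0)
best_case_percent = speedup_to_percent_faster(4.49)

print(f"Worst case: {worst_case_speedup:.2f}x")
print(f"Best case:  {best_case_percent:.0f}% faster")
```

On that reading, the reported range runs from 1.67x to 4.49x, or equivalently from 67% to about 349% faster.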

The Next Platform suggests that the success of NVIDIA's CG100 server processor is largely due to its pairing with the Hopper GPU, noting that any CPU paired with the same Hopper GPU would likely perform similarly. Even so, the CPU-only Grace-Grace unit comes close to a pair of Sapphire Rapids Xeon Max Series CPUs in GROMACS performance. It is also notable that the HBM memory on the Xeon Max chips does not significantly improve GROMACS performance. Overall, these findings offer a useful look at how the Grace CPU performs in HPC workloads.