Nvidia targets datacentre memory bottleneck

by Jeremy

Nvidia hopes to take graphics processing units (GPUs) in the datacentre to the next level by addressing a bottleneck that limits data processing in traditional architectures. Typically, a datacentre server's central processing unit (CPU) offloads specific data processing calculations to a GPU, which is optimized to run such workloads. But, according to Nvidia, memory bandwidth limits how far this optimization can go: a GPU is usually configured with a relatively small amount of fast memory, while the CPU has a larger amount of slower memory. Running a data processing workload on the GPU therefore requires copying data from the slower CPU memory into the GPU's memory.

To remove this memory bottleneck, Nvidia has unveiled Grace, its first datacentre processor, based on an Arm microarchitecture. According to Nvidia, Grace will deliver ten times the performance of today's fastest servers on the most complex AI and high-performance computing workloads. It supports the next generation of Nvidia's coherent NVLink interconnect technology, which the company claims allows data to move more quickly between system memory, CPUs and GPUs.
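As a rough illustration of why interconnect bandwidth matters for this copy step, the back-of-envelope sketch below compares how long an idealised CPU-to-GPU transfer takes over a conventional PCIe 4.0 x16 link versus a much faster coherent link. The fast-link figure is purely an illustrative assumption, not a number from Nvidia's announcement.

```python
# Back-of-envelope: time to copy a dataset from CPU memory to GPU memory.
# Bandwidth figures are illustrative assumptions, not Nvidia-quoted numbers.

def transfer_time_s(dataset_gb: float, bandwidth_gb_s: float) -> float:
    """Idealised transfer time: dataset size divided by link bandwidth."""
    return dataset_gb / bandwidth_gb_s

DATASET_GB = 512        # hypothetical working set for a large workload
PCIE4_X16_GB_S = 32     # theoretical PCIe 4.0 x16 throughput (~32 GB/s)
FAST_LINK_GB_S = 512    # hypothetical coherent CPU-GPU link, for contrast

pcie = transfer_time_s(DATASET_GB, PCIE4_X16_GB_S)  # 16.0 seconds
fast = transfer_time_s(DATASET_GB, FAST_LINK_GB_S)  # 1.0 second
print(f"PCIe 4.0 x16: {pcie:.1f} s, fast coherent link: {fast:.1f} s")
```

Even this idealised model ignores latency and overlap, but it shows why a workload that repeatedly shuttles data between CPU and GPU memory is gated by the link, not the processors.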

Nvidia described Grace as a highly specialized processor targeting the largest data-intensive HPC and AI applications, such as the training of next-generation natural language processing models with more than one trillion parameters.

The Swiss National Supercomputing Centre (CSCS) is the first organization to publicly announce it will use Nvidia’s Grace chip, in a supercomputer called Alps, due to go online in 2023.

CSCS designs and operates a dedicated system for numerical weather predictions (NWP) for MeteoSwiss, the Swiss meteorological service. This system has been running on GPUs since 2016.

The Alps supercomputer will be built by Hewlett Packard Enterprise using the new HPE Cray EX supercomputer product line and the Nvidia HGX supercomputing platform, which includes Nvidia GPUs, its high-performance computing software development kit, and the new Grace CPU. The Alps system will replace CSCS’s existing Piz Daint supercomputer.

According to Nvidia, by taking advantage of the tight coupling between Nvidia CPUs and GPUs, Alps is expected to be able to train GPT-3, the world’s largest natural language processing model, in only two days – 7x faster than Nvidia’s 2.8-AI-exaflops Selene supercomputer, currently recognized as the world’s leading supercomputer for AI by MLPerf.
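The claim implies some simple arithmetic: if Alps trains GPT-3 in two days while being 7x faster than the 2.8-AI-exaflops Selene, then Selene would need roughly 14 days, and Alps’ implied AI throughput is on the order of 20 AI exaflops. A quick sketch of that back-of-envelope calculation, using only the figures quoted above:

```python
# Back-of-envelope check of Nvidia's claim: Alps trains GPT-3 in 2 days,
# 7x faster than the 2.8-AI-exaflops Selene supercomputer.

ALPS_DAYS = 2.0         # claimed GPT-3 training time on Alps
SPEEDUP = 7.0           # claimed Alps-vs-Selene speedup
SELENE_EXAFLOPS = 2.8   # Selene's quoted AI performance

selene_days = ALPS_DAYS * SPEEDUP          # implied Selene training time
alps_exaflops = SELENE_EXAFLOPS * SPEEDUP  # implied Alps AI throughput

print(f"Selene would need ~{selene_days:.0f} days")
print(f"Alps implies ~{alps_exaflops:.1f} AI exaflops")
```

These derived figures are inferences from the quoted numbers, not values Nvidia has published.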

It said that CSCS users would apply this AI performance to a wide range of emerging scientific research that can benefit from natural language understanding. This includes, for example, analyzing and understanding massive amounts of knowledge available in scientific papers and generating new molecules for drug discovery.

“The scientists will be able to carry out simulations and pre-process or post-process their data. This makes the whole workflow more efficient for them,” said CSCS Director Thomas Schulthess.
