Why Nvidia’s A800 Workstation GPU Uses A Chip Originally Made For China
Allen Bourgoyne, a director of product marketing at Nvidia, explains why the company based the new A800 40GB Active workstation chip on the same GPU that powers the A800 server chip originally designed for China.
While Nvidia’s A800 server GPU was originally designed last year for customers in China as a way to sidestep U.S. export restrictions, a recently launched workstation version of the chip was always meant for a global customer base, according to a company representative.
The product in question is the A800 40GB Active, which Nvidia quietly launched last week as a workstation GPU for AI, data science and high-performance computing. The chip is available in North America and other regions not impacted by a new wave of U.S. export restrictions announced last month blocking sales of the A800 and other high-end AI chips to China and other countries.
In an interview with CRN, Allen Bourgoyne, a director of product marketing at Nvidia, said the company introduced the A800 40GB Active because it needed a replacement for 2018’s Quadro GV100. Nvidia has stopped producing the GV100, which was one of the last GPUs to use the company’s six-year-old Volta architecture.
“Whatever happened to China, we would have had to build a follow-on product. We would’ve needed to do that, because that product [the GV100] went end-of-life. We needed a replacement,” he said.
When the time came to design the GV100’s successor, the product team looked at the GPUs “available to us” that could meet the product requirements defined by the team. These requirements included constraints around physical, electrical, cooling and pricing attributes, and the product also had to deliver fast double-precision performance, which is crucial for HPC applications, according to Bourgoyne.
This is typical of how Nvidia designs its chip-based products.
“It’s basically product engineering,” Bourgoyne said.
From the options available, the product team decided to use “the same GPU [that] was used in the original A800 server product,” mainly due to the A800’s high throughput for double-precision computing, which is also known as 64-bit floating point or FP64, according to the Nvidia employee.
Nvidia introduced the A800 server GPU to customers in China last year as an alternative to its powerful A100 GPU because the latter part had been banned in the country by U.S. export restrictions.
The company designed the A800 to sidestep these restrictions by reducing the GPU’s chip-to-chip bandwidth, but Nvidia had to halt sales of the A800 and other high-end GPUs to China last month due to new U.S. rules targeting high performance density capabilities.
To the A800 40GB Active’s product team, however, it didn’t matter that the A800 started out as a server GPU designed initially for the Chinese market or that it would be banned from the country a year later. What was important was that it was available and it fit the team’s needs.
“It would exist regardless of what happened to China,” Bourgoyne said.
This is similar to how Nvidia tapped the Volta-based V100 server GPU for the GV100 workstation chip.
“For the foreseeable future, we’ll probably always need to leverage data center parts for high double-precision components for things we need in desktop. That’s just where the technology is more important,” Bourgoyne said.
How Nvidia Adapted The A800 For A Workstation
To adapt the A800 for a workstation, which is a heavy-duty desktop PC, the product team needed to make some changes from the server design.
One of the most important changes is the addition of a fan for actively cooling the GPU since a workstation can’t provide enough cooling for a server chip that is only equipped with a passive cooling solution, according to Bourgoyne.
Because the fan requires power, the product team had to adjust the GPU to keep it within the targeted power budget for the product. These adjustments included trimming down the GPU’s memory a “little bit” and running the GPU “a little slower,” Bourgoyne said.
The result is that the A800 40GB Active offers “similar performance” to the A800 server GPU, but it can offer that performance in a desktop form factor rather than a server, the product marketer added.
How A800 40GB Active Compares To Quadro GV100
Compared to 2018’s Quadro GV100, the A800 40GB Active is considered a big upgrade, offering 40GB of HBM2 memory versus GV100’s 32GB, a 5,120-bit memory interface versus GV100’s 4,096-bit interface, 1.5 TB/s of memory bandwidth versus GV100’s 870 GB/s, 6,912 CUDA cores versus GV100’s 5,120 CUDA cores and 400 GB/s of NVLink chip-to-chip bandwidth versus GV100’s 200 GB/s.
And while the A800 40GB Active’s 432 Tensor cores are fewer than the GV100’s 640 Tensor cores, the former is capable of hitting more than five times the peak Tensor performance at 623.8 teraflops. The GPU also provides single-precision performance of 19.5 teraflops and double-precision performance of 9.7 teraflops, each a 31 percent increase over the GV100’s corresponding figure.
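The spec comparisons above reduce to simple ratios. As a rough check, the following Python sketch recomputes the percentage differences from the numbers quoted in this article. The dictionary keys and helper names are illustrative only, and treating 1.5 TB/s as 1,500 GB/s is an assumption. The last helper backs out the GV100 teraflops figures implied by the quoted 31 percent increases; it does not use official GV100 specifications.

```python
# Figures quoted in the article, as (A800 40GB Active, Quadro GV100).
# Key names are illustrative, not official spec-sheet labels.
SPECS = {
    "memory_gb": (40, 32),
    "memory_interface_bits": (5120, 4096),
    "memory_bandwidth_gbs": (1500, 870),  # assumes 1.5 TB/s == 1,500 GB/s
    "cuda_cores": (6912, 5120),
    "nvlink_gbs": (400, 200),
    "tensor_cores": (432, 640),
}


def percent_change(new, old):
    """Percentage change going from the old value to the new value."""
    return (new - old) / old * 100.0


def spec_changes(specs):
    """Map each spec name to its percentage change, rounded to one decimal."""
    return {
        name: round(percent_change(a800, gv100), 1)
        for name, (a800, gv100) in specs.items()
    }


def implied_baseline(new_value, percent_increase):
    """Back out the older figure from a newer figure and a quoted increase."""
    return new_value / (1.0 + percent_increase / 100.0)


if __name__ == "__main__":
    for name, change in spec_changes(SPECS).items():
        print(f"{name}: {change:+.1f}%")
    # GV100 teraflops implied by the quoted 31 percent increases.
    print("implied GV100 FP32 TFLOPS:", round(implied_baseline(19.5, 31), 1))
    print("implied GV100 FP64 TFLOPS:", round(implied_baseline(9.7, 31), 1))
```

Run as a script, this prints one percentage per spec, with a negative value for the Tensor core count, followed by the two implied GV100 throughput figures.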
Compared to the GV100, the A800 40GB Active runs 4.2 times faster for AI inference with the BERT Large model, 90 percent faster for AI training with the BERT Large model, 90 percent faster for the GTC benchmark and 70 percent faster for the LAMMPS benchmark, according to internal tests run by Nvidia.
What makes the A800 40GB Active a big upgrade is that the GPU is based on Nvidia’s Ampere architecture, which brings several substantial improvements over Volta, including a new efficiency technique for speeding up AI computations called structural sparsity and the ability to split the GPU into as many as seven GPU instances for running multiple workloads in parallel.
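For readers unfamiliar with structural sparsity, the idea can be illustrated in a few lines of Python. Ampere’s sparsity support works on a 2:4 pattern, in which at most two of every four consecutive weights are nonzero, so the hardware can skip the zeros. The sketch below is only an illustration of that pattern: the function name is hypothetical, and pruning by smallest magnitude is a common simplification, not a description of Nvidia’s own tooling.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each group of four weights.

    Illustrative only: real workflows typically retrain the model after
    pruning to recover accuracy.
    """
    if len(weights) % 4 != 0:
        raise ValueError("number of weights must be a multiple of 4")

    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest magnitudes; ties go to the earlier index
        # because Python's sort is stable.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


if __name__ == "__main__":
    dense = [0.1, -0.9, 0.4, 0.05, 2.0, -0.3, 0.0, 1.5]
    print(prune_2_of_4(dense))
```

Each group of four ends up with exactly two zeros, which is what allows sparse Tensor operations to do roughly half the multiply work on the pruned weights.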
Both GPUs have a power budget of 240 watts.
A800 40GB Active Availability
Nvidia started selling the A800 40GB Active globally through channel partners last week, though the recent U.S. export restrictions mean the chip won’t be available in some countries, like China.
According to Nvidia partner PNY Technologies, the full list of excluded countries is as follows:
Afghanistan, Armenia, Azerbaijan, Bahrain, Belarus, Burma, Cambodia, Central African Republic, China, Democratic Republic of Congo, Cuba, Cyprus, Egypt, Eritrea, Georgia, Haiti, Iran, Iraq, Jordan, Kazakhstan, North Korea, Kuwait, Kyrgyzstan, Laos, Lebanon, Libya, Macau, Moldova, Mongolia, Oman, Pakistan, Qatar, Russia, Saudi Arabia, Somalia, South Sudan, Republic of Sudan, Syria, Tajikistan, Turkmenistan, United Arab Emirates, Uzbekistan, Venezuela, Vietnam, Yemen and Zimbabwe.