The AI Servers Powering The Artificial Intelligence Boom

As part of CRN’s AI Week 2024, check out a sampling of AI servers from a number of server vendors and system builders.

Cutting Through The Hype On AI Servers

AI has been studied for decades, and text-generating chatbots date back as far as the 1960s. However, the November 30, 2022, release of the ChatGPT chatbot and virtual assistant took the IT world by storm, making GenAI a household term and setting off a stampede to develop AI-related hardware and software.

One area where the push into both general AI and GenAI is gaining strength is AI servers, which analyst firm IDC defines as servers that run software platforms dedicated to AI application development, applications aimed primarily at executing AI models, and/or traditional applications that have some AI functionality.

IDC in May estimated that AI servers accounted for about 23 percent of the total server market in 2023, a share the firm expects to keep growing. IDC also forecasts that AI server revenue will reach $49.1 billion by 2027, on the assumption that GPU-accelerated server revenue will grow faster than revenue for servers using other accelerators.

The difference between AI servers and general-purpose servers is not always so clear, according to vendors and sellers.

When many people talk about AI servers, especially amid the GenAI boom, they mean GPU-rich systems, particularly systems designed for training and fine-tuning models, said Robert Daigle, director of Lenovo’s global AI business.

“[But] there's also a lot of general-purpose servers that are used for AI workloads,” Daigle told CRN. “And as you get out of generative AI, and even out of deep learning and into traditional machine learning, a lot of the machine learning workloads still run on the CPU.”
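
To make that point concrete, here is a simplified sketch (the dataset and model are hypothetical stand-ins, not anything Lenovo ships) of a classical machine-learning job that runs entirely on the CPU using scikit-learn:

```python
# Classical ML sketch: a random-forest classifier trained with
# scikit-learn, which executes on CPU cores only; no GPU is involved.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)  # uses all CPU cores
clf.fit(X_train, y_train)
print(f"CPU-only test accuracy: {clf.score(X_test, y_test):.3f}")
```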

Dominic Daninger, vice president of engineering at Nor-Tech, a Burnsville, Minn.-based custom system builder and premier-level Nvidia channel partner that both builds AI servers and sells other manufacturers’ models, told CRN that there are basically two types of AI servers: those aimed at training and, once training is done, those aimed at inferencing.

AI servers don’t necessarily require GPUs to run, but GPUs deliver much better performance than CPUs on AI workloads, Daninger said.
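
A minimal PyTorch sketch (the model and input shapes are invented for illustration) shows that distinction in practice: the same inference code uses a GPU when one is present and falls back to the CPU otherwise:

```python
# Device-agnostic inference: prefer a GPU if available, else run on CPU.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
model = model.to(device).eval()

batch = torch.randn(32, 512, device=device)  # stand-in for real inputs
with torch.no_grad():
    logits = model(batch)
print(f"Ran inference on {device}; output shape {tuple(logits.shape)}")
```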

At the same time, he said, it is important to note that not every server with GPUs is AI-focused. Workloads such as simulation modeling or computational fluid dynamics run on GPUs without involving AI.

AI Servers Or Not?

The line between AI servers and non-AI servers can be tricky to draw and depends on the workload, said Michael McNerney, senior vice president at San Jose, Calif.-based Supermicro.

“I think we have eight different major segments, everything from large-scale LLM training all the way down to edge inference servers, which are going to be pole-mounted or wall-mounted boxes on a factory floor,” McNerney told CRN. “We really see AI almost become sort of a feature of the systems, especially as you get down to the edge, where those boxes get used for different things based on their configurations. Every server can become an AI server at some point depending on the kind of workload it’s running.”

AI is the dominant workload on GPU-based servers, particularly those with the highest configurations, which are typically used for LLM training or large-scale inference, while midrange rackmount configurations handle the majority of inference workloads, McNerney said.

Lenovo has about 80 server platforms certified as AI-ready for both GenAI and the broad spectrum of AI, Daigle said.

“We’ve done things like increase our GPU and accelerator support across those product lines and run benchmarks on them, such as MLPerf, so customers can see the performance of those systems and how we’ve improved performance to power AI workloads,” he said. “And then there’s the software stack that we enable to run on those. We have over 60 AI companies in our independent software vendor ecosystem. That allows us to enable over 165 enterprise-grade AI solutions.”
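
MLPerf itself is a formal benchmark suite with strict submission rules, but the idea behind such benchmarking can be sketched with a toy throughput test like the following (the model, batch size, and iteration counts below are arbitrary choices, not MLPerf methodology):

```python
# Toy inference-throughput timing loop, illustrative only.
import time
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(1024, 1024).to(device).eval()  # stand-in workload
batch = torch.randn(64, 1024, device=device)

with torch.no_grad():
    for _ in range(10):  # warm-up so one-time setup costs don't skew timing
        model(batch)
    if device.type == "cuda":
        torch.cuda.synchronize()  # drain queued GPU work before timing
    iters = 100
    start = time.perf_counter()
    for _ in range(iters):
        model(batch)
    if device.type == "cuda":
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

print(f"~{iters * batch.shape[0] / elapsed:,.0f} samples/sec on {device}")
```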

Going forward, there will continue to be a delineation between AI servers and general-purpose servers, Daigle said.

“There’s still a lot of traditional workloads that customers have to support in their IT environment, in addition to adding AI-enabled infrastructure,” he said. “So I think we’ll continue to see systems designed for those traditional IT workloads in addition to the expansion into AI.”

Looking ahead, Daninger said he expects Intel and AMD will invest in AI-focused technology, but will find it hard to catch up with Nvidia.

“One of the things we’ve learned is, Nvidia has put so much work into CUDA and the various libraries needed to really implement AI,” he said. “Plus Nvidia has made huge gains in the hardware end of things. Companies like Intel or AMD will have to move fast to beat Nvidia on the hardware end of things, but another holdback is that it will take many years to develop all the code needed to utilize these things. Nvidia has a long lead on that.”
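
One way to see the software moat Daninger describes: CUDA-backed Python libraries such as CuPy mirror the NumPy API, so years of Nvidia library work (cuBLAS, cuDNN and so on) sit behind familiar calls. This hedged sketch (array sizes arbitrary) falls back to NumPy when no CUDA stack is present:

```python
# Same array code, two backends: CuPy dispatches to Nvidia's CUDA
# libraries (e.g., cuBLAS for the matrix multiply), NumPy runs on CPU.
import numpy as np

try:
    import cupy as cp  # needs an Nvidia GPU and the CUDA toolkit
    xp = cp
except ImportError:
    xp = np  # no CUDA available: fall back to CPU NumPy

a = xp.random.rand(2048, 2048)
b = xp.random.rand(2048, 2048)
c = a @ b  # identical expression on either backend
print(type(c).__module__)  # 'cupy' on GPU, 'numpy' on CPU
```

Competing accelerators need equivalents for each layer of that stack, which is the multi-year software effort Daninger points to.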

McNerney said that with large AI workloads, clusters of AI servers are important, which will lead to increased use of liquid cooling.

“We think we're going to go from less than 1 percent of deployments using liquid cooling to up to 30 percent in that large scale cluster space just because of the efficiency, the performance, and the cost savings,” he said.

What follows is a sampling of AI servers from a number of server vendors and system builders.

Lenovo ThinkSystem SR780a V3

The ThinkSystem SR780a V3 is built around eight Nvidia H100, H200, or B200 Tensor Core GPUs paired with two 5th Gen Intel Xeon Scalable processors and 32 DDR5 DIMMs. The GPUs are interconnected via high-speed Nvidia NVLink. The server includes the Lenovo Neptune liquid cooling system, which the company said removes heat more efficiently than traditional air cooling and allows the GPUs and CPUs to run in accelerated mode for extended periods. The ThinkSystem SR780a V3 fits in a 5U chassis.

Dell PowerEdge R760xa

The Dell PowerEdge R760xa is a purpose-built server supporting a wide range of GPUs in a dual-socket, 2U air-cooled form factor. It is centered on two 4th or 5th Gen Intel Xeon processors with up to 64 cores each and on-chip innovations to boost AI and machine learning operations. The server accommodates up to four double-width PCIe Gen5 accelerators or up to 12 single-width PCIe accelerators, and supports PCIe GPU adapters from Nvidia, AMD, and Intel. It also offers up to 32 DDR5 DIMM slots, Gen4 NVLink, PCIe Gen5, and E3.S NVMe SSDs.

Supermicro AS-4125GS-TNHR2-LCC

The AS-4125GS-TNHR2-LCC from Supermicro features dual-socket AMD EPYC 9004 series processors and eight Nvidia H100 GPUs connected via Nvidia NVLink in a compact 4U footprint. The server, aimed at AI, deep learning, and HPC applications, includes eight PCIe 5.0 slots and 24 DIMM slots for up to 6 TB of DDR5-4800 ECC memory. Performance is enhanced with liquid cooling.

Nor-Tech Universal GPU Servers

Nor-Tech’s Universal GPU Servers combine multi-architecture flexibility with future-proof, open-standards-based designs in an advanced and flexible GPU server platform. The modular, standards-based platform supports multiple GPU technologies in a variety of form factors and combinations for large-scale AI, deep learning, and HPC workloads. These 4U servers include a choice of dual 3rd Gen Intel Xeon Scalable or AMD EPYC 7003 series processors, a range of industry-standard GPU form factors, and up to 10 2.5-inch NVMe/SATA drives. An optional 1U expansion module offers improved thermal capacity for up to 700 watts of GPU power and two additional AIOM/PCIe slots.

HPE ProLiant DL385 Gen11 Server

The HPE ProLiant DL385 Gen11 is a 2U dual-socket server featuring 4th Gen AMD EPYC 9004 series processors, up to 6 TB of DDR5 memory, and up to 36 EDSFF E3.S NVMe SSDs. The server can also accommodate up to four double-wide or eight single-wide Nvidia L4, L40, or L40S GPUs. Management is provided by HPE GreenLake for Compute Operations Management.