Intel Sees ‘Huge’ AI Opportunities For Xeon—With And Without Nvidia
Intel explains why its newly launched Xeon 6900P processors, which scale up to 128 cores and support memory speeds of up to 8,800 megatransfers per second, are a big deal for AI computing, whether it’s for CPU-based inferencing or serving as the host CPU for Nvidia-accelerated systems.
Intel said its latest Xeon processors present “huge” opportunities for channel partners in the AI computing space, whether the CPUs are used for inferencing or as the head node for systems accelerated by expensive, energy-guzzling chips like Nvidia’s GPUs.
[Related: Intel Offered Up To $5 Billion From Apollo: Report]
The Santa Clara, Calif.-based company on Tuesday marked the launch of its sixth-generation Xeon server CPUs with performance cores, or P-cores for short. Code-named Granite Rapids, the Xeon 6900P processors are designed to deliver the highest performance per core in contrast with the power efficiency focus of the higher-density Xeon 6700E chips with efficiency cores (E-cores) that debuted in June.
The new Xeon processors with P-cores and E-cores represent a bifurcation in Intel’s server CPU architecture that started with this year’s sixth generation, known as Xeon 6, which the company has previously said is meant to provide the “flexibility to meet any organization’s diverse efficiency and performance requirements.”
With the focus on high performance, Intel is promising that the Xeon 6900P processors will provide “significant gains” in performance and performance-per-watt over the fifth-gen Xeon lineup that debuted in late 2023 across a variety of workloads, including databases, web services and high-performance computing.
But the chipmaker is placing extra emphasis on the AI inference capabilities of the Xeon 6900P lineup with the belief that there is a large and growing market for businesses to use CPU servers for such purposes instead of paying for expensive accelerated systems that require substantially more energy but may not be used all the time.
This is part of Intel’s two-pronged AI strategy in the data center, where the company hopes to also create demand for its Gaudi 3 accelerator chips that are set to launch in the fourth quarter this year with a focus on cost-efficient performance for generative AI workloads.
Ryan Tabrah, vice president and general manager of Xeon and compute products, told CRN at an Intel press event last week that there is a “huge opportunity” for channel partners to help businesses determine whether they need accelerated systems or if they could use new or existing CPU-based systems to fulfill their AI initiatives.
“Every single market on the planet is being disrupted by AI, and they're all freaking out. I hear horror stories of customers buying certain racks of AI-accelerated racks, and then they end up sitting there and they're not even using them, and they default back to the CPU. So they're really afraid of being left behind, but they also don't want to go buy a bunch of stuff they'll never be able to use and reuse,” he said.
Xeon 6900P Enhancements And Specs
The Xeon 6900P chips double the maximum core count over the previous generation to 128 cores across three compute tiles in a chiplet design, continuing Intel’s move away from the traditional monolithic designs of its server chips that started with the fourth generation.
Across the five chips in the 6900P segment, core counts range from 128 down to 72. In the first quarter of next year, Intel plans to make a wider range of core count configurations available, with P-core processors that scale up to 86 cores across two compute tiles, up to 48 cores on one compute tile and up to 16 cores on a smaller compute tile.
The base and turbo frequencies are roughly in line with the previous generation, with the base frequency maxing out at 2.7GHz and the all-core turbo clock speed reaching up to 3.8GHz on the 72-core model. The single-core turbo frequency is 3.9GHz for all five chips.
Compared to fifth-gen Xeon, the 6900P chips increase DDR5 memory speed by 14 percent to 6,400 megatransfers per second (MT/s) and by 57 percent to 8,800 MT/s when using new DDR5 MRDIMM modules entering the market from Micron. Intel said it’s the first company to introduce CPUs that support MRDIMMs, short for multiplexed rank DIMMs, which improve bandwidth and latency over standard memory DIMMs.
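Those percentages can be roughly sanity-checked, and they hint at the theoretical peak bandwidth of the new platform. The sketch below is back-of-the-envelope math only: the 5,600 MT/s fifth-gen baseline and the 12-memory-channels-per-socket figure are assumptions not stated in this article.

```python
# Rough sanity check of the quoted memory-speed gains and the peak bandwidth
# they imply. The 5,600 MT/s fifth-gen baseline and the 12-channel-per-socket
# figure are assumptions, not numbers from the article.

PREV_GEN_MTS = 5_600       # assumed DDR5 ceiling for fifth-gen Xeon
DDR5_MTS = 6_400           # Xeon 6900P with standard DDR5
MRDIMM_MTS = 8_800         # Xeon 6900P with MRDIMMs
CHANNELS = 12              # assumed memory channels per 6900P socket
BYTES_PER_TRANSFER = 8     # 64-bit DDR5 channel

print(f"DDR5 uplift:   {DDR5_MTS / PREV_GEN_MTS - 1:.0%}")    # ~14%
print(f"MRDIMM uplift: {MRDIMM_MTS / PREV_GEN_MTS - 1:.0%}")  # ~57%

peak_gb_s = CHANNELS * MRDIMM_MTS * 1e6 * BYTES_PER_TRANSFER / 1e9
print(f"Theoretical peak per socket with MRDIMMs: {peak_gb_s:.0f} GB/s")
```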
The 6900P series supports six Ultra Path Interconnect 2.0 links for CPU-to-CPU transfer speeds of up to 24 GT/s, up to 96 lanes of PCIe 5.0 and CXL 2.0 connectivity, and, for the first time, the 16-bit floating point (FP16) numerical format for the processor’s Advanced Matrix Extensions, which are designed to accelerate AI workloads.
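For software that wants to take advantage of the new FP16 matrix extensions, a first step is simply checking whether the CPU advertises them. The minimal Linux-only sketch below reads feature flags from /proc/cpuinfo; the flag names (amx_tile, amx_bf16, amx_int8, amx_fp16) reflect how recent kernels expose AMX support and should be treated as an assumption rather than a definitive capability probe.

```python
# Minimal Linux-only check for AMX support, including the new FP16 path,
# by scanning CPU feature flags in /proc/cpuinfo.

def amx_flags():
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags = set(line.split(":", 1)[1].split())
                return sorted(fl for fl in flags if fl.startswith("amx"))
    return []

if __name__ == "__main__":
    found = amx_flags()
    print("AMX flags:", found or "none")
    print("FP16 matrix extensions available:", "amx_fp16" in found)
```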
With the CXL 2.0 support, Intel is giving data center operators a new, cost-efficient way to add memory through a Xeon feature called flat memory mode. The mode is controlled by hardware, and it allows applications to treat memory attached via CXL, short for Compute Express Link, as if it sits in the same tier as standard DRAM.
This allows data center operators to lower total cost of ownership (TCO), for example, by letting them use lower-cost, lower-performance DDR4 memory in the CXL region and higher-cost, higher-performance DDR5 memory in the DRAM region, according to Ronak Singhal, senior fellow and chief architect of Xeon at Intel.
“The cost of memory is a huge expense to everybody in terms of their TCO, and they're all looking at, how do I reduce their expense on that side? So the result here is that by moving from just a flat memory to this hierarchy of memory using a combination of DDR4 and DDR5, we could do that with minimal performance impact, less than 5 percent performance impact, but the customers are able to get a lower spend on their memory side by taking advantage of our platform,” Singhal said in a presentation last week.
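To make the TCO argument concrete, the illustration below runs the arithmetic for a hypothetical server that puts half its capacity behind a CXL expander. The per-GB prices and the DDR4/DDR5 split are invented for illustration; the only figure taken from Intel is the claimed sub-5 percent performance impact.

```python
# Illustrative-only arithmetic behind the flat-memory-mode TCO argument.
# Prices and capacity split are hypothetical.

TOTAL_GB = 1_024
DDR5_PER_GB = 4.00          # hypothetical $/GB for DDR5 RDIMMs
DDR4_CXL_PER_GB = 2.00      # hypothetical $/GB for DDR4 behind a CXL expander
CXL_SHARE = 0.5             # fraction of capacity placed in the CXL tier

all_ddr5 = TOTAL_GB * DDR5_PER_GB
tiered = (TOTAL_GB * (1 - CXL_SHARE) * DDR5_PER_GB
          + TOTAL_GB * CXL_SHARE * DDR4_CXL_PER_GB)

print(f"All-DDR5 memory cost:   ${all_ddr5:,.0f}")
print(f"Tiered DDR5 + CXL DDR4: ${tiered:,.0f}")
print(f"Savings: {1 - tiered / all_ddr5:.0%} for <5% claimed performance impact")
```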
Another change coming with the 6900P series is that Xeon’s optimized power mode, which debuted in the fourth generation, will be turned on by default. Singhal said Intel did this because the company “made improvements to that capability such that it’s no longer a choice that customers have to make between best performance and best power.”
The higher core counts and enhanced capabilities of the 6900P series come with higher power consumption. For all but one of the five chips, the thermal design power (TDP) is 500 watts while the outlier is 400 watts. While these are higher than the maximum 350-watt TDP for fifth-gen Xeon, Singhal said the 6900P chips can still be economical because of their significantly higher core counts.
“One of the hallmarks of this new platform is to be able to go to a higher TDP, especially with some of our largest customers. They continue to drive up the power because they're able to get the higher core density, and it still makes TCO sense for them to go up there,” said Singhal, who added that the additional Xeon 6 P-core processors arriving early next year will come with similar TDPs to the prior generation.
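A quick per-core comparison shows why a higher socket TDP can still pencil out. The sketch below uses the 500-watt, 128-core top 6900P part and the 64-core, 350-watt last-gen configuration cited elsewhere in this article; the math is illustrative and ignores platform-level power.

```python
# Per-core power comparison behind the "higher TDP still makes TCO sense"
# argument. Figures come from the chips discussed in this article.

chips = {
    "Xeon 6900P (128 cores)": {"cores": 128, "tdp_w": 500},
    "5th-gen Xeon (64 cores)": {"cores": 64, "tdp_w": 350},
}

for name, chip in chips.items():
    print(f"{name}: {chip['tdp_w'] / chip['cores']:.1f} W per core")
```

Even at 500 watts, the 128-core part works out to roughly 3.9 watts per core versus about 5.5 watts per core for the 64-core predecessor.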
Why Intel Thinks Xeon 6900P Will Be A Big Deal For AI
Intel is hoping that the Xeon 6900P series will gain traction in the AI computing market in two ways: through adoption of CPU-only servers for inferencing and by becoming the top choice for host CPUs in servers accelerated by Nvidia GPUs, Gaudi 3 chips or the like.
In the realm of CPU-based inference, Intel said the Xeon 6900P series provides a leap in performance not only over the previous generation but also over AMD’s fourth-gen EPYC processors, which debuted in 2022 and will soon be succeeded by a new set of CPUs, code-named “Turin” and powered by the rival’s Zen 5 architecture.
For example, with a 7-billion-parameter Llama 2 chatbot, Intel’s 96-core Xeon 6972P is more than three times faster than AMD’s 96-core EPYC 9654 and 128 percent faster than Intel’s own 64-core Xeon Platinum 8592+ from the last generation, according to tests run by Intel.
Intel also showed that the new Xeon holds an even greater edge over the same AMD CPU with an 8-billion-parameter Llama 3 chatbot, where it is four times faster, while maintaining a boost over its last-gen Xeon part similar to the one it showed with the smaller, older model.
These two tests were based on chat-style interactions with large language models (LLMs), meaning the input and output were each limited to 128 tokens. Tokens typically represent words, characters or combinations of words and punctuation.
When these chips were tested against the same Llama models for summarization, which involved a 1,024-token input and a 128-token output, Intel’s new 96-core processor still showed major gains over the two other chips, according to the company.
Intel’s 96-core Xeon was also significantly faster with the BERT-large language processing model, the DLRM recommendation model and the ResNet50 image classification model.
Singhal said Intel performed these tests using its new 96-core Xeon CPU instead of its flagship 128-core part because the underlying frameworks of these models have been optimized for a lower core count.
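For readers who want a feel for what a CPU-only, chat-style test with a 128-token output budget looks like in practice, the sketch below uses the Hugging Face transformers library to measure generation throughput. The model ID, data type and generation settings are illustrative assumptions; this is not Intel’s benchmark harness.

```python
# Minimal sketch of a CPU-only, chat-style latency test with a 128-token
# output budget, using the Hugging Face transformers library.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"  # example 7B chat model

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
model.eval()

prompt = "Summarize the benefits of running inference on CPUs."
inputs = tokenizer(prompt, return_tensors="pt")

start = time.perf_counter()
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {elapsed:.1f}s ({new_tokens / elapsed:.1f} tokens/s)")
```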
The company also thinks its 6900P series can beat AMD’s upcoming EPYC Turin processors in CPU-based inferencing for LLMs.
Based on public performance claims released by AMD earlier this year, Intel said its new 128-core Xeon 6980P is 34 percent faster for LLM summarization, 2.15 times faster for an LLM chatbot and 18 percent faster for LLM translation in comparison to its rival’s 128-core flagship CPU that will arrive later this year.
“Our performance cores are uniquely architected to deliver significant performance leads in critical growth spaces like AI and HPC and databases, while also delivering lower power consumption so our customers can scale without impacting their power constraints,” said Tabrah, the head of Intel’s Xeon business.
Intel also highlighted the performance advantages of its 6900P series for vector databases, which play an important role in retrieval-augmented generation (RAG), a method that lets generative AI models retrieve relevant information from existing data sets to ground their responses.
Thanks to Intel’s Scalable Vector Search software library, the 96-core Xeon 6972P is 84 percent and 2.71 times faster when indexing databases with 100 million vectors and 45 million vectors, respectively, compared to AMD’s fourth-gen, 96-core 9654 CPU, according to internal tests by the company. It also found that the 96-core Xeon is 2.12 and 7.34 times faster when searching within databases with the same respective vector counts.
Singhal said this can benefit systems accelerated by GPUs running RAG applications because Intel believes vector databases for this use case mostly run on the CPU.
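As a generic illustration of what the indexing and search steps in a RAG pipeline involve, the NumPy sketch below does exhaustive cosine-similarity search over a small random database. It is not Intel’s Scalable Vector Search library, which relies on approximate indexes to handle the 45-million and 100-million-vector scales cited above; the sketch only shows what “search” means at this layer.

```python
# Generic, brute-force illustration of the vector-search step in a RAG
# pipeline using NumPy. Not representative of how SVS builds its indexes.
import numpy as np

rng = np.random.default_rng(0)
dim = 768                                           # typical embedding width
db = rng.standard_normal((10_000, dim)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)     # normalize once ("indexing")

def search(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k most similar stored vectors (cosine similarity)."""
    q = query / np.linalg.norm(query)
    scores = db @ q                                 # one matmul over the database
    return np.argsort(scores)[::-1][:k]

hits = search(rng.standard_normal(dim).astype(np.float32))
print("Top matches:", hits)
```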
Citing server data from research firm IDC, Intel estimates that it likely has the largest footprint of host CPUs within accelerated systems, which are among the industry’s most powerful servers and are mostly used for training massive AI models.
To help protect its CPU dominance in accelerated systems, Singhal said Intel has worked closely with Nvidia to optimize the 6900P series for the AI chip rival’s MGX and HGX systems as well as upcoming systems using Intel’s Gaudi 3 accelerator chips.
“I think this is a great example of how we're working with the ecosystem to ensure the best overall AI system deployments,” he said.