Ampere Reveals 256-Core Server CPU, AI Partnership With Qualcomm In Big Update

In an interview with CRN, Ampere Computing executive Jeff Wittich says channel partners are ‘really important’ to the chip designer’s expanding push against rivals like Intel, which will include a 256-core server CPU and an AI partnership with Qualcomm, among other things.

Oracle-backed chip designer Ampere Computing has revealed a 256-core server CPU set to arrive next year and said it’s working with Qualcomm on a joint solution for AI inference.

The disclosures were made in the Santa Clara, Calif.-based company’s annual video update released on Thursday, where the company also said OEM and ODM server platforms for its AmpereOne CPUs are expected to start shipping in a few months.

[Related: Intel To Sunset First-Gen Max Series GPU To Focus On Gaudi, Falcon Shores Chips]

The chip designer’s other announcements include a plan to release a 12-channel memory version of its existing AmpereOne CPUs later this year, a new joint solution with video processing chip firm Netint, new features for letting select models run at higher or user-defined frequencies, and an expanded chiplet-based design strategy that includes potential implementation of third-party technologies in future processors.

Led by former Intel executive Renee James, Ampere also released new figures claiming that its CPUs can beat AMD’s 4th-Gen EPYC processors at performance-per-rack and offer more efficient inference performance for large language models than Nvidia’s A10 GPU.

The company released the update as its much larger x86 rivals, Intel and AMD, push to compete with Ampere’s “cloud-native” Arm-based server processors on very high core counts. Last year, AMD launched its cloud-focused EPYC 97X4 CPUs, previously code-named Bergamo, which max out at 128 cores. Intel, for its part, plans to soon launch a new line of Xeon processors, code-named Sierra Forest, that will carry up to 288 cores.

Ampere also has to contend with increased competition from cloud service providers who are developing their own CPUs. While Amazon Web Services stood alone for years with this approach, Microsoft Azure and Google Cloud—both Ampere customers—have said in recent months that they too are designing their own server CPUs to improve their offerings.

Ampere Expects Continued Cloud Growth, Says Channel Is ‘Really Important’

In an interview with CRN, Ampere Chief Product Officer Jeff Wittich noted that Azure’s Ampere-based instances have grown by six regions to 20 since the cloud provider’s deployment was first announced and said those instances continue to be used by “a lot of big customers.”

At the same time, Wittich said he wasn’t concerned about more competing CPU solutions entering the cloud infrastructure market.

“I fully expect we'll see continued growth in the public cloud, and we'll also see a ton of different solutions across the public cloud,” he said.

The company’s other cloud backers include Alibaba, Baidu AI Cloud, Gcore, Hetzner, JD Cloud, Kingsoft Cloud, Leaseweb, Tencent Cloud and Oracle Cloud Infrastructure. (Oracle is a major investor in Ampere, and its founder and chairman, Larry Ellison, reportedly said last year that his company plans to spend a significant amount of money on Ampere CPUs.)

While the company first focused on building partnerships with server vendors and cloud service providers a few years ago to establish itself, Ampere has more recently pursued relationships with channel partners to expand sales of Ampere-based solutions.

Wittich said Ampere’s channel roster has been “growing pretty fast” and now includes the likes of 2CRSI, Avantek Computer, Equus Compute Solutions, Exxact, Thinkmate and others.

“We’ve gotten to that point now where we have broad enough demand that the channel piece is really important,” he said.

256-Core AmpereOne CPU Set For Next Year

Ampere revealed that it plans to release new versions of its AmpereOne server CPU next year that will scale up to 256 cores and use TSMC’s 3-nanometer manufacturing process.

“It's all sitting over at TSMC now, and we'll see that next year,” Wittich said.

The company debuted AmpereOne last year as a custom CPU design, compatible with the Arm instruction set architecture, that maxes out at 192 cores, a departure from its earlier Ampere Altra CPUs, which use Arm’s off-the-shelf Neoverse chip designs.

Wittich said the existing server platform for AmpereOne will support the new models coming out next year and added that the new models will support 12 channels of memory.

“Those will push core count, overall performance, performance-per-core, performance-per-watt. All those things will increase a lot with that product,” he said.

The company also plans to introduce new versions of its existing AmpereOne design later this year that will expand memory support from the initial eight channels to 12.

New Partnership With Qualcomm On AI Inference Solution

Another major announcement from Ampere on Thursday was its plan to release a joint solution with Qualcomm for AI inference on large language models.

The joint solution will include a Supermicro server that’s equipped with Ampere CPUs—Ampere Altra or AmpereOne—and Qualcomm’s Cloud AI 100 accelerator chips.

Wittich said the joint solution is intended to give businesses an “easily deployable solution” that offers efficient inference computing for large language models that vary greatly in size.

“While we've got great performance for inferencing on CPUs […] as you get into the hundreds of billions of parameter models, you might want a different solution that scales out to more and more compute. This serves that need,” he said.

New AmpereOne Features Enable Higher But Predictable Frequencies

Among Ampere’s other disclosures was the announcement of two features available in select AmpereOne models: FlexSpeed and FlexSKU.

FlexSpeed enables the processor to temporarily increase its frequency when there is available power budget, while FlexSKU enables multiple, user-defined core count and frequency combinations that can help data centers optimize for density or performance-per-core.

Wittich said what separates FlexSpeed from a feature like Intel’s Turbo Boost Technology is that it enables the processor to operate at a higher but predictable frequency.

“This is for when you're not really consuming all the power, whether because you're at low utilization or you have an app that doesn't really have a lot of compute needs. You can run at a higher frequency with more performance,” he said.

FlexSKU is a “similar idea but a different use case,” according to Wittich.

“We do have customers that say, ‘well, I do have a set of applications that I would love to be able to run at higher frequencies in a predictable way when I don't really need all the cores,’” he said.

New Performance Figures, Ampere’s Future Chip Design Strategy

In new performance figures provided by Ampere, the company claimed that its existing 192-core AmpereOne CPU enables up to 50 percent greater performance-per-watt and up to 34 percent better performance-per-rack compared to AMD’s 96-core EPYC 9654 “Genoa” CPU.

This is based on the SPECrate2017_int_base benchmark, and Ampere said the AmpereOne CPU can also surpass AMD’s 128-core EPYC 9754 “Bergamo” CPU in the same areas.

“We’ve got a 50 percent lead in performance-per-watt versus Genoa and then about a 15 percent lead against Bergamo,” Wittich said.

When it comes to real-world workloads, Ampere said an AmpereOne-based data center running containerized web services such as Redis, MySQL, NGINX and Memcached requires up to 15 percent fewer servers and 35 percent less power than an AMD Genoa-based data center.

The company also said the AmpereOne-based data center requires 10 percent fewer servers and 26 percent less power than an AMD Bergamo-based data center.

In the AI computing space, Ampere claimed that an 80-core Ampere Altra CPU enables a 28 percent cost savings compared to Nvidia’s A10 GPU for producing one million tokens running at roughly 80 tokens per second with Meta’s 8-billion-parameter Llama 3 language model. It also uses 67 percent less energy when running such a model, according to the company.

As for what’s coming in the future, Wittich said the company plans to evolve its chiplet design strategy, which it’s using for AmpereOne, and explore integrating third-party technologies into future processors as part of a collaboration with other chip designers.

This means Ampere could eventually introduce processors that pair CPU cores it designs with silicon technologies created by other companies, packaged together as chiplets, which are smaller dies combined on a larger package.

Wittich said Ampere is exploring these possibilities as part of a new working group within the AI Platform Alliance it formed last year. Called the Open Silicon Integration working group, one of its goals is to develop standards for chiplet integration using Universal Chiplet Interconnect Express (UCIe), a specification co-developed by several major chip firms, including Arm.

“This is us taking it one step further and saying, once you're using chiplets and once you're using an open interface, why stop there? Now you can integrate all kinds of third-party [intellectual property], you can integrate customer IP, you can integrate partner IP, and you can create a bunch of really, really cool solutions,” Wittich said.

Other members of the AI Platform Alliance, which was formed last year and aims to make AI platforms “more open, efficient and sustainable,” include AI chip designers such as Cerebras Systems, Furiosa, Rebellions and Untether AI. The roster also includes Graphcore, which is reportedly in talks to sell itself to Japanese investment giant SoftBank.

“Somebody has to step up and actually create a framework and a platform around this, because while chiplets and open interfaces make it easier, it still requires a lot of coordination,” Wittich said.