LLM Startup Embraces AMD GPUs, Says ROCm Has ‘Parity’ With Nvidia’s CUDA Platform
A startup focused on customizing large language models for enterprises reveals its embrace of AMD’s Instinct MI200 series GPUs and ROCm software platform as the chip designer steps up its challenge to rival Nvidia, whose GPUs power many of today’s generative AI applications.
A startup focused on fine-tuning large language models revealed it has been “secretly running on more than 100” AMD Instinct MI200 series GPUs and said the chip designer’s ROCm software platform “has achieved software parity” with Nvidia’s dominant CUDA platform for such models.
The Palo Alto, Calif.-based startup, Lamini, made the disclosures in a blog post Tuesday as AMD mounts its largest offensive yet against rival Nvidia, whose GPUs serve as the main engines for many large language models (LLMs) and other kinds of generative AI applications today.
Founded by machine learning expert Sharon Zhou and former Nvidia CUDA software architect Greg Diamos, Lamini is a small startup whose platform allows enterprises to fine-tune and customize LLMs into private models using proprietary data. The startup claims to have more than 5,000 companies on a waitlist that opened several months ago.
In the blog post, Lamini said it has been running more than 100 AMD Instinct MI200 GPUs on its own infrastructure, which the startup is making available through its newly announced LLM Superstation, available both in the cloud and on premises.
This makes Lamini “the only LLM platform that exclusively runs on AMD Instinct GPUs—in production,” according to the startup, which added that the compute cost of running Meta’s 70-billion-parameter Llama 2 model on its infrastructure is 10 times lower than it is on Amazon Web Services.
Lamini said the reliance on AMD’s Instinct GPUs is a differentiator in part because they are available, unlike Nvidia’s flagship A100 and H100 GPUs that have been experiencing shortages due to high demand for infrastructure running LLMs and other kinds of generative AI applications.
Diamos, Lamini’s CTO, praised ROCm, AMD’s software stack for programming its GPUs, for having “achieved software parity” with Nvidia’s CUDA platform for LLMs.
He said the startup chose AMD’s flagship Instinct MI250 GPU, which launched in 2021, as the foundation for its platform “because it runs the biggest models that our customers demand and integrates fine-tuning optimizations.”
Diamos added that the MI250’s large, 128-GB high-bandwidth memory capacity allows Lamini “to run bigger models with lower software complexity than clusters of A100s” from Nvidia.
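To put that capacity claim in rough perspective, the back-of-envelope sketch below (illustrative only, not Lamini’s own figures) compares the memory needed just for a model’s weights at 16-bit precision against each accelerator’s on-board HBM. It ignores the optimizer state, activations and other overheads that fine-tuning adds on top.

```python
# Illustrative back-of-envelope math, not Lamini's figures: weights stored
# in 16-bit precision take roughly 2 bytes per parameter.
def weight_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate GPU memory needed for model weights alone, in GB."""
    return params_billion * 1e9 * bytes_per_param / 1e9

llama2_70b = weight_memory_gb(70)   # ~140 GB of weights
mi250_hbm = 128                     # GB of HBM2e per MI250
a100_hbm = 80                       # GB of HBM per 80-GB A100

print(f"Llama 2 70B weights: ~{llama2_70b:.0f} GB")
print(f"MI250s needed for weights alone: {llama2_70b / mi250_hbm:.1f}")
print(f"80-GB A100s needed for weights alone: {llama2_70b / a100_hbm:.1f}")
```

Fewer, larger-memory devices mean fewer shards for the software to coordinate, which is the reduced-complexity argument Diamos is making.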
According to tests run by Lamini, AMD’s less powerful Instinct MI210 GPU achieves up to 89 percent of its theoretical peak throughput for general matrix-matrix multiplication (GEMM) and up to 70 percent of peak memory bandwidth for ROCm’s hipMemcpy function.
“This shows AMD’s libraries effectively tap into the raw throughput of MI accelerators for key primitives. With basic building blocks operating efficiently, ROCm provides a solid foundation for high-performance applications like fine-tuning LLMs,” Diamos wrote in the blog post.
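For readers curious what such an efficiency measurement might look like in practice, here is a minimal sketch, not Lamini’s benchmark code, that times a large half-precision matrix multiply and divides the achieved throughput by a rated peak. It assumes a ROCm build of PyTorch, which presents AMD GPUs through the familiar torch.cuda API; the peak_tflops default is a placeholder to be replaced with the accelerator’s datasheet number.

```python
# A minimal GEMM-efficiency sketch, not Lamini's benchmark code. Assumes a
# ROCm build of PyTorch, which presents AMD GPUs through the torch.cuda API.
import torch

def gemm_efficiency(n: int = 8192, peak_tflops: float = 181.0, iters: int = 50) -> float:
    """Fraction of a rated peak achieved by an n-by-n half-precision matmul.

    peak_tflops is a placeholder; substitute the accelerator's datasheet
    FP16 matrix peak (e.g., AMD lists 181 TFLOPS for the MI210).
    """
    a = torch.randn(n, n, dtype=torch.float16, device="cuda")
    b = torch.randn(n, n, dtype=torch.float16, device="cuda")
    for _ in range(5):          # warm up so setup costs don't skew timing
        a @ b
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()
    seconds = start.elapsed_time(end) / 1000.0 / iters
    achieved_tflops = 2 * n**3 / seconds / 1e12   # 2*n^3 FLOPs per square GEMM
    return achieved_tflops / peak_tflops

if __name__ == "__main__":
    print(f"GEMM efficiency: {gemm_efficiency():.0%} of rated peak")
```

A companion bandwidth test would work the same way, timing large copies (hipMemcpy in ROCm terms) and dividing by the device’s rated memory bandwidth.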
According to Lamini, the chip designer’s own employees are using the startup’s platform to fine-tune LLMs “for numerous use cases.”
“We’ve deployed Lamini in our internal Kubernetes cluster with AMD Instinct GPUs and are using fine-tuning to create models that are trained on AMD code base across multiple components for specific developer tasks,” said Vamsi Boppana, senior vice president of AI at AMD, in a statement.