New Alluxio Release Boosts GPU Utilization, Data Management Performance For AI/ML Applications

The new release of the Alluxio Enterprise AI data orchestration platform makes it easier to use GPU-based systems for training and operating AI applications, and to feed AI/ML systems data at HPC-class storage performance.

Alluxio has launched a new release of its AI-focused data management platform that makes better use of high-performance GPU systems, helping businesses and organizations turbocharge their data-intensive AI workloads.

The new 3.2 release of Alluxio Enterprise AI also offers enhanced data I/O capabilities and improved performance with HPC (high-performance computing) storage systems to boost AI processing.

“This release is all about GPUs and, specifically, data loading performance and storage performance for different GPUs,” said Adit Madan, Alluxio director of product, in an interview with CRN.


Alluxio, headquartered in San Mateo, Calif., develops its flagship Alluxio Enterprise Data platform for a range of big data management tasks. In October the company debuted Alluxio Enterprise AI, a new system built on its core data orchestration technology specifically for data-intensive AI and machine learning workloads.

Madan noted that in provisioning AI and machine learning applications, many organizations maintain separate compute and storage infrastructure: one environment dedicated to developing and training AI/ML models, including work with deep learning frameworks such as PyTorch and TensorFlow, and another for running those applications in production.

So while an organization may use high-performance GPU-based systems to train AI/ML models, those applications may not be able to tap into that compute power once they are in production, Madan said.

The Alluxio platform provides the needed flexibility to access those GPU resources as needed. “With the scarcity of GPUs, this is something that we’ve seen resonating with a lot of the early customers for this product,” Madan said. The company says the 3.2 release provides more than 97 percent GPU utilization in large language model training benchmarks.

The new release also provides enhanced data I/O performance, achieving up to 10 GB/s of throughput and 200,000 IOPS for AI/ML applications. The software also delivers data storage performance the company says is comparable to HPC storage, based on MLPerf benchmarks, without the need for additional HPC storage infrastructure.

Alluxio Enterprise AI also offers a new file system API for Python applications, an FSSpec implementation, a move that expands Alluxio’s interoperability within the Python development ecosystem. New advanced cache management functionality also gives administrators more control over data management.
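For Python developers, an FSSpec-compatible filesystem means data managed by Alluxio can be read with the same generic calls used for local disks or cloud object stores. The short sketch below illustrates that usage pattern only; the "alluxio" protocol string and the assumption that it is registered with fsspec are illustrative, not confirmed details of the 3.2 release.

import fsspec

# Illustrative sketch of the fsspec usage pattern. The "alluxio" protocol
# name and any Alluxio-specific options are assumptions; consult Alluxio's
# documentation for the actual package name and parameters.
fs = fsspec.filesystem("alluxio")

# From here the code is backend-agnostic: list a dataset directory and
# stream each object, the same way a PyTorch or TensorFlow input pipeline
# would read training data.
for path in fs.ls("/datasets/train"):
    with fs.open(path, "rb") as f:
        record = f.read()
        # ...hand `record` to the training data loader...

The appeal of this approach is that existing Python data pipelines written against fsspec-style filesystems can switch backends by changing the protocol and configuration rather than the surrounding code.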

"By achieving comparable performance to HPC storage and enabling GPU utilization anywhere, we're not just solving today's challenges, we're future-proofing AI workloads for the next generation of innovations,” Madan said.