Startup DataPelago Exits Stealth, Debuts Universal Data Processing Engine For ‘Accelerated Computing’ Tasks
DataPelago says its new technology provides a data processing boost for advanced analytics and AI applications that require huge volumes of complex, structured and unstructured data.
Startup DataPelago exited stealth today, unveiling what the company describes as the world’s first “universal data processing engine” that can handle the complexity and volume of today’s data for “accelerated computing” analytical and artificial intelligence workloads.
DataPelago, founded in 2021 and based in Mountain View, Calif., also said the company had cumulatively raised $47 million in seed and Series A funding from investors Eclipse, Taiwania Capital, Qualcomm Ventures, Alter Venture Partners, Nautilus Venture Partners and Silicon Valley Bank (a division of First Citizens Bank).
“Data is changing, the applications are changing and, most importantly, [IT] infrastructure is changing. When you have three different disruptive trends coming all together, it requires you to step back and see what the next world looks like and what should be the data processing platform,” said Rajan Goyal, DataPelago co-founder and CEO, in an interview with CRN.
The exploding volume of digital data created each year is expected to reach 291 zettabytes by 2027, according to market researcher IDC, with much of it unstructured and semi-structured. At the same time, Goyal says, traditional data processing systems based on CPUs and basic software architectures cannot handle the complexity and volume of today’s data.
Such limitations weren’t a problem when data analytics was largely focused on powering business intelligence dashboards. But businesses and organizations today are increasingly trying to extract more value and deeper analytical insights from data, including processing data for real-time analytical and generative AI-driven applications, Goyal said.
To tackle the problem, Goyal launched DataPelago in 2021 and assembled a “multi-disciplinary team” of people with expertise in system architecture, data analytics, cloud SaaS, open-source development and other technology areas.
(Goyal himself has startup experience, having worked as an architecture engineer at semiconductor startup Cavium between 2005 and 2016 and as CTO at data center tech company Fungible between 2016 and 2021. Before that he held software engineering and development positions at Oracle and Cisco Systems.)
DataPelago’s universal data processing engine, which is being used by some customers on a pilot/preview basis, is designed to overcome the performance, cost and scalability limitations of current-generation IT systems and meet the needs of what the company calls “the accelerated computing era.”
“Some of our customers are spending hundreds of millions of dollars on these use cases, and even a $50 million savings in a year is a huge value that we bring to the customer,” Goyal said.
The startup’s universal engine was built from the ground up to support GenAI and data lakehouse analytics workloads by employing a hardware-software co-design approach, according to the company. The engine is designed to work with today’s data stacks including CPU-, GPU-, TPU- and FPGA-based hardware; data processing frameworks such as Spark, Trino and Apache Flink; multiple types of data stores; and data processing platforms such as Snowflake and data lakehouses like Databricks, Goyal said.
“We want to build an engine which can provide the benefits to all software frameworks out there,” the CEO said.
The DataPelago engine leverages several open-source technologies including Apache Gluten, Meta’s Velox and Substrait, the last of which is a cross-language specification for data compute operations. The engine is built on a platform composed of three layers: a DataVM virtual machine, a DataOS and a DataApp.
The engine is designed to process data in the most efficient way possible given the available hardware resources and the data being processed. The result, according to the company, is an architecture that enables the engine to process data one to two orders of magnitude faster than traditional query engines.
DataPelago says its processing engine is uniquely suited for use cases that are resource intensive, such as analyzing billions of transactions while ensuring data freshness, supporting AI-driven models to detect threats at wire-line speeds across millions of consumer and data center endpoints, and “providing a scalable platform to facilitate the rapid deployment of training, fine-tuning and RAG inference pipelines,” according to the company’s announcement.
“When data can be extracted as quickly as it’s generated, businesses can harness insights to make better decisions and operate more efficiently,” said Lior Susan, CEO and founding partner at investor Eclipse and a DataPelago board member, in a statement. “DataPelago’s universal data processing engine represents a paradigm shift that will unlock new possibilities in the worlds of supply chains, sustainable energy, the medical field, and beyond.”
Goyal said the latest funding will be used to accelerate the company’s development operations and grow the company’s early sales and go-to-market operations, including developing a channel. The company is currently working with a channel partner overseas and expects to recruit solution providers and systems integrators as partners.