The 10 Hottest Big Data Startups Of 2024

Here’s a look at the 10 hottest big data startups of 2024, including DataPelago, MotherDuck, Onehouse and Unstructured.


Data has become a valuable asset for many businesses and organizations. They are analyzing it to gain insight about markets, customers and their own operations. They are using data to fuel digital transformation initiatives and support new data-intensive services.

And data, lots of it, is a critical component of AI and machine learning initiatives.

But wrangling, managing and analyzing data is a major challenge. The total amount of data created, captured, replicated and consumed is growing at more than 20 percent a year and is forecast to reach approximately 291 zettabytes in 2027, according to market researcher IDC.

That’s why there is a steady stream of big data startup companies developing leading-edge technologies to help businesses access, collect, manage, move, transform, analyze, understand, measure, govern, maintain and secure data.

Here’s a look at 10 big data startups that caught our attention in 2024 and that we think solution providers should be aware of.

Ariga

Top Executive: Ariel Mashraki, Co-Founder, CEO

A database schema is how data is organized and structured within a database system, including data tables and how the relationships between different data elements are defined.

Database schemas must sometimes be changed when data is updated or the database needs to support new features and functionality in the applications that run on the database. And that can be a chore.

Ariga develops a database schema-as-code platform that software engineers use to define and manage database schema through code, reducing the complexity of schema changes and facilitating easier database management.
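The schema-as-code idea can be illustrated with a minimal sketch: describe the desired schema as data, compare it with the current schema and derive the statements needed to close the gap. This is a hypothetical illustration only, not Atlas's actual API or algorithm. The `plan_migration` function, the dictionary layout and the example tables are all assumptions made for this example, and the sketch ignores column type changes, constraints and relationships.

```python
# Hypothetical illustration of the schema-as-code idea; not Atlas's actual API.
# Schemas are plain dicts of the form {table_name: {column_name: sql_type}}.

def plan_migration(current, desired):
    """Return the SQL statements that turn `current` into `desired`."""
    statements = []
    for table, columns in desired.items():
        if table not in current:
            # Table is missing entirely: create it with all desired columns.
            cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
            statements.append(f"CREATE TABLE {table} ({cols});")
            continue
        # Table exists: add columns that are missing, drop ones no longer wanted.
        for name, sql_type in columns.items():
            if name not in current[table]:
                statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type};")
        for name in current[table]:
            if name not in columns:
                statements.append(f"ALTER TABLE {table} DROP COLUMN {name};")
    # Tables that exist today but are absent from the desired schema are dropped.
    for table in current:
        if table not in desired:
            statements.append(f"DROP TABLE {table};")
    return statements


# Example schemas (assumed for illustration).
current = {"users": {"id": "INTEGER", "name": "TEXT"}}
desired = {
    "users": {"id": "INTEGER", "name": "TEXT", "email": "TEXT"},
    "orders": {"id": "INTEGER", "user_id": "INTEGER"},
}

for statement in plan_migration(current, desired):
    print(statement)
```

Because the desired schema lives in code, it can be reviewed and versioned like any other source file, and the migration plan is computed rather than written by hand.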

The company’s products include Atlas, a database schema-as-code tool, and the ent.go entity framework for the Go programming language.

Ariga was founded in 2021 and is based in Tel Aviv, Israel. In June 2023 the company announced a $15 million Series A funding round and previously unannounced $3 million seed funding.

DataPelago

Top Executive: Rajan Goyal, Co-Founder, CEO

Startup DataPelago exited stealth in October, unveiling what the company describes as the world’s first “universal data processing engine” that can handle the complexity and volume of today’s data for what the company calls “accelerated computing” analytical and artificial intelligence workloads.

CEO Goyal says traditional data processing systems based on CPUs and basic software architectures cannot handle the complexity and volume of today’s data.

“Data is changing, the applications are changing and, most importantly, [IT] infrastructure is changing,” Goyal told CRN. “When you have three different disruptive trends coming all together, it requires you to step back and see what the next world looks like and what should be the data processing platform.”

To tackle the problem Goyal launched DataPelago in 2021 and assembled a “multi-disciplinary team” of people with expertise in system architecture, data analytics, cloud, SaaS, open-source development and other technology areas.

DataPelago’s universal data processing engine, which is being used by some customers on a pilot/preview basis, is designed to overcome the performance, cost and scalability limitations of current-generation IT systems. The system was built from the ground up to support GenAI and data lakehouse analytics workloads by employing a hardware-software co-design approach.

DataPelago, based in Mountain View, Calif., has cumulatively raised $47 million in seed and Series A funding from investors Eclipse, Taiwania Capital, Qualcomm Ventures, Alter Venture Partners, Nautilus Venture Partners and Silicon Valley Bank, a division of First Citizens Bank.

DeasyLabs

Top Executive: Reece Griffiths, Co-Founder, CEO

AI models are only as good as the data that’s fed into them. DeasyLabs says its mission is to provide data governance to ensure that large language models run only on safe, relevant, high-quality data.

The startup develops a metadata orchestration platform that organizations use to create high-quality, customized metadata and embed it into their AI workflows, including retrieval-augmented generation and agentic frameworks.

DeasyLabs, founded in 2023 and based in New York, received $3 million in seed funding in 2023 and is backed by Y Combinator with funding from General Catalyst, RTP Global and J12.

Diliko

Top Executive: Dave Albano, CEO

Diliko, which just emerged from stealth Nov. 7, has developed an agentic AI platform with automated data management and governance capabilities that the startup says reduces operational complexity and costs.

Based in Reston, Va., Diliko is targeting midsize enterprises in the data-heavy health-care, finance and logistics industries. The company says its service provides benefits for C-level executives, including CIOs, CFOs and chief data officers, and for those who work with data, including data engineers, data scientists and data analysts.

The cloud-based Diliko platform optimizes data management performance and eliminates the need for deploying and managing costly infrastructure. The service automates complex data management workflows using on-demand data integration, ETL (extract, transform, load) and orchestration and can synchronize data in real time across internal and external systems.
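For readers unfamiliar with the term, the three ETL stages can be sketched in a few lines of Python. This is a generic illustration built on the standard library, not Diliko's implementation. The sample data, the `orders` table and the function names are assumptions made for this example.

```python
import csv
import io
import sqlite3

# Sample source data (assumed for illustration); one row has a bad amount.
RAW_CSV = """order_id,amount_usd,region
1,19.99,east
2,not-a-number,west
3,5.00,East
"""


def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: parse types, normalize region names, skip unparseable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount_usd"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        clean.append((int(row["order_id"]), amount, row["region"].strip().lower()))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, amount_usd REAL, region TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT region, COUNT(*) FROM orders GROUP BY region").fetchall())
```

An orchestration layer, in this picture, is whatever schedules and chains such steps across many sources and targets.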

The Diliko platform also ensures data governance and security with cloud-native security capabilities including zero-trust architecture, end-to-end encryption and multifactor authentication.

Dymium

Top Executive: Denzil Wessel, Co-Founder, CEO

Dymium has developed a data access management platform that provides secure access to data “where it lives,” eliminating the costs and complexity of moving data into data warehouses and data lakes for analytical and AI tasks.

“The practice of copying data to provide teams with data in myriad formats, each with different access controls, policies and security requirements, has led to unprecedented complexity that impedes innovation and undermines security and governance,” says Wessel.

The Dymium Platform helps organizations cost-effectively manage data access requirements across rapidly proliferating data sources, enhancing their security posture and helping them comply with regulatory requirements.

The system uses a combination of zero-trust architecture, centralized access policies, real-time data transformation services, and AI and machine learning to deliver the right data to the right users in the right format.

The company, founded in 2022 and based in Los Gatos, Calif., exited stealth in March with $7 million in funding.

Mind

Top Executive: Eran Barak, Co-Founder, CEO

Startup Mind, which has developed next-generation data loss prevention technology, just emerged from stealth with $11 million in seed funding from YL Ventures. The company is based in Seattle.

The Mind platform incorporates AI and “smart automations” to monitor data events and identify, detect and prevent data leaks. The system discovers and classifies sensitive data—at rest, in motion and in use—across numerous IT workloads, including SaaS and GenAI applications, endpoints, on-premises systems and emails.

At the core of the system, Mind AI is made up of hundreds of tailored algorithms and a proprietary AI engine to classify and categorize sensitive unstructured data, understand context-aware business views to determine risk severity, and take automated prevention and remediation actions when needed.

Mind was co-founded in 2023 by Eran Barak, who previously founded Hexadite (later acquired by Microsoft). He and the other co-founders previously served in leadership roles in the Israeli Military Intelligence Unit 8200.

MotherDuck

Top Executive: Jordan Tigani, Co-Founder, CEO

Startup MotherDuck launched the first release of its serverless MotherDuck Cloud Analytics Platform in June 2023, combining cloud and embedded database technology to make it easy to analyze data no matter where it resides.

MotherDuck’s software is based on the open-source, embeddable DuckDB database. The cloud system simplifies the analysis of data of any size by combining the speed of an in-process database with the scalability of the cloud, according to the company.

MotherDuck makes the argument that most advances in data analysis in recent years have been geared toward large businesses and organizations with more than a petabyte of data while neglecting small and midsize companies with correspondingly smaller data volumes.

Seattle-based MotherDuck was co-founded in 2022 by Google BigQuery founding engineer Tigani. In September 2023 the company raised $52.5 million in Series B funding, bringing its total financing to $100 million.

Onehouse

Top Executive: Vinoth Chandar, CEO

Onehouse provides a cloud-native, fully managed universal data lakehouse service that the company says is designed to ingest data from any source and can support all query engines. The system is based on the Apache Hudi open-source data lake platform.

Onehouse looks to help businesses and organizations resolve the problem of fragmented, siloed data—data that’s scattered across data storage systems, operational databases and data warehouse systems—on-premises and in the cloud.

In June the company launched additions to its product lineup with LakeView, a free lakehouse observability tool for the open-source community, and Table Optimizer for automating lakehouse optimizations. In August the company debuted a vector embeddings generator to automate embeddings pipelines as a part of the Onehouse managed ELT (extract, load and transform) cloud service.

Onehouse, founded in 2021 and based in Menlo Park, Calif., raised $35 million in a Series B funding round in June led by Craft Ventures with participation from earlier investors Addition and Greylock Partners.

Scoop Analytics

Top Executive: Brad Peters, Co-Founder, CEO

Startup Scoop Analytics emerged from stealth in June with its software for automating reporting processes and developing AI-powered business intelligence presentations and reports.

The software makes it possible for anyone with spreadsheet skills to collect data from any application, blend data from different sources and use it to create “visually compelling data stories” through slide presentations based on live data, according to the company.

Peters says Scoop’s mission is to “deliver data analytics in a form factor that doesn’t require a data team” and achieve the long-time goal of true self-service business intelligence.

San Francisco-based Scoop Analytics was founded by Peters and others who previously worked at business analytics software developer Birst. The company officially launched June 18 with $3.5 million in seed funding from Ridge Ventures, Industry Ventures and Engineering Capital.

Unstructured

Top Executive: Brian Raymond, Founder, CEO

Unstructured is getting noticed for its technology used to access, prepare and transform data, especially unstructured data such as documents and images, for use in the large language models that power AI and generative AI applications.

Amid the wave of AI development, organizations are wrestling with the task of cleaning up and readying huge volumes of data, especially unstructured data such as documents and images. The company says that more than 80 percent of enterprise data resides in documents and other unstructured files.

Unstructured’s platform and development tools make it possible to convert unstructured or “natural language” data into a format that’s ready for LLMs, vector databases and LangChain. The Unstructured system works with a range of difficult-to-use file types and formats including HTML, PDF, CSV, PNG, PPTX and more.

In July Unstructured, founded in 2022 and based in Sacramento, Calif., raised $25 million in seed and Series A funding rounds led by Bain Capital Ventures and Madrona, respectively.