The Coolest Stellar Startups Of The 2025 Big Data 100

Part 7 of CRN’s Big Data 100 takes a look at the startup companies solution providers should know in the big data arena.

Big Data, Bold Visions

The majority of the companies on the CRN 2025 Big Data 100 are either major IT vendors like Amazon Web Services, Microsoft and Oracle, or companies that are well-established in the big data space – companies like Snowflake, Databricks, Qlik and Informatica.

But startup companies are often a source of the most innovative technologies. Unencumbered by legacy products, startups can move quickly to develop new, leading-edge products when they identify a specific need in big data management, processing or analytics.

This week CRN is running the 2025 Big Data 100 list in slideshows, organized by technology category.

As part of the Big Data 100, CRN has compiled a list of 13 startup big data technology companies founded between 2019 and the present. This list cuts across most technology categories of this year’s Big Data 100 and includes startups developing new technology in business analytics, data warehouse platforms, database systems, data management and integration tools, and data observability and DataOps software.

Some vendors have big data product portfolios that span multiple technology categories. They appear in the Big Data 100 slideshow for the technology segment in which they are most prominent.

Airbyte

Top Executive: Michel Tricot, CEO

Founded: 2020

Airbyte’s Open Data Movement Platform collects, integrates and manages structured and unstructured data across diverse multi-cloud environments for operational, data analysis and AI tasks.

Airbyte, which provides an open-source version of its software, offers self-managed enterprise, cloud and embedded editions of its data movement platform. At last count there were more than 300 data connectors available for the platform.

In February, San Francisco-based Airbyte adopted a new pricing model based on capacity rather than data volume—a plan the company said provides customers with more predictable pricing and better accommodates the data needs of AI, data lakes and real-time analytics.

Atlan

Top Executive: Prukalpa Sankar, Co-Founder

Founded: 2019

Atlan says its “third generation” data catalog technology helps its users find, trust and govern AI-ready data across an organization’s entire data asset universe.

The Atlan platform’s capabilities include data discovery, data lineage, metadata management and data governance. It facilitates collaboration among data teams by providing a central hub for data assets and automatically documents those assets.

The platform’s data discovery, lineage and access control tools help streamline AI initiatives by ensuring that data is ready for AI models, according to the company.

In May 2024 Atlan, based in San Francisco, raised $105 million in Series C funding.

Bigeye

Top Executive: Kyle Kirwan, CEO

Founded: 2019

Bigeye’s “lineage-enabled” data observability tools help data teams quickly identify, triage and resolve data incidents. Bigeye’s software provides data monitoring and anomaly detection capabilities, and tracks data lineage to automate root cause and impact analysis.

In March, the San Francisco-based company launched bigAI, a suite of AI-powered capabilities that the company said go beyond data problem detection to problem resolution and prevention. The new software pinpoints root causes, provides AI-driven guidance to resolve issues faster, and proactively recommends changes to prevent future failures.

In June 2024 Bigeye launched its Systems Integrator Partner Program with dedicated partner managers, sales leads to customers in need of implementation support, referral incentives, and training and support resources.

DataPelago

Top Executive: Rajan Goyal, CEO

Founded: 2021

Startup DataPelago exited stealth in October with what the company described as the world’s first “universal data processing engine” that can handle the complexity and volume of today’s data for “accelerated computing” analytical and artificial intelligence workloads.

DataPelago’s technology is designed to overcome the performance, cost and scalability limitations of current-generation IT systems and meet the needs of what the company calls “the accelerated computing era.”

The company’s engine processes data in the most efficient way possible based on available hardware resources and the data being processed. The result, according to the company, is a unique architecture that enables the engine to process data one to two orders of magnitude faster than traditional query engines.

The startup’s system was built to support GenAI and data lakehouse analytics workloads by employing a hardware-software co-design approach, according to the company. The engine is designed to work with today’s data stacks including CPU-, GPU-, TPU- and FPGA-based hardware; data processing frameworks such as Spark, Trino and Apache Flink; multiple types of data stores; and data processing platforms such as Snowflake and Databricks.

DataPelago, founded in 2021 and based in Mountain View, Calif., has cumulatively raised $47 million in seed and Series A funding.

Firebolt

Top Executive: Eldad Farkash, CEO

Founded: 2019

Israeli startup Firebolt offers a high-performance, low-latency data warehouse cloud service for AI and analytics tasks.

The company’s data warehouse promises sub-second query times, data ELT scalability, high concurrency and optimized compute capabilities for AI, analytics and data applications.

Founded in 2019, Firebolt officially launched its cloud data warehouse service in September 2024 after five years of development. The service is particularly targeted toward developers and data engineers who require high performance for customer-facing analytics and data-intensive applications.

Firebolt has raised $270 million in funding, including $127 million in Series B funding in 2021 and $100 million in Series C funding in 2022. In February the company hired former Oracle and Confluent executive Hemanth Vedagarbha as president to oversee global go-to-market expansion and customer-facing operations.

Hex Technologies

Top Executive: Barry McCardel, CEO

Founded: 2019

Hex provides a collaborative data science and analytics workspace where data teams and business users can share analytical results. The platform combines the capabilities of traditional data science notebooks with integrated AI assistance, data applications and reports, and advanced collaboration functionality.

In January, Hex, founded in 2019 and based in San Francisco, launched a Consulting Partner Program to work with partners who provide data consulting services around data strategy, database implementation and analytics transformation.

Monte Carlo

Top Executive: Barr Moses, CEO

Founded: 2019

Monte Carlo develops the Data + AI Observability platform, which the company says not only detects data anomalies and problems but also triages incidents, discovers the root cause, determines who was impacted and recommends how to fix the problems.

The Monte Carlo platform also optimizes the cost and performance of big data systems and infrastructure, as well as customer-facing data services and products.

In March, Monte Carlo partnered with data cloud company Snowflake to provide data observability capabilities for structured and unstructured data pipelines that power agentic AI applications in Snowflake Cortex AI.

Earlier this month Monte Carlo launched a suite of AI observability agents designed to accelerate monitoring and troubleshooting workflows to improve data and AI system reliability. Monitoring Agent recommends data quality monitoring rules and thresholds and deploys Troubleshooting Agent to investigate data quality issues.

MotherDuck

Top Executive: Jordan Tigani, CEO

Founded: 2022

MotherDuck debuted the initial release of its serverless MotherDuck Cloud Analytics Platform in June 2023, combining cloud and embedded database technology to make it easy to analyze data no matter where it resides. The software became generally available in June 2024.

The company’s offering is based on the open-source DuckDB embeddable database and, according to the company, is “making analytics fun, frictionless and ducking awesome.”

Seattle-based MotherDuck was founded in 2022 by Tigani, a founding engineer at Google’s BigQuery cloud data analytics service. The company raised $52.5 million in Series B funding in September 2023, bringing its total funding to $100 million.

Onehouse

Top Executive: Vinoth Chandar, CEO

Founded: 2021

The Onehouse Universal Data Lakehouse data storage offering is a fully managed cloud data lakehouse service that can ingest data from many sources in minutes and supports all data analytics and business intelligence query engines.

The service is built on the Apache Hudi open-source data management framework that brings database and data warehouse capabilities to data lakes. (Onehouse founder and CEO Vinoth Chandar was Hudi’s original developer while he was working at Uber.)

In January Sunnyvale, Calif.-based Onehouse announced the general availability of Onehouse Compute Runtime, which optimizes data workloads across all leading cloud data platforms and query engines including Amazon Redshift, Databricks, Google BigQuery and Snowflake.

Pinecone

Top Executive: Edo Liberty, CEO

Founded: 2019

AI applications and large language models need fast access to data. That’s fueling the demand for vector databases that index and store “vector embeddings” for rapid data retrieval and similarity searches.

Pinecone’s distributed vector database, which the company describes as “the foundation for knowledgeable AI,” is used to build AI applications that are accurate, high-performance and scalable.

The San Mateo, Calif.-based company launched its Pinecone Partner Program in April 2024 with a focus on ISVs that build the company’s vector database into their software products, including AI applications.
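For readers unfamiliar with the similarity searches that vector databases perform, the idea can be sketched in a few lines of Python. This is an illustrative brute-force example only, not Pinecone's API or implementation: the function names, the three-dimensional toy embeddings and the document labels are all hypothetical, and production vector databases generally rely on approximate nearest-neighbor indexes rather than scanning every vector.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, index, k=2):
    """Score every stored vector against the query and return the k best matches."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy "embeddings"; real ones typically have hundreds of dimensions.
index = {
    "invoice": [0.9, 0.1, 0.0],
    "receipt": [0.8, 0.2, 0.1],
    "weather": [0.0, 0.1, 0.9],
}

results = top_k([1.0, 0.0, 0.0], index, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

The query vector points in nearly the same direction as the "invoice" and "receipt" embeddings, so those two rank first while "weather" is excluded.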

Syncari

Top Executive: Nick Bonfiglio, CEO

Founded: 2019

The Syncari Autonomous Data Management platform helps organizations manage, unify and activate data across multiple systems. The platform uses intelligent data synchronization, data cleansing, and data merger and augmentation to ensure data consistency and accuracy.

In December Syncari announced the general availability of new capabilities in the Syncari Autonomous Data Management platform including Unified Insights for real-time business intelligence and Auto Field Mapping for mapping standard and custom fields across integrated systems.

Tessell

Top Executive: Bala Kuchibhotla, CEO

Founded: 2021

Startup Tessell touts its Database-as-a-Service technology as a co-pilot for cloud databases, helping resolve issues of data fragmentation and inefficiency that often come with multi-cloud computing environments.

Tessell’s DBaaS platform provides a suite of database services, including data protection, security and compliance, and simplified management, that surround six commercial database engines (Microsoft SQL Server, Milvus, MongoDB, MySQL, Oracle Database and PostgreSQL).

Earlier this month Tessell, based in San Francisco, raised $60 million in a Series B funding round, financing the startup will use to accelerate its go-to-market expansion and boost its research and development in AI-powered data management.

Unstructured

Top Executive: Brian Raymond, CEO

Founded: 2022

Unstructured has developed technology that captures complex unstructured data and transforms it into clean, structured data that’s more easily used for data analytics and generative AI purposes.

The Unstructured Enterprise ETL Platform routes unstructured data, such as text and documents, through “dynamic transformation and enrichment pipelines” and delivers it to graph and vector databases where it can be accessed by the large language models that power GenAI systems, according to the company.

The platform also includes an ETL Workflow Builder tool and third-party integrations. Unstructured also provides the Unstructured Developer Toolkit for building custom integrations and embedding models.

Unstructured, based in San Francisco, raised $40 million in Series B funding in March 2024.
