Starburst Makes Data Lake Analytics Push With Galaxy Additions
New Partner Connect offering streamlines the ability of partners and customers to integrate third-party products with the Galaxy platform.
Starburst is taking a deeper dive into the data lake.
Tuesday the data analytics company launched Starburst Galaxy as a full-featured, fully managed data lake analytics platform that serves as a single focus point for discovering, governing and analyzing data in and around an organization’s data lake.
While Galaxy originally debuted in February 2021 as a distributed analytics query engine, Starburst has added a range of new technologies and capabilities that elevate Galaxy to the level of a data lake analytics platform.
Starburst also unveiled Partner Connect, a new portal within Galaxy that provides the company’s customers and technology, consulting and service partners with a single point of access for integrating third-party software with the platform, including tools for business intelligence and visualization, data storage, data preparation and transformation, and machine learning.
Boston-based Starburst, founded in 2017, develops its Starburst Enterprise data analytics technology based on the Trino distributed SQL query engine. Galaxy, also incorporating the Trino technology, was originally launched as the cloud edition of the company’s data analytics offering.
Galaxy provides cross-cloud querying functionality for discovering, exploring and assessing data that’s distributed across multiple clouds, geographical regions and data sources before moving it into a data lake or data warehouse.
“Trino and Starburst are most often touted for querying data across multiple sources, this idea of being able to federate [queries] across different regions, different architectures, different clouds,” Harrison Johnson, Starburst head of technology partnerships, told CRN in an interview.
“But interestingly enough, Trino was actually built, initially, for the express purpose of querying large volumes of data in a lake for anything from an ad hoc query to a batch job,” Johnson said. “And it evolved into also being a federation engine. So this idea of a data lake as a major center of gravity is a pioneering concept for Trino and therefore Starburst.”
The expanded vision for Galaxy and its extended capabilities will put Starburst in more direct competition with data lakehouse heavyweights such as Databricks, Snowflake, Cloudera, Amazon Web Services and Microsoft.
Johnson estimated that up to 90 percent of Starburst customers already have a data lake. He said that with its expanded capabilities customers can use Galaxy either as their data lake analytics platform—what Databricks and some others call data lakehouses—or to augment and extend the multi-cloud capabilities of an existing analytics system to tap into what he called “multiple centers of data gravity.”
Data lake systems today generally operate on one of several table formats, including Apache Hudi, Apache Iceberg and Delta Lake, some of them championed by competitors in the data lake system market. Johnson said Galaxy enables “first-class connectivity” to any table and file formats that customers use.
Galaxy now includes Warp Speed, patented indexing technology that Starburst says increases query speeds against data lakes by an average of 40 percent and improves the performance of interactive analytics against data lakes. Warp Speed is based on data lake analytics acceleration technology Starburst acquired when it bought Israel-based Varada in June 2022.
The faster querying speeds will open Galaxy to new use cases including embedded analytics, according to Starburst, while Galaxy features like cluster sizing, autoscaling and cluster types simplify matching infrastructure to workload needs.
Starburst also unveiled a public preview of Gravity, the company’s new centralized data access and governance technology for connected data sources including data lakes, relational and NoSQL databases, data warehouses and data marts. Gravity incorporates a number of Galaxy capabilities including its metastore, automated data catalogs, search, attribute-based access control, and the ability to create and share data products.
The addition of Partner Connect further extends Galaxy’s capabilities to work with data in and around a data lake by making it easier for customers and the company’s partner ecosystem to integrate third-party technologies with the Galaxy platform. Johnson said the long-term goal is to make it possible for partners and customers to implement an integration in four clicks or fewer on a self-service or “in-product experience” basis.
A number of Starburst partners helped develop the initial release of Partner Connect, including business intelligence and visualization software partners Amazon Web Services (QuickSight), Google Cloud (Looker), Microsoft (Power BI), Metabase, Tableau Cloud, ThoughtSpot and Zing Data. In the data storage, preparation and transformation space, Starburst worked with technology partners Tabular and dbt Labs.
Data transformation software developer dbt Labs has worked with Starburst for a number of years, CEO Tristan Handy told CRN in an interview. “As Starburst has matured as a platform, we’ve started working together more closely [and] we’ve tried to find ways to make it easier for customers to on-board the two products together,” he said of their collaboration.
Starburst is also extending Galaxy deeper into the predictive analytics space with an out-of-the-box connector to Hex Data, the collaborative data science and analytics platform developed by San Francisco-based Hex Technologies.
“With the Starburst Galaxy integration, Hex users will now get access to Starburst’s powerful engine to handle complex queries quickly and easily,” said Caitlin Colgrove, Hex co-founder and CTO, in a statement. “Starburst can connect to a wide variety of data sources, including data warehouses, data lakes and cloud-based data stores and we are excited for users to be able to query them all in one place.”
Consulting and service partners including Accenture, Deloitte and Slalom Consulting are developing accelerators—pre-developed solutions to help customers get started—that work with Galaxy.
“Our clients constantly highlight the amount of time their teams spend on building and maintaining integrations between tools in their stack. We can’t wait to see how Partner Connect in Starburst Galaxy gives data teams a big boost in tackling these challenges,” said Ashwin Patil, principal at Deloitte Consulting, in a statement.
Starburst’s new vision for Galaxy, combined with its new capabilities, provides partners with a reference architecture for future development, Johnson said. And the more complete packaging of what have been discrete Starburst technologies into a single platform will allow partners to focus more on developing value-add services (what Johnson called “the people and the process aspects of transformation”) rather than on product integration work.