Qubole Targets Data Lake Simplification Through Extended Informatica Alliance
New release of company’s data lake platform software also offers new capabilities to streamline the building, management and use of data lake systems.
Jump in, the data’s just fine!
Aiming to simplify some of the complex processes around data lake systems, big data software developer Qubole Tuesday unveiled a new release of its Open Data Lake Platform and announced expanded links with Informatica’s data integration toolset.
“We want to make sure data lakes are simple, open and secure,” said Qubole CEO Ashish Thusoo in an interview with CRN, citing the extended alliance with Informatica and the new capabilities in the Qubole R59 release of its flagship product.
Businesses and organizations are assembling data lakes, vast stores of unorganized, often unstructured data, as a way to derive value from the increasingly huge volumes of data they collect from internal operational systems, sales and marketing applications, and outside sources such as social media. Data that resides in both on-premises and cloud systems further complicates data lake management efforts.
But assembling and managing data lakes, let alone effectively tapping into them for business intelligence, is complex. That’s given rise to software like the Qubole Open Data Lake Platform that automates data lake operations, such as data engineering and data processing, and provides users with self-service capabilities for ad hoc and streaming analysis while ensuring data governance and security.
Thusoo said data lakes are moving beyond the early adopter stage into more widespread use by large enterprises for data exploration and analysis, predictive analysis and machine learning tasks.
Informatica is a long-time player in data ETL (extract, transform and load) and data integration tools used to move data between operational systems and data warehouses and, more recently, data lake systems.
Qubole and Informatica have built tighter links between the Qubole Open Data Lake Platform and Informatica’s AI-based Data Engineering Integration tools for building and managing data pipelines at scale. Those pipelines are used to move data into on-premises and cloud data lakes.
The new integration provides end-to-end, metadata-driven data integration for building, orchestrating and processing data pipelines. That, according to the companies, makes it easier for businesses to migrate their on-premises legacy data lakes to the cloud while lowering the costs of those deployments.
“As enterprises look to do more with their data lakes and reduce data processing and infrastructure costs, they’re turning to cloud architectures to improve uptime, avoid vendor lock-in and gain price leverage,” Thusoo said. “With this Informatica integration, Qubole is answering that call to provide a robust and future-proof data management paradigm to support fast data lake adoption with a wide range of data processing needs.”
New capabilities in the Qubole R59 release include the ability of the platform’s Workbench toolset, which is used to compose and run queries, to run across all three major cloud platforms: AWS, Microsoft Azure and Google Cloud Platform.
Qubole R59 also offers a new data visualization framework called “Qviz” that provides out-of-the-box data visualizations and renders DataFrames with improved charting options. The software also provides new notebook workflows that allow users to stitch together ETL workflows in other notebook wrappers, along with a new user interface for installing and managing packages from custom channels.
“The trend this taps into is simplification of usage,” Thusoo said of the R59 enhancements. “All the themes are around usability.”
According to Thusoo, Qubole works with systems integrator partners that undertake data lake development and modernization projects, as well as with systems integrators and solution providers that develop applications that run on data lakes.
Qubole’s technology and the new integration with Informatica “makes the job of the VARs and systems integrators much easier to bring in those workloads and deliver on these data lake projects,” the CEO said.