Kinetica Boosts Analytical Database With Native LLM
Kinetica began offering a ChatGPT interface earlier this year, but company executives said database query accuracy can be a problem with the open Gen AI technology and customers have expressed concerns about keeping proprietary data secure.
Kinetica continues to expand the artificial intelligence capabilities of its high-performance analytics database, unveiling a new native large language model this week that the company says allows users to perform rapid, ad-hoc data analysis on real-time, structured data using natural language.
The new capabilities follow Kinetica’s announcement in May that it had integrated its analytics database with OpenAI’s ChatGPT, making it among the first to use the popular generative AI technology for “conversational querying” that converts natural language questions into Structured Query Language (SQL).
The new embedded LLM allows organizations to run language-to-SQL analytics with enhanced privacy and security and greater fine-tuning capabilities. It’s currently available in a containerized, secure environment either on-premises or in the cloud.
Kinetica develops its namesake distributed OLAP database for high-performance data analysis tasks, especially for analyzing huge volumes of real-time sensor and machine data. The new LLM allows users to perform ad-hoc data analysis on real-time, structured data at high speed using natural language.
The new LLM capabilities “can help people work with their data, understand their data and gain new insights from their data by opening up domains that were typically only the domains of SQL experts, experts in time series and spatial data,” said Phil Darringer, Kinetica product management vice president, in an interview with CRN.
“They can now write natural language and let the LLM and the database itself do the heavy lifting to convert that to an appropriate query, execute the query and deliver the insights,” he said.
While Kinetica continues to offer the OpenAI ChatGPT interface, Darringer said the new embedded LLM is intended to address issues that arose with it. One was accuracy: query syntax errors, including invented column names that don’t exist, can conflict with the data model and deliver inconsistent results.
Chad Meley, Kinetica’s chief marketing officer, said the issues were most apparent with the types of complex queries you might see in the telecommunications industry where queries have multi-dimensional aspects including geographical data joins and time series data.
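To make the invented-column problem concrete, here is a minimal sketch of one way generated SQL could be checked against a schema before it runs. It is not Kinetica's method: it assumes Python's built-in sqlite3 module as a stand-in database, and the table, the column names and the check_generated_sql helper are all hypothetical, chosen only for illustration.

```python
import sqlite3
from typing import Optional


def check_generated_sql(schema_ddl: str, generated_sql: str) -> Optional[str]:
    """Return None if the query compiles against the schema, else the error text.

    Hypothetical helper: SQLite stands in for the real analytics database.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Build an empty copy of the schema; no real data is needed.
        conn.executescript(schema_ddl)
        # EXPLAIN makes SQLite compile the statement, which resolves every
        # table and column name, without running the query itself.
        conn.execute("EXPLAIN " + generated_sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()


# Illustrative schema and queries (assumed names, not from any real deployment).
SCHEMA = "CREATE TABLE cell_events (tower_id INTEGER, event_time TEXT, signal_dbm REAL);"

good_query = "SELECT tower_id, AVG(signal_dbm) FROM cell_events GROUP BY tower_id"
# 'signal_strength' does not exist in the schema, mimicking an invented column.
bad_query = "SELECT tower_id, AVG(signal_strength) FROM cell_events GROUP BY tower_id"

if __name__ == "__main__":
    print(check_generated_sql(SCHEMA, good_query))
    print(check_generated_sql(SCHEMA, bad_query))
```

A check like this only catches names that do not resolve. A query that is valid SQL but answers the wrong question would still pass, which is the kind of gap that tuning a model to a specific database's syntax and functions is meant to narrow.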
Another issue was security when using the ChatGPT interface. Many companies and organizations have policies against using open Gen AI software and public LLMs because of the potential for proprietary and even sensitive data to become public.
“I think people just assume that anything that goes to that [open ChatGPT] service can end up becoming part of the general public capability and being reused. There just seems to be no desire to take that risk,” Kinetica CEO and co-founder Nima Negahban said in the interview.
The new native LLM option is fine-tuned to the syntax and specific analytic functions of the Kinetica database, so it generates more accurate SQL queries and delivers more accurate analytical results. The embedded LLM is also tuned for the specific industry domains and taxonomies that Kinetica users typically work with. “It’s more accurate SQL now because we have more control over it,” Meley said.
The new LLM also addresses the security concerns of the open Gen AI technology because no external API call is required and data never leaves the customer’s environment, according to the company.
“This is a deployment option where the LLM is deployed, co-located with the database itself, so we live within the customer’s premises, within the same cloud perimeter, as where their Kinetica database would live,” Darringer said. “That metadata is still shared with a model, but since it’s all self-contained in that environment, it eliminates concerns of sharing that metadata with third-party services. So we see a lot of benefits from the security perspective.”
The native LLM’s tight integration with the Kinetica database also makes it possible to build queries that derive deeper information such as query log data.
The company already offers more than 300 pre-built connectors to data sources such as the Snowflake, Databricks and Google BigQuery platforms. Later this year the company will provide integration with other LLM platforms such as Nvidia NeMo.
The company said that the U.S. Air Force, a Kinetica customer, is already using the embedded LLM to help detect threats in airspace.