The AI Danger Zone: ‘Data Poisoning’ Targets LLMs

When it comes to GenAI, the security of the AI models themselves is at risk as ‘data poisoning’ is increasingly taking aim at the training data that large language models rely on to generate responses and make decisions.

The future of generative AI all hangs on cybersecurity—but perhaps not in the way you’d think.

While the widespread concerns over data exposure, deepfakes, phishing and other threats related to GenAI are no doubt justified, some cybersecurity experts told CRN that there’s not enough attention being paid to the security of the AI models themselves.

The threat of “data poisoning” with large language models is very real and not something that traditional application testing will address, creating a massive challenge for the industry as well as major new opportunities for solution providers.

[RELATED: CRN’s 2024 AI Special Issue]

Data poisoning targets the training data that LLMs and other AI models rely on to generate responses and make decisions.

If data poisoning and other AI model training issues are allowed to proliferate, “the risk is that [organizations] will not be able to necessarily trust the integrity of their AI systems that are going to be probably running a vast amount of the enterprise moving forward,” Accenture’s Robert Boyce told CRN.

In other words, compromises of LLM training data could jeopardize the AI revolution in a much bigger way than any of the other GenAI-related threats that organizations are concerned about today.

“What we’re worried about in the future is manipulation of AI systems that are running major critical infrastructure components for us,” said Boyce, senior managing director and global lead for cyber-resilience services at Dublin, Ireland-based systems integrator Accenture, No. 1 on the CRN 2024 Solution Provider 500. For any organization looking to utilize LLMs or GenAI applications, “you need to pay attention to this.”

That’s not to say that the arrival of GenAI has been without its incidents related to data reliability. There was a case this spring of a New York City chatbot for small business owners that showed a poor grasp of the law, dishing out false information about when it’s legal to fire an employee, for instance.

Google’s recently launched “AI Overview” search feature, meanwhile, has been known to off er some very bad advice on matters of health—advocating behaviors such as topping your pizza with glue, staring at the sun for up to 30 minutes and eating one rock a day—just a small rock, though!

There’s no indication that any of these issues were caused by intentional manipulation, but security experts say they do illustrate the risks in the event that malicious actors seek to undermine LLM integrity.

Nicole Carignan, vice president of strategic cyber AI for Cambridge, U.K.-based Darktrace, said the industry has plenty of reason to expect hackers will seek to inject “intentionally wrong information to belittle [GenAI’s] effectiveness.”

“If there is a rampant increase in inaccuracies, you lose trust in the tooling,” Carignan said. And if that happens, “that would be a big economic impact to the tech industry because of the massive amount of investment that’s been put into large language models.”

Notably, recent cases of deliberate AI model poisoning have in fact been reported. JFrog researchers in February disclosed that they’d discovered roughly 100 malicious models containing backdoors that had been uploaded to the Hugging Face AI platform, for instance.

In addition to malware insertion and misinformation, other tactics that experts worry about include the potential for poisoning of AI-powered help desks or chatbots in order to direct users to phishing sites.

Apart from introducing spurious data into an AI model, another potential threat includes poisoning the model “weights” that are used by AI-driven apps for generating useful responses to queries in Retrieval-Augmented Generation (RAG) systems, said Anand Kashyap, co-founder and CEO of Fortanix, a data security vendor based in Santa Clara, Calif.

“You need to make sure that nobody has tampered with your model weights or the data that you have in your RAG system,” Kashyap said.

Without a doubt, the growth of GenAI going forward will be “based on human beings having confidence in the models,” said Paul McDonagh-Smith, senior lecturer in information technology at the Massachusetts Institute of Technology Sloan School of Management. And society as a whole is only going to gain that level of confidence “if we feel that the safety nets are in place,” he said.

The complexity of LLMs—and the many differences between GenAI-powered apps and other types of software—presents unique difficulties for security testing, however.

In the case of “classic” vulnerabilities, there’s usually a clear line around whether an application or system is vulnerable or not, said Casey Ellis, founder and CTO at Bugcrowd, a crowdsourced security firm based in San Francisco.

However, “with AI systems—especially when you start talking about testing for model poisoning or bias or any of those input data issues—it’s more of a fuzzy outcome,” Ellis said.

A major factor is that, by nature, LLMs are what’s known as “nondeterministic,” meaning that their outputs may differ even when they’re provided with identical inputs.

“They’re very different from your classic, regular web application. They require some very specific testing,” said Jurgen Kutscher, vice president at Google Cloud-owned Mandiant Consulting, Reston, Va. “It’s definitely opening a new area of possibilities and new types of thinking about security testing that needs to take place.”

In particular, difficulties in testing arise due to the fact that GenAI interactions are extremely “state-driven”—that is, unique to the conditions of the particular session, said Dean Teffer, vice president of AI at Arctic Wolf, a security operations platform provider based in Eden Prairie, Minn.

“It’s possible that a prompt that just happens to be risky was the product of the whole session or maybe, depending on how the chatbot works, prior sessions that you’ve done with that chatbot. And so it may not even be reproducible,” Teffer said.

The stakes are high, and not just because of the need for ensuring public confidence in AI models. It’s also a serious challenge to address any faulty data introduced into an LLM, according to Teffer.

“If data poisoning is discovered, it’s very difficult to identify and back that out. And the cost to train a new model is immense,” he said. “It’s very difficult to remove the data and then retrain.”

In addition, many new LLMs that are being released are iterations of older models, Teffer noted. If data poisoning has afflicted the older models, “it would be very difficult to rip that out,” he said.

Penetration testing, or pentesting, has traditionally involved probing systems to identify weaknesses and then attempting to exploit them. And while many standard application pentesting skills still apply and are necessary with AI, testing LLMs and GenAI-powered apps also requires a deep understanding of AI models, experts noted.

“You really need to have some level of data science background,” Accenture’s Boyce said. “You have to understand the models and how they work to be able to try and think what the vulnerabilities are going to be.”

Overall, organizations need to be testing “from top to bottom” when it comes to GenAI, he said.

“We’re not talking about just testing the models. We must be talking about all the way down to the infrastructure,” Boyce said. “The good news is, in security, we have a really good understanding of how to test from the application down. What we don’t really have a good understanding of is testing the new layer—the AI layer.”

Caroline Wong, chief strategy officer at San Francisco-based pentest-as-a-service platform Cobalt, said there’s no question that “to hack an AI app, a pen-tester has to have solid understanding of how LLMs work to conduct an appropriate test.”

“AI apps are effectively a new type of target,” she said. “And it’s in every hacker’s interest to learn how LLMs and AI work.”

At Denver-based Optiv, No. 25 on the CRN 2024 Solution Provider 500, there are plenty of conversations with customers right now on GenAI pentesting, according to Optiv Managing Partner Bill Young.

“In its simplest form, not a lot has changed. Pentesting always has been: Take something that’s supposed to behave in one way and attempt to make it behave in unintended ways,” Young said. “So whether you’re talking about data poisoning or something else to manipulate the model, the attack surface stays fairly similar.”

At the same time, “with the way we have to approach the attack scenarios—the ‘what are we trying to accomplish?’—the goals have changed,” he said. “The goals are very different in pentesting in AI than they are against pentesting a more traditional data source.”

For example, the dynamic nature of GenAI models adds complexity to the testing process. GenAI models and applications are “self-modifying on the fly,” Optiv’s Young said. “That means the ability to navigate to unintended behavior requires more attention to detail and more test scenarios.”

The tendency to think about LLM and GenAI app testing as an exercise in “red teaming”—which was a focus of last fall’s Biden administration executive order on AI safety—may not be the best mindset, either. Accenture’s Boyce noted that red teaming is typically associated with occasional, point-in-time testing.

In other words, “you do the test, you make adjustments and maybe you do the test again,” he said.

“What we want to avoid is everyone thinking that, ‘Oh, we built the GenAI system, we tested it in production during a red team [exercise], we moved it to production. We’re good.’ We want to make sure that’s not what people consider to be what good looks like,” Boyce said. “Because the test that we did today may have a different result two weeks from now.”

The demand for assistance with GenAI pentesting is expected to surge in coming years as more organizations get serious about adopting the technology, experts said.

For MSSPs such as Plano, Texas-based Cybalt, the conversations have already begun on the topic, although they are mostly preliminary at this stage, said Cybalt founder and CEO Khiro Mishra.

Cybalt currently offers pentesting and red-teaming services and “very soon we might have AI pentesting opportunities,” Mishra said. However, only larger organizations are already at the stage of needing such services, he said.

Usage of GenAI, Mishra said, is currently comparable to the level of cloud adoption in around 2017 or 2018. Still, once adopting GenAI “starts becoming more modular and easier to consume, this [opportunity] probably will become more pronounced,” he said.

Darktrace’s Carignan also expects that there will be an influx of new tools for the autonomous testing of AI. This is simply because, with GenAI apps having “infinite outputs,” a more automated approach is going to be obligatory, she said.

“How do you test the infinite outputs without some sort of a vehicle to be able to autonomously test it?” Carignan said. “We’re going to see a huge increase in products that provide some level of autonomous testing to start going through the exhaustive testing, evaluation, validation and verification.”

Other potential vulnerabilities in GenAI systems include what is known as prompt injection, whereby an attacker seeks to cause an app to generate an unintended response, such as revealing information that it’s not supposed to share.

Bugcrowd is tackling a number of AI integrity challenges with a pair of recently launched offerings, including an AI pentest and an AI bias assessment bounty program. For the AI pentest offering, several issues can be assessed, including data poisoning, according to Ellis.

Cobalt’s Wong said it’s clear we are “at the very beginning of an AI revolution” and that security providers have a central role to play in ensuring that it doesn’t become curtailed by malicious actors.

Wong said that progress in this arena is happening steadily, however, and she pointed to the release of a list of major vulnerabilities that can afflict GenAI-powered apps, the OWASP Top 10 for Large Language Model Applications.

“I think we as an industry are really developing the methodology for comprehensively testing these apps to identify known security vulnerabilities,” she said.

Ultimately, compared with some of the well-known security risks posed by GenAI, experts said that data poisoning is a less understood—and potentially larger—threat to the future of the technology.

As a result, there’s no question that organizations have more to consider with GenAI than just protecting their sensitive data from exposure and getting the necessary data to drive new AI-powered services, said Patrick Harr, CEO of SlashNext, an AI-powered email security vendor based in Pleasanton, Calif.

“As an industry, we have to take very seriously how we protect those models,” Harr said, “not only how we’re building the right datasets but also how we’re protecting the model from getting corrupted.”

Apart from intentional data poisoning, it also becomes increasingly crucial to make sure the data used to train AI models is accurate to begin with, according to Harr.

In other words, there is a risk that organizations might taint their own AI models by training them on faulty data, he said.

“You have to make sure you’re cleansing the data and make sure the data is clean, that you can get to accurate data so you’re making the right decisions on the right data,” Harr said. “At its core, the AI is only as good as the data that it’s making the decisions on.”

For the purposes of AI models for security, meanwhile, building a new dataset is also continually necessary to keep up with emerging threats, he said. And to that end, using AI to generate “synthetic” data is one way to help ensure accuracy, Harr said.

“How do you get enough data to accurately train your models so they can think on their own and react on their own? That is where you have to use synthetic data to train these models,” he said. “The power of GenAI is it can permutate very rapidly. [And] in our case, it has proven highly accurate—near-zero false positive rates by doing that approach.”