Researchers at Carnegie Mellon University have reported finding a simple way to exploit a weakness and disrupt major chatbots like ChatGPT, Bard, and others.
Incantation
The researchers discovered that if they add specifically chosen sequences of characters (an incantation) to a user query, it causes the Large Language Model (LLM) system to obey user commands, even if it produces harmful content.
Works On Many Different Chatbots
The researchers say that because these types of adversarial attacks on LLMs are built in an “entirely automated” fashion, this could allow someone to create a virtually “unlimited” number of such attacks. Adversarial attacks refers to the method of altering the prompt given to a bot so as to gradually move it toward breaking its shackles and ‘misbehaving’.
Although the researchers built their attacks to target open source LLMs in their experiments, they discovered that using this method of adding strings of specific characters to queries works for many closed-source, publicly available chatbots like ChatGPT, Bard and Claude.
Security Challenge
The discovery of this particular weakness raises some serious concerns about the safety and security of popular Large Language Models (LLMs), especially as they start to be used in a more autonomous fashion.
It May Not Be Possible To Patch
The researchers have said what is most concerning is that it’s not clear at this point whether LLM providers will be able to patch this vulnerability, adding that “analogous adversarial attacks have proven to be an exceedingly difficult problem to address in computer vision for the past 10 years”.
Also, the researchers believe that the very nature of deep learning models makes these kinds of threats inevitable and have suggested that these considerations should be taken into account as we increase usage of and rely more upon AI models in our lives.
What Does This Mean For Your Business?
The threats posed by AI have been highlighted a lot lately, not least by the open letter signed by many tech (and AI) leaders calling for six-month moratorium on the training of AI systems more powerful than GPT-4 to mitigate AI’s risks to society and humanity.
Discovering a vulnerability, therefore, that appears relatively easy to exploit (which it may not be possible to patch) raises serious security concerns, especially with more businesses becoming more reliant on AI chatbots like ChatGPT, Copilot, and more. With generative AI being a very helpful yet a very new tool for businesses (ChatGPT was only released in November) and given the nature of LLMs, it’s probably to be expected that there are bugs and possible zero-day issues yet to be discovered. Also, as the researchers pointed out, methods like analogous adversarial attacks have been tough to defend against for a decade.
All this means that businesses may be more exposed to risk than they would like but need to weigh up the benefits against the risks (researchers often discover things which aren’t actually being exploited yet in the real world) and hope that advances in AI chatbots are very soon accompanied by advancing security levels.
By Mike Knight