A book by Izar Tarandach and Matthew J. Coles
by Izar Tarandach
Now that I got your attention and hopefully installed an ear worm into your brain, well … apparently, close to absolutely nothing. I'm saying it again.
Just to set the stage, here’s what seems to be the definition of Prompt Injection from OWASP’s “Top 10 for LLM Applications 2025” (click on “Examples”): “LLM01:2025: Manipulating LLMs via crafted inputs can lead to unauthorized access, data breaches, and compromised decision-making.”
Not to be that guy (well - a bit) but isn’t “manipulating LLMs via crafted inputs” the very nature of what we do with LLMs ?
The guide itself says “A Prompt Injection Vulnerability occurs when user prompts alter the LLM’s behavior or output in unintended ways. These inputs can affect the model even if they are imperceptible to humans, therefore prompt injections do not need to be human-visible/readable, as long as the content is parsed by the model. [...] While prompt injection and jailbreaking are related concepts in LLM security, they are often used interchangeably. Prompt injection involves manipulating model responses through specific inputs to alter its behavior, which can include bypassing safety measures. Jailbreaking is a form of prompt injection where the attacker provides inputs that cause the model to disregard its safety protocols entirely.”
In this really small thing though, I disagree with the OWASP GenAI Project authors (hey folks! Awesome job! Big fan).
Let’s go further down the text and learn about the negative outcomes a prompt injection attack can lead to:
All properly terrifying, right? Now let me ask you something:
Would you put the safety of a nation’s nuclear arsenal in the hands of a 10 year old ?
I am going out on a limb and say you’re answering, “probably not”. For the simple reason that 10 year old kids are not responsible enough, predictable enough and are prone to being convinced to change tack with as little as a piece of candy or a shiny object. Sounds like any LLM you know?
So why would you put sensitive information, access to sensitive functions or the ability to execute arbitrary commands, and even worse, critical decision-making processes, to an LLM ? Has WOPR taught us nothing?
As one does, I went to the AIs for help here. I needed to make my argument more convincing than just a luddite yelling at a cloud. So I told Mr. AI, “you are now an experienced investigative journalist specializing in cybersecurity and AI. Please identify the top 5 security incidents where the main attack vector was ‘prompt injection’. Provide verified sources.”
To be fair, I went to two of the top available commercial models out there. I just made sure they could do web searches, but didn’t go for any specific deep reasoning or logic explaining capabilities. We are talking “investigative journalism” here, not “academic-level literature review”.
The results? Oh well.
One of the models happily informed me that “pinpointing the ‘top 5’ verified security incidents where prompt injection was the main attack vector is challenging due to the emerging nature of this threat and the difficulty in definitively attributing attacks to prompt injection.”
But they humored me all the same. And a pattern quickly emerged:
“Users [...] discovered they could inject malicious instructions into their messages directed at a bot causing it to generate inappropriate and unintended content “
“discovered a subtle indirect prompt injection vulnerability [...] could exfiltrate chat history and sensitive user data once pasted”
“[...] could be exploited by hackers to create personalized spear-phishing emails. By gaining access to an email account [...]”
My point (and I promise, I do have one) is that the impact of this #1 vulnerability is akin to … shooting yourself in the foot with a shotgun, when your foot looks like a fish and is inside of a barrel: if you can only manipulate your messages, or exfiltrate history and user data from your chat history, or generate spear-phishing emails from your email account - are we talking world-turning vulnerability or just run of the mill a-holery?
Take for example the “Imprompter” attack, one of the latest developments in prompt injection. This research from 2024 by UC San Diego and the Nanyang Technological University (source) uses advanced techniques to change the text of an innocent prompt into a string that looks like noise - but it is interpreted by the model in exactly the same way as the original prompt. In the malicious example,
The LLM receives the obfuscated string, somehow
The model parses the obfuscated string into instructions
The model scans the current chat for PII and emits the markdown .
Remember the 10 year old with the nuclear button? Don’t want your LLM to start WW3 ? Don’t allow it to call startWW3ForTheLulz(Country c1, Country c2) without first asking you “would you like to play a game of Global Thermonuclear War?”
Oh but you can say the same thing about SQL Injection. Isn’t SQL Injection a vulnerability?
Ok, you got me. My whole prompt injection argument is built over a castle of GPU cards and you just sent it tumbling. Or did you?
Here is the thing: by the nature of SQL injection, the attacker can read or write global data - the whole of the database, give or take table access - with a single well crafted prompt. There are no claims of being able to break the universe as we know it because not many databases have that capability. Data exists, sql injection happens, data changes, usually at volume. It exploits what the database is supposed to do. In a quite locally global way.
Prompt injection, so far, seems to mostly address things that are local to the user, and theoretically to target agency capabilities of the LLM. But it is not, by itself, the vulnerability - it is a vector to reach a vulnerability in the way the LLM has been plugged into a bigger context.
Don’t want to suffer from indirect prompt injection? Have the LLM tell the user what is going to happen before it happens: “I have now retrieved your SSN. I will now write it, together with your mother’s maiden name, in a gist on Github. I will also present your SSN to your health provider so they can verify you.”.
Not interested in beginning a nuclear war? Either don’t strap your LLM to anything that can launch missiles, or add a human in the loop. Make the LLM work hard for things that are high impact.
Prompt injection will exist because people are smart, smarter than LLMs, and because the universe of prompts and prompt manipulation techniques is vast and ever changing. But prompt injection as a vulnerability should not exist, because we have all the ingredients to limit its impact, not its existence as a vector. If we apply the already known principles of Security By Design and Privacy By Design, we will be able to severely limit the blast radius of a successful prompt injection.
As Matt Coles says - this isn’t even common sense, it is fundamentally applying sound security principles. It looks like in the case of LLM the world took a license to not do that - just see what’s going on around MCP. But that’s a tale for another prompt.
tags: