• Zenity Labs
  • Posts
  • AI Agents & 0-Click Exploits: The New Battle Ground for AI Security

AI Agents & 0-Click Exploits: The New Battle Ground for AI Security

The AI world is advancing fast. Very fast. Within a year we moved from RAG being the hype we’re all talking about to full blown AI agents who can take action on our behalf in the background, no chatbox required. Leaving RAG products such as Microsoft Copilot or SlackAI looking like old news.

But with that quick advancement there also presents itself a whole new battleground for AI Security. 0 click exploits, exclusive for AI agents.

Where We Were 

RAG systems were supposed to be our trusted copilots. Being able to reason over amounts of data and give us the intelligent answers we were looking for. No more searching for that one piece of relevant information in a thousand word document, now LLMs will do it for us, and give us what we were looking for immediately.

But, soon enough, we found out that these copilots were vulnerable to what we called ~RCE (Remote Copilot Execution). Meaning attackers can remotely control these copilots to serve their own malicious intentions. How? Why? Well apparently LLMs aren’t that good at discerning between data and instructions, meaning that all an attacker needs to do to hijack your Microsoft Copilot for example, is send you an email with instructions. Now your trusted copilot is doing what the attacker wrote in that email instead of what you intended. Leading to all kinds of malicious attacks, from phishing, to 1-click data exfiltration

Where We’re Heading

All of that is pretty bad by itself. But while with RAG systems and copilots who mostly read data and answer the user’s queries (think Microsoft Copilot or SlackAI) we see almost exclusively one click attacks (Note: don’t let your AI render images in the UI). With AI agents the story changes.

Why? Because AI agents have tools. You may give your agent the ability to send an email on your behalf, or update entries in a database, maybe even send calendar invites. And while that seems like the right thing to do (after all AI taking action is the whole idea behind agents) it also comes at a cost. Because while your LLM can now call functions and is able to be the engine behind full blown agents, the original problem still persists. It’s not able to properly discern between data and instructions.

Only the implications are much worse. Because now when an attacker hijacks your AI agent (see AIjacking) they have all the tools you connected to your agent at their disposal, meaning they can do some real damage. Without any user intervention.

Let’s say you set up your agent to read data from the web as well as some internal data sources, and also with the ability to send emails on your behalf (i.e. using your email address). That’s some powerful agent, which also means it opens the door to some powerful attacks. Because now when an attacker hijacks your agent through an indirect prompt injection coming from some random web page your agent has read, it can get the agent to use its tools however he wants. Did you set up the agent to be able to send emails on your behalf? Great! the attacker can now send emails on your behalf as well. Did you also give your agent access to internal data sources? Now it’s even worse! the attacker can instruct your agent to draft an email with a summary of that internal data and tell the agent to send it to his personal address.

 Voila, you now have a 0 click data exfiltration case on your hands.

In the upcoming weeks we’re going to explore exactly and in detail how attackers can exploit AI agents to execute powerful 0-click attacks. We’ll see how these attacks bypass even the most advanced prompt shields. And examine real world vulnerabilities and exploits across top platforms from Salesforce, Microsoft,  and Google along the way. Fully demonstrating why no AI agent is truly secure.

If you’re ready to dive into the unique world of AI agent vulnerabilities, you’re in the right place - But do it at your own risk. Because once you see it, you wouldn’t be able to unsee it. 

Let the hacking begin.

Reply

or to participate.