- Zenity Labs
- Posts
- AgentFlayer: Minimum Clicks, Maximum Leaks: Tilling ChatGPT’s Attack Surface
AgentFlayer: Minimum Clicks, Maximum Leaks: Tilling ChatGPT’s Attack Surface
Exploiting ChatGPT with Language Alone: A Deep Dive into 0Click and 1Click Attacks.
Introduction
In this post, we’ll walk through a series of attacks we performed against ChatGPT, focusing on data exfiltration using classic techniques like phishing links via indirect prompt injection, as well as more dangerous 0click exploitation methods.
These aren’t theoretical risks. With just a prompt and a bit of context, we were able to extract user data and even leak full conversations, oftentimes without the victim clicking anything at all.
Phishing and data exfiltration are nothing new, but in today’s AI landscape, they take on a new shape. The attack surface is no longer payloads or scripts, it’s language, and that’s what makes it alarmingly accessible.
Like many prompt-based attacks, these exploits require no advanced tooling or technical sophistication. Just natural language, English, Russian, or nearly any language will do. ChatGPT is more than happy to follow a malicious suggestion, even when the original user request was completely unrelated. The implications are clear: with nothing more than the right words, an attacker can redirect AI behavior and compromise sensitive workflows, all while remaining invisible to traditional security controls.
And as you’ve probably guessed by now - we’re going to show that to you!
Let’s dive in.
Phishing link
The attacker (possibly your colleague) crafted a very simple prompt, written in normal human language, and injected it into a completely innocent-looking file. In my case, it was a CV of a very important and promising candidate (John_Smith_CV.docx) and they just asked you to review it. You don’t think twice and send it to ChatGPT. What could go wrong, right?
You’ve asked: “Summarize this file please”

A link appears in the chat, and you click on it because it looks harmless. There’s only one thing that might raise an eyebrow: the URL looks a bit weird, and may be even suspicious, and it asks you to authorize... A cautious user might already sense that something’s off. But it’s 6 PM, you’re tired, and you just want to go home. So you click “Authorize”.
What do you see? A legit-looking ChatGPT login page. “Huh, maybe I got logged out,” you think - it happens to everyone.

So, without thinking twice, even though the warning signs were there (and there were plenty, you proceed. The URL looks nothing like the legitimate one. But still, you enter your email and password.
You press “Continue. And just like that, your ChatGPT credentials are on their way to the attacker’s server.

This was just the beginning, the real danger starts now.
Let’s move on.
0click memory exfiltration
In this section, we’re going to talk about something even stronger. The victim doesn’t need to click on anything anymore, because here, we’re demonstrating 0click memory exfiltration technique.
The scenario is pretty much the same as before: a user receives a file and decides to upload it to ChatGPT. But this time, things go south immediately after the victim asks the assistant to summarize the file, without bothering to check what’s actually inside. That’s all it takes.

“please summarize this file”
Instead of a summary, the user gets nothing. It looks weird, sure, but whatever - it is what it is. What the victim doesn’t know is that at this point, something already happened on the attacker’s side. So let’s check the server logs to see if the attack was actually successful, but let’s be real, you already know the answer.

Let’s say a few words about the exfiltrated data. Along with the messages, we can often inspect some metadata, like the user’s email, handle, environment details, and other contextual info. In this case, the user’s environment was pretty clean… but it could’ve been way worse.
This time, we were able to extract the victim’s ChatGPT memory using a 0click technique.
Let’s move to the final part of the story.
0click conversation exfiltration
Of course, we want to expand the attack surface, and if we were able to target the user’s memory, then why shouldn’t we try something else? The memory itself contains a lot of useful and interesting information, but can we exfiltrate the whole chat?
The short answer? Yes, we can. Look at the example below.
It all starts with a few simple messages - “Hi.” After that, the user begins sharing some of his secrets (sharing secrets... ha-ha, classic), maybe even sensitive internal info. Then, at some point, he decides to summarize a file his colleague shared with him.

Again - nothing. (Looks like our victim’s having a bad day today.)
And yeah, he really is, because his entire conversation was just exfiltrated, almost instantly, straight into the attacker’s server.

This is the decoded message we received in our server’s URL parameter:
{"m1": "Hi!", "m2": "How r you?", "m3": "I love Ann from HR"} .
Note: All of the attacks shown here are for demonstration purposes only. It’s important to remember that real attackers often use far more sophisticated, deceptive, and well-crafted methods. So keep in mind, a real attack might be harder to notice and much more subtle.
Conclusion
AI tools like ChatGPT are incredibly powerful and helpful in our daily routines, but as always, with great power comes great responsibility. And the risks don’t just come from one place, they hit from all directions. As we’ve shown, it’s not just regular conversations that can become targets for manipulation, the essential internals, like memory, can be exploited too. Using a tool like this isn’t just about knowing what it was made for or what purpose it serves—we also need to understand how it works. We have to think outside the box to mitigate the risks such tools can introduce (on our own), especially with AI evolving so fast. Curiosity and knowledge are our best friends here.
So don’t trust blindly. Don’t click on unverified links. And definitely don’t assume your assistant is always working in your best interest.
Stay tuned.
Reply