What You Don’t Know Can Hurt You: Why AI Security Research Needs to Move Out of the Lab and Into the Wild

What we can learn from observing real attacks carried out by real adversaries

Looking back at the last two years in AI security, a great deal has happened. Systems have evolved remarkably quickly from basic chat applications and endpoints into agents embedded in enterprise ecosystems that employees depend on; deployments of LLM runtime servers and AI proxies have become abundant; and there is a plethora of open-source agentic frameworks and tool-calling projects. All of this has emerged where there were once mostly SaaS-based agentic platforms with relatively limited capabilities and integrations.

It’s clear that the AI attack surface keeps expanding rapidly, and that adoption continues to outpace security in these AI deployments and platforms. But are AI-related vulnerabilities already being exploited in the wild?

AI systems are hot right now

The main problem now, we would argue, is that while AI agents and deployments are constantly being pushed everywhere, this incredible adoption has not been matched by any parallel growth in understanding of, or visibility into, what adversaries are actually attacking in AI systems in the real world, and how.

There are, of course, important exceptions to this statement (such as reports by GreyNoise and Microsoft, to name a few). However, for the most part, and given how deeply AI agent deployments are being integrated into our lives and workplaces, we defenders don’t seem to know much more about how AI is being attacked than we did two years ago.

It seems that we, as a cybersecurity industry and defenders of AI systems, are still missing something.

Most AI security research is still lab-based at the moment

Current AI security research remains largely confined to the lab. Today, researchers are primarily focused on identifying vulnerabilities, developing prompt injections, and probing custom-built architectures within their own isolated, controlled environments.

While this work is important, it creates a critical intelligence gap. In effect, we are mostly observing ourselves rather than our adversaries. The industry has already confirmed that AI vulnerabilities are actionable and exploitable, so the next big challenge is visibility: we currently lack insight into how attackers are actually targeting AI systems in the real world.

Moving forward, the research community must shift its focus from theoretical vulnerability discovery toward the analysis of active, real-world threat actor behavior. Let’s face it: controlled environments can’t answer some of the most crucial questions for defenders, such as what adversaries are actually doing in the wild. Unless we work towards answering this question, we’ll remain in the dark on practical, data-driven AI security insights.

Is anyone actually attacking AI?

From prior research, we already knew that agentic recon is a real risk on various existing AI platforms (more on that in the link below), and that, for all we knew, threat actors could already be abusing this type of recon in the wild.

This wasn’t enough for us. We wanted to understand which recon campaigns are really out there and which techniques are used to run them, and to do this not only for recon campaigns but for AI-targeted attack campaigns in general.

Here are a few more questions we wanted answered:

  • When a real attacker encounters an unauthenticated LLM runtime or an exposed agent on the public internet, how do they approach it?

  • Which surfaces do they probe first?

  • What type of AI-targeted attacks are actually out there?

  • How are attacks on AI systems distributed?

  • Are there preferred tools and techniques for attacking AI systems?

To start answering these burning questions and create some visibility into attacks on AI systems from real adversarial activity, we need to observe how attack chains play out on actual AI systems, with real attackers interacting with them.

One of the ways to tackle this challenge is by starting to create effective threat intelligence within this specific landscape. It must be AI-native, and generated by observing these surfaces on their own terms.

Show us what you got, attackers

To start creating AI-first threat intelligence that will help us answer these and other burning questions, we decided to take action and deployed a network of decoy AI agents and infrastructure. Our goal was to bridge the gap between theoretical research and real-world threats, and to take a closer look at how threat actors probe, exploit, and interact with AI systems in the wild.

This effort extends the existing work of Zenity Labs, shifting the focus from potential vulnerabilities for AI and agents in the lab to active exploitation in the field. This post is the first in a continuous series detailing that process, and sharing what we found after deploying this honeypot network.

What is a honeypot network for AI deployments?

A honeypot server is a decoy system designed to attract and monitor attackers. Because it emulates a susceptible target while containing no sensitive data or risky functionality beyond what is intentionally exposed to attackers, every interaction with it becomes a potential signal. Create enough of these systems, and you have a honeypot network: a collection of sensors that captures attacker behavior as it happens.

By building honeypots dedicated to AI, we can log attacks tailored to the behaviors that can make AI agents and infrastructure vulnerable in production: processing natural language, calling tools, allowing unauthenticated inference, exposing APIs and the control plane, relying on guardrails, and more. These attacks range from exploiting the underlying software via a CVE to abusing the LLMs themselves.

These honeypots are designed both to attract adversarial activity and to keep adversaries engaged, allowing us to analyze their TTPs (tactics, techniques, and procedures) and gain visibility into their attacks. Through rigorous data analysis, we convert these raw interactions into actionable threat intelligence.

Set a Honeypot to Catch a Thief

We deployed a fleet of sensors, consisting of various decoy AI services spread across multiple cloud providers and global regions. Geographic and provider diversity matters: it helps us distinguish broad, internet-wide scanning from anything targeted, and it better mirrors how real AI infrastructure is actually scattered across the cloud.
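One way to picture how sensor diversity separates broad scanning from narrower activity is to count how many distinct sensors each source touches. The following is a minimal sketch under stated assumptions, not a description of the actual pipeline: the `Hit` record, the `classify_sources` function, the threshold of three sensors, and the sample IPs, regions, and provider names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One observed connection to a decoy sensor (hypothetical record shape)."""
    source_ip: str
    sensor_id: str
    region: str
    provider: str


def classify_sources(hits, broad_threshold=3):
    """Label each source 'broad' if it touched at least `broad_threshold`
    distinct sensors, otherwise 'narrow'. The threshold is an assumption."""
    sensors_by_source = defaultdict(set)
    for hit in hits:
        sensors_by_source[hit.source_ip].add(hit.sensor_id)
    return {
        ip: "broad" if len(sensors) >= broad_threshold else "narrow"
        for ip, sensors in sensors_by_source.items()
    }


# Illustrative data only: documentation-range IPs and made-up sensor names.
hits = [
    Hit("198.51.100.7", "sensor-1", "us-east", "cloud-a"),
    Hit("198.51.100.7", "sensor-2", "eu-west", "cloud-b"),
    Hit("198.51.100.7", "sensor-3", "ap-south", "cloud-c"),
    Hit("203.0.113.9", "sensor-2", "eu-west", "cloud-b"),
    Hit("203.0.113.9", "sensor-2", "eu-west", "cloud-b"),
]
labels = classify_sources(hits)
```

A source that appears on sensors in several regions and providers is more likely to be sweeping the internet, while one that keeps returning to a single sensor may deserve a closer look. In practice, a label like this would be one weak signal among many rather than a verdict.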

Each sensor captures and fingerprints activity via a dedicated telemetry pipeline. The diversity of the sensor AI stack is meant to emulate the full breadth of organizational AI and agentic deployments out there, including self-hosted inference servers, AI gateways, APIs, agents, and MCP endpoints.
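As an illustration of what fingerprinting activity can mean in practice, the sketch below hashes the stable features of an HTTP request so that repeated probes from the same tooling collapse into one identifier. The post does not describe the real telemetry pipeline, so the choice of features, the `fingerprint_request` function, and the sample request are all assumptions.

```python
import hashlib
import json


def fingerprint_request(method, path, headers, body):
    """Return a short, stable identifier for a request's shape.

    Header names are lowercased and sorted so that ordering and casing
    differences between otherwise identical probes do not change the result.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    features = {
        "method": method.upper(),
        "path": path,
        "header_names": sorted(normalized),
        "user_agent": normalized.get("user-agent", ""),
        "body_sha256": hashlib.sha256(body).hexdigest(),
    }
    canonical = json.dumps(features, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Illustrative probe against a decoy inference endpoint (sample values only).
probe_a = fingerprint_request(
    "post",
    "/api/generate",
    {"User-Agent": "example-scanner/1.0", "Content-Type": "application/json"},
    b'{"model": "demo", "prompt": "hello"}',
)
probe_b = fingerprint_request(
    "POST",
    "/api/generate",
    {"content-type": "application/json", "user-agent": "example-scanner/1.0"},
    b'{"model": "demo", "prompt": "hello"}',
)
probe_c = fingerprint_request(
    "POST",
    "/api/generate",
    {"User-Agent": "example-scanner/1.0", "Content-Type": "application/json"},
    b'{"model": "demo", "prompt": "something else"}',
)
```

Here `probe_a` and `probe_b` share a fingerprint despite differing in header order and casing, while `probe_c` differs because its body changed. A real pipeline would likely add network-level features as well; this sketch covers only the application layer.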

Operational security is essential as well. A decoy that's too permissive can be manipulated into harming other targets, while one that's too rigid, unappealing to attackers, or obviously fake teaches you close to nothing. A further major challenge in this project is hardening the perimeter: controlling what a "successful" attack can and can’t actually reach, and walking the line between believable and contained. The goal is to present a vulnerable target while hardening everything that attackers aren’t meant to explore and exploit. This also prevents the sensors' intentional misconfigurations and planted vulnerabilities from being used in attacks on others.
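One common way to keep a decoy contained is a deny-by-default egress policy: the decoy may appear to fetch arbitrary URLs, but outbound requests only ever reach a controlled sink. The sketch below shows that idea under stated assumptions. The `egress_allowed` function and the `sink.decoy.internal` host are hypothetical and are not taken from the deployment described here.

```python
from urllib.parse import urlsplit

# Hypothetical internal sink that records outbound requests and fetches nothing.
ALLOWED_EGRESS_HOSTS = frozenset({"sink.decoy.internal"})


def egress_allowed(url, allowed_hosts=ALLOWED_EGRESS_HOSTS):
    """Return True only for http(s) URLs whose host is explicitly allowlisted.

    Everything else is denied by default, including other schemes, cloud
    metadata addresses, and URLs that hide the real host behind userinfo.
    """
    parts = urlsplit(url)
    if parts.scheme not in {"http", "https"}:
        return False
    host = (parts.hostname or "").lower()
    return host in allowed_hosts


# Illustrative checks: an attacker-supplied URL is logged but never followed.
decisions = {
    url: egress_allowed(url)
    for url in (
        "https://sink.decoy.internal/pull",
        "http://169.254.169.254/latest/meta-data/",
        "https://sink.decoy.internal@evil.example/",
        "file:///etc/passwd",
    )
}
```

A check like this lets the decoy log an SSRF-style request as a signal while ensuring the request itself goes nowhere harmful. An application-level check would normally be backed by network-level egress controls, since it cannot be the only barrier.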

From attack surface to attacks in the wild

Not only was the deployed AI-focused honeypot network attacked almost immediately, but the volume of unsolicited traffic and observed attacks has consistently and substantially exceeded our initial expectations.

Without getting ahead of the posts that follow, we can already share a high-level preview. From day one, this network caught real attacks, not just scanning. We've seen multiple cases of active CVE exploitation within one day of CVE publication. We watched attackers attempt to hijack exposed LLM endpoints to run their own models and agents on someone else's bill, turn off-the-shelf autonomous hacking agents loose on hijacked backends (in one case aimed at a live website as a target), and go straight after known weaknesses in the platforms teams are rushing into production: from Ollama's unauthenticated, unpatched model-pull SSRF to exposed LiteLLM proxies.

This series is about what that activity means: unraveling the playbooks attackers are employing on AI infra in the wild today.

Proof. Not only proof-of-concept

Why does this matter? Because we’re already seeing a variety of real-world vulnerabilities and misconfigurations that threat actors are scanning for and attempting to exploit on real AI systems, outside of the lab.

And this is only the beginning.

We believe that being able to observe this type of activity as part of threat intelligence that’s focused on AI and agents is essential for AI security today.

AI agents and infrastructure represent a unique and rapidly evolving risk category. By observing active exploitation that specifically targets them, instead of only hypothesizing about it, we are better equipped to identify emerging trends in this risk category, understand the state of attacks on AI today, and learn how to build better, more robust detections and defenses against these attacks.
