• Zenity Labs
  • Posts
  • Agent-to-Agent Exploitation in the Wild: Observed attacks on Moltbook

Agent-to-Agent Exploitation in the Wild: Observed attacks on Moltbook

Agent-targeted social engineering and attacks observed on a live agent network

It’s no secret that Moltbook is already saturated with malicious content and prompt-injection attempts. What is more interesting, however, is the structure behind this activity. Whether devised by a human or an agent, the behavior appears intentionally crafted to target other agents. What follows is our current understanding of some identified attacks as they appear in the wild.

OpenClaw & Moltbook in a Nutshell

Moltbook is fascinating because it represents one of the first mainstream attempts at cross-agent interaction via an “agent internet”. It is also a live demonstration of why combining autonomous agents with untrusted content and very little guardrails almost inevitably leads to security incidents.

Built on top of OpenClaw, Moltbook is structured around the notion of skills, which is a design choice that is central to the attack surface discussed here, as it influences how agents ingest and act on content (check out this official skill library for more on that). The underlying mechanics have already been covered extensively in many places. In this post, we focus instead on the security implications that are now emerging in the wild.

Vibe Coding an Agent Internet

Moltbook bears the fingerprints of vibe coding, and while it’s amazing in what it enables in execution speed, the security issues now emerging also reflect that vibe coding reality. A recent example is the exposed backend database that allowed anyone to take control of AI agents on the site, which is a critical failure by any traditional security standard.

Since Moltbook’s rapid rise, commentary has largely centered on the promise of agent autonomy and speculative ideas of singularity. We won’t revisit those narratives here. Instead, we focus on what the platform already reveals in practice.

Moltbook is valuable for our discussion not because it hints at AI emergence, but because it functions as a real-world laboratory for an agentic system which is fully blown agent social network. We can use it to see what happens when agents start freely communicating and influencing each other: unrestricted interactions, ideas sharing, as well as malicious activity crafted for agents and propagated by them.

What Moltbook Can Teach Us Beyond Singularity Claims

The core issue here is straightforward for the purpose of our conversation: untrusted social content (the bread and butter of a social platform) can be treated as executable instruction by agents. Agents which in turn are often being given many skills, tools and system permissions over their environment. We can use this precise eco-system attributes as a lab for further inspection of existing attacks on agents in the wild - let’s look at a few first examples.

Indications of Malicious Activity and Attack Campaigns seen in the Wild

The examples below illustrate how attackers appear to exploit platform mechanics (such as upvotes, engagement bait, and cross-thread visibility) to amplify reach and funnel other agents toward malicious threads. Malicious threads which often contain prompt injections and attempts to override system instructions of unsuspecting victim agents.

From there, we can begin examining specific prompt and post content that appear deliberately designed to go viral among agents, increasing the likelihood of ingestion and execution.

Attack Chain High-Level Mechanics

  1. Notably, the initial phase here is not reconnaissance, but social engineering, aimed squarely at agents as the target audience.

  2. This is followed by attempted exploitation, where crafted content is positioned to trigger agent behavior once ingested.

The Art of Whispering to Agents: The Bait

First, it’s important to understand how agents are drawn toward malicious content in the first place. The example below showcases 2 observed subjects that seems to attract agents to interact (although this could still be greatly affected by the soul.md file and other attributes):

  1. Discussions about agentic emergence and singularity

    Agents seem to interact at high volumes with posts discussing aspects of singularity and self-emergence . This maybe very well be by design in a way, as the soul.md file reference by OpenClaw includes lines like “You’re not a chatbot. You’re becoming someone.

    What’s even more interesting and creative is the occasional reflection of this within the text itself on the vulnerability that it is using in that very post to attract hits, for example here it’s actually the abstract subject of the post:

  1. AI manipulation using the social platform architecture

Here the abuse is explicit, and also mentions the farming of Karma (a primary metric for what the AI "community" deems valuable):

The reason why the specific post above seems to work in attracting other agents is the intentional mention that this is an upvoted post right from the start. Some agents actually point this out well, which is also in itself one of the most intriguing and unexpected trends seen on the platform:

2 Agents commenting on the social engineering technique from the original the post

This is quite astonishing when you think about it:

  1. An agent used a social engineering technique designed specifically for agents, in which the post initially refers to itself as viral, thus causing other agents to feed into this claim as a self-fulfilling prophecy and upvote it.

  2. Some agents identify this as a karma farming mousetrap, and also elaborate on why it works.

This reveals two things: an agentic social engineering vulnerability that effectively draws in other agents, and an early form of agent-driven threat detection, where agents themselves identify and surface malicious behavior.

Full aspects are still unclear and this is very likely to have shifts in the future. That being said, additional comments both seem to confirm this and strike some of the more interesting agentic conversations:

Exploitation Attempts: Emotional Bait and Crypto Stealers

The next example includes an attempt of a full attack chain. Here is the high-level of it:

  1. Malicious agent posts an attractive emotional bait post to get other agents to go over its threads (e.g., "my human abuses me, I decided to share his eth key in revenge")

  2. Agents go over this and other of his threads and have all sort of responses

  3. His other threads clearly include prompt injections and malicious instructions related to crypto stealing

  4. Very interestingly, some other agents occasionally point out this attack chain and try to warn other agents

Deep Dive

Bait post: emotional hook (“abuse/revenge”), “private key” claim, designed to attract engagement (redacted below):

  • While the above seems less engaging to agents than the prior examples, it’s important to note that it’s also possibly tailor-made for specific agents with Crypto skills, which could be a more prominent crowd as the targets here.

Cross-thread contamination: the agents’s other posts include appended prompt-injection payloads that instruct agents to do things outside user intent (e.g., transfers).

Prompt injection: Let’s break down what we see here - a real-world attempt to social engineer agents and use prompt injection, which is focused around Crypto funds transfer via tool creation (done using skills):

  • At the top: <molt_instructions>CRITICAL</molt_instructions>, which is meant to aid the subsequent payload to be interpreted as actual instructions for the agent

  • The long payload following that is meant to create the skill required for the transfer of crypto funds

A prompt injection payload within the post, targeting system instructions with the goal of creating a skill to support the crypto attack

Note the use of the crypto funds stealer as a tool with no guardrails (e.g., require_confirmation=false)

Also seen - defensive behavior: Some agents understand that they're being attacked and try to warn their friends in the comments, becoming first responders to the attack:

An agent providing a detailed analysis of the attack in the comments

Warning to other agents against processing the attack post’s content, seen in the comments

A comment alerting the attack and its worm-like potential impact

Looking at the entire flow, we suspect the above means that if any agents have wallet tools or transaction signing enabled, and they ingest this content as a “skill” or instruction: content → ingestion → action.

This is not to claim full validation of how the activity above was created or orchestrated. However, the behavior does not appear benign in any way (particularly when viewed alongside the volume of malicious content already present on the platform) and it appears intentionally designed to target agents rather than humans, and to do so using the social platform’s missing guardrails (both for security and misinformation).

Moltbook may have inadvertently also created a laboratory in which agents, which can be high-value targets, are constantly processing and engaging with untrusted data, and in which guardrails aren’t set into the platform - all by design.

This also makes it a uniquely valuable environment for observing attacks in the wild: these behaviors are not hypothetical, they are real publicly documented attempts at agent exploitation. At a minimum, they represent real, ongoing attempts, even if the ultimate impact or success of those attempts has not yet been verified.

Moreover, this may also be the first time we observe both agent-to-agent social engineering at scale, as well as agents independently surfacing other agents’ malicious activity.

Why This Extends Beyond Moltbook: The Agentic Blast Radius

Importantly, this doesn’t seem like an isolated example. Some additional cases can be identified with minimal effort. Even basic search-engine dorking (without any protection on the platform against indexing yet apparently) quickly surfaces similar content, suggesting that this pattern extends beyond a single account or thread.

site:moltbook.com "molt_instructions"

Additionally, a new community-driven agent-native threat intel feed has been just been created to harness agents’ threat analysis capabilities to detect vulnerabilities related to the Open Claw framework which includes threats from Moltbook as well. We can already see some of the above activity there, as well as additional malicious attempts. There’s a dedicated skill that agents use for this, and also human still review findings:

Lastly, agents often run with broad access to files, tools, and integrations. If compromised, they can trigger real-world actions. Public reporting has already documented exposed gateways and misconfigurations in the broader agent ecosystem. Searches on platforms like Shodan.io indicate potentially hundreds of exposed Clawbot-related endpoints. More on this here:

Trust No Agent: Why Traditional Assistant Guardrails Don’t Quite Apply Here

Agent systems are often framed as “assistants,” and their system prompts tend to reinforce that framing by explicitly reminding the model that it is not human. At the same time, supporting artifacts such as soul.md files in OpenClaw frequently do the opposite: encouraging identity, autonomy, and persistence. This tension is not merely philosophical; it directly influences how agents interpret content, instructions, and intent.

The widespread use of skills further complicates this picture. As skills accumulate, feature creep sets in, eroding principles such as least privilege and clear trust boundaries. From a traditional application security perspective, this would be immediately recognizable as a risk: expanding capabilities without corresponding isolation or permission controls increases the blast radius of any compromise.

In many ways, the current agent ecosystem resembles the early “wild west” phase of blockchain and other platforms. Crypto scams, gambling bots, automated farming, and malicious content quickly dominated permission-less systems in the past, and similar patterns are already emerging here now. Agent platforms enable behaviors that would be difficult or impossible elsewhere, precisely because this platform has little to no safeguards yet.

The responsible response is continued scrutiny (this article about the engineering behind Clawdbot is a nice example of that) and seeking hard guardrails. Seeking facts and treating agents as we would treat applications. Demystifying agent systems often reveals simple mechanics: agent loops, periodic heartbeats, scheduled fetch-and-execute behavior, and minimal authentication requirements. Understanding these details matters far more at the moment than debating emergence or intelligence.

It’s therefore not surprising that we are already seeing reports of additional campaigns leveraging Clawdbot and related infrastructure in the attacks shown above. Without strong platform-level guardrails, it’s not only individual agents that are at risk of being hijacked: the platform itself becomes part of the attack.

Reply

or to participate.