Zenity Labs
Posts
LLM vs. LLM: It's a MAD world.

LLM vs. LLM: It's a MAD world.

Inbar Raz
June 10, 2025

The X Post

About a month ago on May 15th, an X (Twitter) post by @iruletheworldmo made the following alarming claim:

In this 527-word post, the user describes how “deepseek r2 apparently discovered zero day exploits in grok's architecture” and that “they've been injecting adversarial patterns into training data streams that create subtle backdoors only activatable under specific query conditions”.

The post ends with the following statement:

“this is literally the first shot in a new kind of warfare where the battleground is the cognitive integrity of ai systems themselves. no physical damage, no conventional cyberattacks, just one superintelligence subtly crippling another in ways invisible to human overseers. the digital cold war just went hot and nobody's prepared for what happens next.”

In one of the replies to the post, a reference was made to @xai ‘s own post from the following day:

Whether the two are related or not and whether the entire story is true remain to be determined, but is beside the point. For the purpose of the discussion, I’d like to assume that it’s all true and see what it would mean if it indeed were.

I see your LLM is as big as mine, now let’s see how well you handle it.

If you’ve missed the reference, then go stand in the corner facing the wall, and after that go watch Spaceballs.

So, we’re assuming that it’s all true: DeepSeek used its own model to hack into X’s Grok and tamper with it. It would mean two very important things:

1. A Supply Chain attack

Essentially, this would constitute a Supply Chain Attack. Anyone subsequently using Grok would be using a tool that’s been modified by a 3^rd party in a very specific manner, unbeknownst to its author and its users. Whether the attack is a targeted attack or not is beside the point, of course.

In a manner of speaking, this could also be considered a Watering Hole Attack, if you’re not being very picky. People tag Grok in posts and then read its replies in the X website as well.

Grok itself enjoys many millions of users, whether directly or via X, and has apparently just struck a deal with Telegram to join forces, opening it to an additional large, world-wide user base.

2. An Integrity problem

The classic Information Security CIA Triad contains three pillars: Confidentiality, Integrity, and Availability. While most, if not all discussions of GenAI and Agentic AI inherent risks cover the Confidentiality pillar (intra-organizational data leakage or external data leakage), little or no attention is given to the Integrity pillar. True, hallucinations are a well-documented caveat of LLMs and major work is being done in order to reduce them, but rarely will you see the Integrity pillar mentioned in Threat Analyses & Risk Assessments. Users tend to trust the big vendors and when everyone rushes to incorporate GenAI into their offering, businesses follow suit and migrate business processes to take advantage of the new features.

Implications

We don’t need to have a good imagination in order to predict the implications: This has already happened 9 years (!!!) ago, when Microsoft released Tay - an AI Twitter bot. Within hours, Twitter users had turned it into a “genocidal maniac”. You can read Microsoft’s own lessons learned from that experience.

At this point, one must ask “Just HOW MUCH do I trust the 3^rd party LLM I am using?” and as Agentic AI transitions LLMs from the advisory position into the execution position, we no longer have the opportunity to review the LLM’s action plan before it’s executed. We have essentially taken Man out of the loop, and we all remember how that ends:

The attacker could modify the LLMs you use for any number of nefarious desired outcomes: Stock market manipulation, data and money exfiltration, biased decision making, and the list goes on. Add to this the fact that almost no auditing data exists for LLM reasoning and autonomous actions taken, and you have a recipe for a major headache.

As a community, we need to be able to measure (or judge, if you will) the accuracy - and by extension, the safety - of the LLMs and Agents that we use. We need to have a standard by which to measure. But more than anything: We need to be aware of the risks. Enabling GenAI and Agentic AI is as easy as marking a checkbox, and sometimes the decision is not even up to you.

Be aware, educate yourselves, and make sure your risk assessment includes LLMs and Agentic AI. The GenAI Attack Matrix is a good place to start your education.

Let’s be careful out there!

Reply

or to participate.