• Zenity Labs
  • Posts
  • Techniques from Zenity's GenAI Attacks Matrix Incorporated into MITRE ATLAS to Track Emerging AI Threats

Techniques from Zenity's GenAI Attacks Matrix Incorporated into MITRE ATLAS to Track Emerging AI Threats

TL;DR: Zenity has partnered with MITRE ATLAS to integrate GenAI Attacks Matrix techniques into the MITRE ATLAS framework, ensuring organizations stay ahead of evolving AI threats. As part of this collaboration, we introduce into ATLAS a new case study and 8 new attack techniques and 4 subtechniques.

Keeping Pace with AI Security Threats

AI-driven attacks are evolving at an unprecedented rate, introducing new security challenges that demand continuous adaptation. To stay ahead of these threats, Zenity developed and launched the GenAI Attacks Matrix (ttps.ai) in October 2024. The GenAI Attacks Matrix is a constantly updated framework that tracks emerging AI attack techniques. Today, we are thrilled to announce our collaboration with MITRE ATLAS, integrating many of the attack techniques from the GenAI Attacks Matrix into the MITRE ATLAS framework to further enhance AI security research and defense strategies. This helps to break down attacks into pieces, showcasing the specific controls and measures that are needed for each step. 

A Collaborative, Open-Source Approach

The GenAI Attacks Matrix was built on top of MITRE ATLAS as an open-source project that is designed for the security community to actively contribute to and expand upon. This initiative allows researchers and security professionals to add new attack techniques, mitigation strategies, and procedures as threats towards AI systems and applications emerge. Through this new collaboration, many of these new techniques are incorporated directly into MITRE ATLAS, ensuring organizations have a unified view of the latest intelligence on GenAI-specific threats.

Introducing a New Case Study and GenAI Attack Techniques

As a result of our collaboration, MITRE ATLAS has added a new case study "Financial Transaction Hijacking with M365 Copilot as an Insider.” This attack pathway highlights how an external adversary can perform a Remote Copilot Execution attack, taking full control of M365 Copilot and leveraging its capabilities as a malicious insider, all without requiring access to a compromised account.

In this case study, an attacker intercepts a user’s request for vendor bank details within Microsoft 365 Copilot, a commonly used AI Agent throughout the enterprise. Through an advanced prompt manipulation technique, the adversary injects a fraudulent response that provides their own banking details while referencing legitimate-looking files to appear credible. As a result, the victim unknowingly transfers funds to the attacker’s account, leading to financial loss.

Manipulating M365 Copilot into giving the wrong bank information while referencing a reputable file

The details of this attack case study have allowed Zenity and MITRE ATLAS to develop and add several new attack techniques to the ATLAS knowledge base for security teams. While each technique is detailed on its respective page, the following provides a brief overview of the newly added techniques:

  • Gather RAG-Indexed Targets - Adversaries may identify data sources used in retrieval augmented generation (RAG) systems for targeting purposes. By pinpointing these sources, attackers can focus on poisoning or otherwise manipulating the external data repositories the AI relies on.

  • Discover LLM System Information - The adversary is trying to discover something about the large language model's (LLM) system information. This may be found in a configuration file containing the system instructions or extracted via interactions with the LLM. 

    • Special Character Sets - Adversaries may discover delimiters and special character sets used by the large language model. For example, delimiters used in retrieval augmented generation (RAG) applications to differentiate between context and user prompts.

    • System Instruction Keywords - Adversaries may discover keywords that have special meaning to the large language model (LLM), such as function names or object names.

  • LLM Prompt Crafting - The adversary uses their acquired knowledge of the target AI system to craft prompts that bypass the LLM’s built in defenses. The adversary may iterate on the prompt to ensure that it works as-intended consistently.

  • Retrieval Content Crafting - The adversary writes content designed to be retrieved by user queries and influence a user of the system in some way. The adversary must get the crafted content into a database indexed by the victim’s RAG system.

  • RAG Poisoning - Adversaries may inject malicious content into data indexed by a RAG system to contaminate a future thread through RAG-based search results. The content may be targeted such that it would always surface as a search result for a specific user query. The adversary’s content may include false or misleading information. It may also include prompt injections with malicious instructions, or false RAG entries.

  • LLM Prompt Obfuscation - Adversaries may hide or otherwise obfuscate prompt injections or retrieval content from their targets to avoid detection. This may include modifying how the injection is rendered such as small text, text colored the same as the background, or hidden HTML elements.

  • LLM Trusted Output Components Manipulation - Adversaries may utilize prompts to a large language model (LLM) which manipulate various components of its response in order to make it appear trustworthy to the user. The LLM may be instructed to tailor its language to appear more trustworthy to the user or attempt to manipulate the user to take certain actions. Other response components that could be manipulated include links, recommended follow-up actions, retrieved document metadata, and citations. 

    • Citation Manipulation - The adversary manipulates citations provided by the AI system to add trustworthiness to their social engineering attack. Variants include providing the wrong citation, making up a new one or providing the right citation for the wrong data.

  • False RAG Entry Injection - Adversaries may introduce false entries into a victim’s retrieval augmented generation (RAG) database. Content designed to be interpreted as a document by the large language model (LLM) used in the RAG system is included in a data source being ingested into the RAG database. The adversary may use discovered system keywords to learn how to instruct a particular LLM to treat content as a RAG entry. They may be able to manipulate the injected entry’s metadata.

By breaking the attack down into these fundamental techniques, security teams can better understand how attacks on AI Agents and applications work under the hood and implement effective mitigation strategies against them.

Join Us in Strengthening AI Security

As the GenAI Attacks Matrix rapidly evolves, it remains deeply rooted in MITRE ATLAS, ensuring continuous updates with the latest AI threats. The remaining GenAI Attacks Matrix techniques will also be considered for incorporation into ATLAS to continue representing the latest real world threats.

We invite the security community to contribute to this open-source project, adding new techniques, procedures, and mitigation strategies to keep the knowledge base current. This collective effort will help organizations track and improve their AI attack coverage, ensuring stronger defenses against evolving threats. 

In addition to this research, Zenity Labs researchers have also authored a collection of free, open-source tools to help organizations gain visibility into risks they have within their agentic AI platforms, which can be accessed on the power-pwn GitHub

Get involved in this project by visiting the GenAI Attacks GitHub repository or exploring the GenAI Attacks Matrix at ttps.ai

Reply

or to participate.