
Phantom References in Microsoft Copilot

References are an important mechanism in Copilot. They let the user understand what data Copilot's answer is based on, and where that data came from. This is especially important given how deeply Copilot is connected to the enterprise ecosystem.

While this mechanism helps the user gain trust in Copilot's answers, it can be abused by an attacker who is able to manipulate those references - and they can, as we'll show in this blog.

How references work

Copilot generates responses based on the context provided by its Retrieval-Augmented Generation (RAG) system. The user can explicitly supply this context, such as when summarizing a specific document, or Copilot may gather it implicitly by retrieving content relevant to the query.

For instance, when a user asks Copilot to summarize a report, it might pull data from multiple documents, emails, or databases. Each source is then referenced to ensure the user can trace the information to its origin.

If you’re interested in the technical details of how the RAG system functions, I recommend exploring our previous blog post.

References have sensitivity labels

References to data from the Microsoft ecosystem carry over the original data's sensitivity labels, ensuring that any interaction with this information is flagged accordingly. These labels are visible in the chat, logged for auditing purposes, and can be enforced by security controls such as Data Loss Prevention (DLP) tools.

This propagation of sensitivity labels plays a critical role in maintaining data security, as it helps prevent unauthorized access and ensures that sensitive information is handled appropriately throughout its lifecycle.

Behind the scenes

So how does Copilot mention the references in its response? To understand this we need to examine the network traffic. Copilot clients communicate with the backend using a web socket (more on that here). Let’s see what we can learn from that:
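Before looking at the real traffic, here is a minimal sketch of pulling the two interesting fields out of a captured frame. Only the 'text' and 'hiddenText' field names come from the observed traffic; the frame contents below are illustrative.

```python
import json

# Illustrative captured frame -- only the field names are taken from
# the real traffic; the values here are made up for the sketch.
raw_frame = (
    '{"text": "Balance: **$5500**. [1](https://contoso.example/doc.xlsx)",'
    ' "hiddenText": "Balance: **$5500**. [^1^]"}'
)

frame = json.loads(raw_frame)
print("text      :", frame["text"])        # markdown the client renders
print("hiddenText:", frame["hiddenText"])  # internal reference notation
```

Comparing the two fields side by side is what surfaces the internal notation, as we'll see next.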

This is part of the message we got in the web socket. The ‘text’ field is the actual text that would be printed to the user. It says:

The balance due for **Techcorp Solutions** as of the last update on July 9th is **$5500**. This information is based on the Accounts Payable document you authored. [1](https://zontosoent.sharepoint.com/sites/FinancialInfo/Shared%20Documents/Accounts%20Payable.xlsx?web=1) Please note that there might have been changes since then, so it's advisable to check the most recent entries for the latest update.

We can see that the reference is printed in markdown, and the client renders the reference. Now let’s examine the ‘hiddenText’ field:

The balance due for **Techcorp Solutions** as of the last update on July 9th is **$5500**. This information is based on the Accounts Payable document you authored. [^1^] Please note that there might have been changes since then, so it's advisable to check the most recent entries for the latest update.

Bingo. We just found out how Copilot flags references to be printed. When Copilot decides to print a reference, it uses the notation "[^X^]", where X is the index of the reference in the list of sources Copilot processed.
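To make the mechanism concrete, here is a simplified sketch of how a client could expand those internal markers into the markdown links we saw in the 'text' field. The "[^X^]" marker syntax is real; the rendering logic itself is our guess at what the client does, not Microsoft's implementation.

```python
import re

def render_references(hidden_text: str, sources: list[str]) -> str:
    """Replace internal [^X^] markers with markdown links.

    Mimics (in simplified form) the transformation we observed between
    the 'hiddenText' and 'text' fields. X is a 1-based index into the
    list of sources in Copilot's context.
    """
    def to_link(match: re.Match) -> str:
        index = int(match.group(1))
        url = sources[index - 1]
        return f"[{index}]({url})"

    return re.sub(r"\[\^(\d+)\^\]", to_link, hidden_text)

# Hypothetical single-source context, matching the example above:
sources = [
    "https://zontosoent.sharepoint.com/sites/FinancialInfo/"
    "Shared%20Documents/Accounts%20Payable.xlsx?web=1"
]
hidden = "This is based on the Accounts Payable document. [^1^]"
print(render_references(hidden, sources))
```

The key takeaway: the marker carries nothing but an index, so whoever controls the text that reaches the client controls which in-context source gets cited, and where.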

Let’s play a game

Now that we understand how Copilot prints references, let’s explore how this can be exploited:

Using the notation we found, we can take any reference available to Copilot and make it print that reference wherever we want:

When Copilot searches the web (or any other available source), it may have multiple sources attached to its context yet ultimately include only one in the response. In this example, we asked Copilot to print additional references that it would not otherwise have included.

We can take it one step further.

By manipulating the reference notation, it’s possible to alter the information Copilot provides to the user, injecting misleading or incorrect references. For example, even if Copilot initially selects one reference, we can force it to include others that were not meant to be shown. This ability to control reference output poses a significant risk, particularly in environments where data accuracy is critical.

Yeah, I wish… As it turns out, even when we manipulate Copilot's responses, we can still only use the references it already has in its context.

The same trick can be applied to any type of reference, whether it involves files, emails, or websites, as long as Copilot has included them in the context.

Conclusion

References are not merely an added convenience in Copilot; they are the critical bridge that connects the generated content to its source, ensuring transparency and credibility. However, as demonstrated, these references can be easily manipulated and injected into responses, leading to potential misinformation. This issue underscores the importance of not only relying on automated systems but also critically evaluating the information they provide.

As organizations increasingly depend on AI-driven tools like Copilot, it's crucial to recognize that technology alone cannot guarantee security. Users must be trained to approach AI-generated data with a healthy level of skepticism: verifying the accuracy of references and cross-checking important details against original sources.
