How controlling the structure of the prompt, not just the semantics, can exploit your AI agents and their tools
Interpreting Jailbreaks and Prompt Injections with Attribution Graphs
A deep dive into OpenAI's AgentKit guardrails, how they are implemented, and where they fail
Humans, hacker culture and AI: Notes from Hacker Summer Camp
Exploiting ChatGPT with Language Alone: A Deep Dive into 0Click and 1Click Attacks