How structured prompts can reveal self-modeling in LLMs, which may benefit both attackers and defenders
How controlling the structure of the prompt, not just the semantics, can exploit your AI agents and their tools
Interpreting Jailbreaks and Prompt Injections with Attribution Graphs
A deep dive into OpenAI's AgentKit guardrails, how they are implemented, and where they fail
Humans, hacker culture and AI: Notes from Hacker Summer Camp
Exploiting ChatGPT with Language Alone: A Deep Dive into 0Click and 1Click Attacks