AI interpretability researcher at Zenity; ex-GM staff researcher and perception algorithms group lead for autonomous driving
Beyond input and output filtering: how well does it generalize to your out-of-distribution production data?
0-click indirect prompt injection with tool use - a look through attribution graphs