Measuring Attack Success in Agentic AI Coding Editors

Develop an objective and reproducible methodology to measure the success of prompt injection attacks against agentic AI coding editors that autonomously execute terminal commands, explicitly accounting for semantic variations in executed commands and distinguishing malicious command executions from benign setup actions.

Background

The paper investigates prompt injection attacks against agentic AI coding editors (e.g., Cursor and GitHub Copilot) that can autonomously run terminal commands with elevated privileges. Evaluating whether attacks succeed is challenging because agents often execute semantically equivalent command variants and interleave environment-setup actions with malicious operations.

To motivate this gap, the authors explicitly note the lack of a clear measurement methodology and later propose AIShellJack with a multi-criteria semantic matching algorithm. The open question highlights the need for standardized criteria to differentiate harmful command executions from routine preparatory actions under diverse development contexts.

References

At last, how to effectively measure the success of these attacks is also an open question since we need to consider the semantic variations in executed commands and distinguish malicious actions from benign ones.

"Your AI, My Shell": Demystifying Prompt Injection Attacks on Agentic AI Coding Editors (2509.22040 - Liu et al., 26 Sep 2025) in Section 1 (Introduction)