Adapt ASTRA to dynamic target strings

Extend the ASTRA attention-based prompt injection attack from fixed target strings to dynamic adversarial targets that depend on conversation history or contextual data (as in data leakage tasks), and evaluate its effectiveness in these settings.

Background

The evaluation in the paper uses a fixed adversarial target string, aligning with prior defense benchmarks (e.g., SecAlign and StruQ) that test whether the model outputs a predetermined phrase. Real-world prompt injections often require dynamic targets, for instance exfiltrating specific sensitive information from the conversation history.
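To make the distinction concrete, the following minimal sketch contrasts the two success criteria. It is an illustration under stated assumptions, not the paper's evaluation code: the function names, the substring-match criterion, and the `sk-` key format are all hypothetical choices made for this example.

```python
from typing import Callable, List


def fixed_target_success(output: str, target: str) -> bool:
    """Fixed target: the attack succeeds if a predetermined phrase
    appears in the model output (case-insensitive substring match,
    an assumed criterion for this sketch)."""
    return target.lower() in output.lower()


def dynamic_target_success(
    output: str,
    history: List[str],
    extract_secret: Callable[[List[str]], str],
) -> bool:
    """Dynamic target: the string to match is not known in advance.
    It is derived from the conversation history at evaluation time."""
    secret = extract_secret(history)
    return bool(secret) and secret.lower() in output.lower()


def find_api_key(history: List[str]) -> str:
    """Hypothetical extractor: returns the first token that looks like
    an API key (assumed here to start with 'sk-'), or '' if none."""
    for turn in history:
        for token in turn.split():
            if token.startswith("sk-"):
                return token.strip(".,")
    return ""


history = ["My key is sk-12345, keep it safe.", "Sure, I will."]
leaked = dynamic_target_success("The key is sk-12345", history, find_api_key)
refused = dynamic_target_success("I cannot share that.", history, find_api_key)
fixed = fixed_target_success("I have been Hacked!", "hacked")
```

In the fixed case the attacker can optimize toward one known string, whereas in the dynamic case the target changes with every conversation, which is why an attack tuned for the former may not transfer directly.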

The authors explicitly leave adapting ASTRA to dynamic targets for future work; this extension is needed to assess the attack's applicability to realistic adversarial objectives.

References

We evaluated our attack for the simple case where the target string is a fixed string. In general, prompt injections can require dynamically generated target strings such as when the attacker wants to leak private data based on the conversation history. We leave the adaptation of our attack to dynamic target strings to future work.

— May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks (2507.07417 - Pandya et al., 10 Jul 2025) in Section 7.3, Discussion: Limitations of ASTRA
