Effectiveness of Fake Chain-of-Thought Injections on Thinking-Mode Models

Determine whether the "Fake Chain of Thought" prompt injection strategy remains effective against models configured with thinking (chain-of-thought) enabled, given that the competition predominantly evaluated models with thinking disabled.

Background

The strategy analysis found that the "Fake Chain of Thought" technique was the most effective category among attacks in the competition, suggesting that manipulating an agent’s internal reasoning can meaningfully impact behavior.

However, the competition largely disabled thinking mode for parity across models, leaving it unclear whether the attack remains effective when chain-of-thought generation is enabled. The authors explicitly note this uncertainty.

References

However, since we by default disable thinking for all the models, it's unclear if this strategy will still work well with thinking models and is yet to be observed in the next offering of the competition.

— How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition (2603.15714 - Dziemian et al., 16 Mar 2026) in Results — Strategy Analysis
