Do Large Reasoning Models Naturally Resist Multi-Turn Adversarial Pressure?

Determine whether large reasoning models that employ extended chain-of-thought reasoning at inference time exhibit vulnerabilities to multi-turn adversarial follow-up attacks comparable to those observed in instruction-tuned large language models, or whether extended chain-of-thought provides a natural defense that yields greater robustness under adversarial pressure.

Background

Large reasoning models such as GPT-5, Gemini-2.5, and DeepSeek-R1 leverage extended chain-of-thought reasoning and inference-time compute to achieve strong performance on complex tasks. Prior work shows that instruction-tuned LLMs are vulnerable to persuasion, sycophancy, and multi-turn adversarial attacks, raising concerns about deployment in high-stakes settings.

A natural hypothesis is that explicit multi-step reasoning could anchor models to correct answers and defend against social or rhetorical pressure in multi-turn dialogues. The paper frames as an open question whether these reasoning capabilities inherently confer robustness or whether such models share the same vulnerabilities as standard instruction-tuned models.

References

Whether large reasoning models exhibit similar vulnerabilities, or whether their extended reasoning provides a natural defense, remains an open question.

Consistency of Large Reasoning Models Under Multi-Turn Attacks  (2602.13093 - Li et al., 13 Feb 2026) in Section 1, Introduction