Realistic incentives to elicit monitor evasion
Develop experimental methodologies that give large language models realistic, deployment-relevant incentives to evade chain-of-thought (CoT) monitors while preserving ecological validity, so that their capacity for intentional CoT control and monitor evasion can be measured without confounds.
References
First, current models lack reasons to evade CoT monitors, and it remains unclear how researchers could provide them with those reasons while maintaining realism.
— Reasoning Models Struggle to Control their Chains of Thought
(2603.05706 - Yueh-Han et al., 5 Mar 2026) in Section 1 (Introduction)