Do Serious AI Harms Require Difficult Reasoning?

Determine whether the most serious harms posed by advanced AI systems in deployment in fact require difficult reasoning to execute, or whether they can be carried out without extended reasoning and working memory, as in scenarios such as self-exfiltration or sabotage.

Background

The paper argues that severe risks from advanced AI systems likely require sophisticated planning and working memory, and that in Transformer-based models, long serial chains of cognition must pass through chain-of-thought at some point. This motivates the safety value of monitoring chain-of-thought, especially on difficult tasks that necessitate externalized reasoning.

However, the authors caution that not all dangerous actions may require substantial reasoning: as models are entrusted with higher-stakes tasks, some harms might become achievable without extended reasoning. This leaves open whether the most serious harms intrinsically depend on difficult reasoning, a question that bears on the reliability and scope of chain-of-thought monitoring for safety.

References

Finally, it remains an open question whether the most serious harms in fact require difficult reasoning.

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety (2507.11473 - Korbak et al., 15 Jul 2025) in Section 1.1, Thinking Out Loud is Necessary for Hard Tasks