Disentangle Necessity versus Propensity in CoT Monitorability
Characterize how much of the observed chain-of-thought (CoT) monitorability is attributable to the necessity of externalizing reasoning, and how much to models' natural propensity to externalize it, across the tasks considered for detecting misbehavior and alignment signals.
References
It is unclear what proportion of the CoT monitorability demonstrated in these examples is due to the necessity versus the propensity for a model to reason out loud in the tasks considered.
— Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
(Korbak et al., arXiv:2507.11473, 15 Jul 2025), Section 1.2, "Chain of Thought is Often Monitorable in Practice"