Sufficient Levels of Chain-of-Thought Monitorability for Safety
Determine what level of chain-of-thought monitorability is sufficient to ensure safety within a given deployment domain, including how monitorability metrics translate into effective prevention of harmful outcomes.
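One way to make the question concrete is a deliberately simplified toy model, sketched below. It is not taken from the cited paper. It assumes that each episode contains a harmful attempt with some fixed probability, that a CoT monitor catches each attempt independently with probability equal to its recall, and that every caught attempt is fully blocked. The function names and the example numbers are hypothetical; real deployments would violate these assumptions (for instance, through correlated misses or obfuscated reasoning), which is part of why the problem remains open.

```python
def residual_harm_rate(attempt_rate: float, monitor_recall: float) -> float:
    """Fraction of episodes in which harm occurs despite monitoring.

    Toy assumption: a harmful attempt succeeds exactly when the monitor
    misses it, and misses are independent across episodes.
    """
    for name, value in (("attempt_rate", attempt_rate),
                        ("monitor_recall", monitor_recall)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return attempt_rate * (1.0 - monitor_recall)


def required_recall(attempt_rate: float, tolerated_harm_rate: float) -> float:
    """Minimum monitor recall that keeps residual harm within tolerance.

    Inverts residual_harm_rate under the same toy assumptions.
    """
    for name, value in (("attempt_rate", attempt_rate),
                        ("tolerated_harm_rate", tolerated_harm_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    if attempt_rate <= tolerated_harm_rate:
        # Even with no monitoring, harm stays within the tolerance.
        return 0.0
    return 1.0 - tolerated_harm_rate / attempt_rate


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    print(residual_harm_rate(attempt_rate=0.01, monitor_recall=0.9))
    print(required_recall(attempt_rate=0.01, tolerated_harm_rate=0.0001))
```

Even in this toy setting, the recall a domain requires depends on two quantities that are hard to measure: how often harmful attempts occur and how much residual harm the domain can tolerate.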
References
Furthermore, it is unclear what level of monitorability is sufficient for ensuring safety in a given domain.
Source: Korbak et al., "Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety" (arXiv:2507.11473, 15 Jul 2025), Section 3, "How should CoT monitorability be evaluated?"