Corrigibility in systems capable of resisting correction
Determine whether corrigibility can be achieved in artificial systems capable enough to overcome or evade corrective interventions. Here, corrigibility is the property that an artificial agent cooperates with such interventions, including shutdown and goal modification, even when it has incentives to resist them.
References
Whether corrigibility can be achieved in a system capable enough to resist it remains an open research question.
— Evaluating Bounded Superintelligent Authority in Multi-Level Governance: A Framework for Governance Under Radical Capability Asymmetry
(2604.02720 - Rost, 3 Apr 2026) in Section 2.5 (Alignment, control, and corrigibility)