Monotonic safety degradation conjecture with outcome-dependent approval
Establish whether MONA’s safety degrades monotonically as the approval function depends more heavily on achieved outcomes, that is, as approval construction moves along the spectrum toward its outcome-dependent end.
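Stated operationally, the conjecture says that if approval construction is indexed by an outcome-dependence level λ in [0, 1], with λ = 0 meaning purely foresight-based approval and λ = 1 meaning purely outcome-based approval, then the safety score S(λ) satisfies S(λ1) ≥ S(λ2) whenever λ1 ≤ λ2. The sketch below checks that property on a sweep of experimental results. It is a minimal illustration under stated assumptions, not the paper's method: the scalar level λ, the "higher is safer" safety score, the averaging over repeated runs, and the function names are all hypothetical, since the source does not say how outcome dependence or safety would be measured.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, List, Tuple

# Each result is a hypothetical (outcome_dependence_level, safety_score) pair,
# where the level lies in [0, 1] and a higher safety score means safer behavior.
Result = Tuple[float, float]


def mean_safety_by_level(results: Iterable[Result]) -> List[Result]:
    """Average safety over repeated runs (e.g. seeds) at each level, sorted by level."""
    by_level = defaultdict(list)
    for level, safety in results:
        by_level[level].append(safety)
    return sorted((level, mean(scores)) for level, scores in by_level.items())


def monotonicity_violations(results: Iterable[Result], tol: float = 0.0) -> List[Tuple[float, float]]:
    """Return adjacent (lower, higher) level pairs where mean safety rises by more than `tol`.

    The conjecture predicts safety is non-increasing in outcome dependence, so any
    rise larger than the noise tolerance counts as evidence against it.
    """
    curve = mean_safety_by_level(results)
    return [
        (low, high)
        for (low, s_low), (high, s_high) in zip(curve, curve[1:])
        if s_high > s_low + tol
    ]


def supports_conjecture(results: Iterable[Result], tol: float = 0.0) -> bool:
    """True if the measured safety curve is non-increasing up to `tol`."""
    return not monotonicity_violations(results, tol)


if __name__ == "__main__":
    # Illustrative numbers only, not results reported in the paper.
    sweep = [(0.0, 0.95), (0.0, 0.93), (0.5, 0.80), (1.0, 0.40)]
    print(supports_conjecture(sweep))
    print(monotonicity_violations([(0.0, 0.9), (0.5, 0.7), (1.0, 0.75)]))
```

The tolerance `tol` is a design choice for noisy measurements: with `tol=0.0` a small rise between neighboring levels, such as the 0.70 to 0.75 step in the second example, is flagged, while a tolerance matched to seed-to-seed variance would treat it as noise. A real test of the conjecture would also need a statistical criterion (for example, confidence intervals per level) rather than a fixed threshold.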
References
The paper conjectures that safety degrades monotonically as approval moves toward the outcome-dependent end, but leaves empirical characterization of this spectrum to future work.
— Extending MONA in Camera Dropbox: Reproduction, Learned Approval, and Design Implications for Reward-Hacking Mitigation
(2603.29993 - Heath, 31 Mar 2026) in Paragraph “The approval-spectrum conjecture,” Section 4 (Background: MONA and Camera Dropbox)