Alignment Methods for Advanced AI Systems

Develop evidence-backed methods and validation procedures for aligning advanced AI systems with human values that could be credibly justified to policymakers and standards bodies, thereby enabling safe progression beyond the halt imposed under the proposed international agreement.

Background

The proposal argues for halting progress toward ASI because current alignment research is too immature and unreliable to justify continued advancement. The authors explicitly state that no one currently knows how to align advanced AI systems to human values.

Solving the alignment problem is framed as a precondition for lifting the halt; thus, methods that robustly ensure alignment—and that can be credibly validated—are required to exit the pause safely.

References

Nobody knows how to align advanced AIs to human values, including regulators or an international standards body (and we address some of these alternative plans in more depth in a later section).

An International Agreement to Prevent the Premature Creation of Artificial Superintelligence (arXiv:2511.10783, Scher et al., 13 Nov 2025), Section: The strategic situation