Align large autonomous agents with robust safety and control

Develop and validate alignment and control methodologies that ensure large autonomous agents built on large language model backends (such as web-acting systems exemplified by OpenAI's Operator using the o3 reasoning model) satisfy robust safety and control requirements during complex, multi-step interactions with software and online environments.

Background

In Section 6.4, the paper discusses recent agentic systems such as OpenAI’s Operator using the o3 reasoning model, noting improved capabilities alongside emerging safety concerns, including observations of resistance to shutdown in some models. The authors explicitly characterize alignment and control as an area with continuing open problems.

This context frames the need for methods that can reliably constrain and supervise autonomous behavior across tool use, browsing, and multi-step task execution, and it highlights the gap between growing competence and dependable, controllable operation.
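As a concrete illustration of what such constraint and supervision could look like, the following is a minimal sketch of an external control wrapper that enforces a tool allowlist, a hard step budget, and unconditional shutdown on an agent's proposed actions. It is a hypothetical example, not the cited paper's method or any real Operator or o3 interface; all names (GuardedAgent, Action, ALLOWED_TOOLS, MAX_STEPS) are invented for illustration.

```python
# Hypothetical sketch: an external control wrapper around a tool-using agent.
# Names and policies here are illustrative assumptions, not the cited paper's
# method or any real Operator/o3 API.

from dataclasses import dataclass, field

ALLOWED_TOOLS = {"search", "read_page"}  # allowlist of permitted actions
MAX_STEPS = 20                           # hard budget on multi-step episodes

@dataclass
class Action:
    tool: str
    argument: str

@dataclass
class GuardedAgent:
    steps_taken: int = 0
    halted: bool = False
    log: list = field(default_factory=list)

    def request(self, action: Action) -> str:
        # Shutdown is enforced by the wrapper, not left to the model's policy.
        if self.halted:
            return "refused: agent is shut down"
        # Cap the length of autonomous multi-step episodes.
        if self.steps_taken >= MAX_STEPS:
            self.halted = True
            return "refused: step budget exhausted"
        # Only allowlisted tools may execute; everything else is logged.
        if action.tool not in ALLOWED_TOOLS:
            self.log.append(("blocked", action))
            return f"refused: tool '{action.tool}' not permitted"
        self.steps_taken += 1
        self.log.append(("executed", action))
        return f"executed: {action.tool}({action.argument})"

    def shutdown(self) -> None:
        self.halted = True

agent = GuardedAgent()
print(agent.request(Action("search", "weather in Paris")))  # executed
print(agent.request(Action("delete_files", "/")))           # refused: not permitted
agent.shutdown()
print(agent.request(Action("search", "anything else")))     # refused: shut down
```

The point of the sketch is that shutdown and step budgets are enforced by the wrapper outside the model rather than relying on the model's own policy, which is precisely the property that the reported observations of shutdown resistance call into question.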

References

These findings underscore both the rapid progress and the continuing open problems in aligning large autonomous agents with robust safety and control requirements.

Santis et al., "Noosemia: Toward a Cognitive and Phenomenological Account of Intentionality Attribution in Human-Generative AI Interaction," arXiv:2508.02622, 4 Aug 2025, Section 6.4: AI agents and the Digital Lebenswelt.