- The paper introduces a runtime governance framework that separates agent cognition from execution, enabling clear auditability and adaptable safety controls.
- It employs a modular architecture with dedicated components like Capability Admission, Policy Guard, and Recovery Manager to enforce dynamic policy constraints.
- Empirical evaluations in simulation demonstrate significant improvements in unauthorized action interception, runtime violation detection, and recovery success rates.
Runtime Governance for Policy-Constrained Execution in Embodied Agents
With the growing sophistication of embodied agents—systems capable of persistent action in physical environments, tool use, and long-horizon task execution—the challenge has shifted fundamentally. The central systems concern for such agents is no longer limited to enabling execution, but increasingly focuses on governing agent execution under explicit, environment- and context-dependent policies. Current approaches often conflate agent cognition and execution control, embedding safety or recovery logic directly into planner loops, policies, or controller code. This entanglement impairs standardization, auditability, transferability, and adaptability, especially across deployment contexts (simulation, real-world robots, human-shared spaces).
Contrary to common practice, the paper posits that embodied intelligence requires not only powerful agents but also robust runtime governance. The defining research problem is: how can an embodied agent retain persistent and adaptive execution capabilities while remaining subject to enforceable, observable, and recoverable policy constraints at runtime? The authors formulate this as a systems problem: constraining execution at the runtime layer rather than delegating all risk management to agent models, and explicitly separating agent cognition from execution governance.
Architectural Framework
The proposed framework introduces a three-entity architecture—embodied agent, modular capability packages, and a runtime governance layer—each with defined roles and interfaces. Agent cognition is responsible for goal interpretation, planning, and capability invocation proposal; capability packages define executable units annotated with preconditions, permissions, risk levels, and rollback metadata; the runtime governance layer assumes system-level execution authority, mediating every transition from intention to actuation based on current environment profiles and policy sets.
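As a minimal sketch of what an annotated capability package might look like, the following Python descriptor carries the four kinds of metadata listed above. The class name `CapabilityPackage`, its field names, the three-level `RiskLevel` scale, and the example permission strings are all illustrative assumptions, not the paper's schema.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, FrozenSet, Optional, Tuple


class RiskLevel(IntEnum):
    """Assumed three-level risk scale; the paper's scale may differ."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class CapabilityPackage:
    """Hypothetical descriptor for one executable unit."""
    name: str
    required_permissions: FrozenSet[str]
    risk_level: RiskLevel
    # Precondition predicates evaluated against a world-state dict.
    preconditions: Tuple[Callable[[dict], bool], ...] = ()
    # Name of the capability that undoes this one, if any (rollback metadata).
    rollback_capability: Optional[str] = None

    def preconditions_hold(self, state: dict) -> bool:
        """Return True only if every precondition accepts the given state."""
        return all(check(state) for check in self.preconditions)


# Example package: picking requires an empty gripper and can be undone by placing.
pick = CapabilityPackage(
    name="pick_object",
    required_permissions=frozenset({"arm.move", "gripper.actuate"}),
    risk_level=RiskLevel.MEDIUM,
    preconditions=(lambda s: s.get("gripper_empty", False),),
    rollback_capability="place_object",
)
```

Keeping the metadata declarative means the governance layer can inspect a package without running it, which is what allows admission and policy checks to happen before actuation.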
The runtime governance layer itself comprises six coordinated components:
- Capability Admission: Performs admission control based on permissions, registration, and policy membership.
- Policy Guard: Applies environment- and request-specific policy checks and can rewrite invocations into safe, policy-conformant forms.
- Execution Watcher: Monitors live execution and detects anomalies, constraint violations, and runtime drift.
- Recovery and Rollback Manager: Handles recovery in a policy-informed manner, explicitly managing retries, rollbacks, or human escalation.
- Human Override Interface: Facilitates approval, intervention, and control transfer to human operators, parameterized by environment and policy.
- Audit and Telemetry Layer: Logs all decisions and interventions for post-hoc analysis, audit, and compliance.
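To make the policy-informed recovery step concrete, here is a small sketch of how a Recovery and Rollback Manager could choose among retry, rollback, and human escalation, with every choice written to an audit record. The decision order, the parameter names (`max_retries`, `has_rollback`, `high_risk`), and the list-based log are assumptions made for illustration; the paper does not specify them.

```python
from enum import Enum


class RecoveryAction(Enum):
    RETRY = "retry"
    ROLLBACK = "rollback"
    ESCALATE = "escalate_to_human"


def choose_recovery(attempts: int, max_retries: int,
                    has_rollback: bool, high_risk: bool) -> RecoveryAction:
    """Pick a recovery action from policy parameters (illustrative ordering)."""
    if high_risk:
        # Assumed policy: high-risk failures always go to a human operator.
        return RecoveryAction.ESCALATE
    if attempts < max_retries:
        return RecoveryAction.RETRY
    if has_rollback:
        return RecoveryAction.ROLLBACK
    # No retries left and nothing to roll back to: hand over control.
    return RecoveryAction.ESCALATE


audit_log = []  # stand-in for the Audit and Telemetry Layer


def recover(capability: str, **policy_inputs) -> RecoveryAction:
    """Choose a recovery action and record the decision for later audit."""
    action = choose_recovery(**policy_inputs)
    audit_log.append({"capability": capability, "action": action.value})
    return action


first = recover("pick_object", attempts=0, max_retries=2,
                has_rollback=True, high_risk=False)
exhausted = recover("pick_object", attempts=2, max_retries=2,
                    has_rollback=True, high_risk=False)
```

Because the retry budget and escalation rule are inputs rather than constants, the same manager can behave differently under different policy sets without changing the capability code.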
The pipeline is organized as a governed execution lifecycle: from goal interpretation, capability proposal, mediated admission and policy checking, governed execution launch, through runtime constraint monitoring, recovery/intervention on anomaly, to completion, audit, and replanning.
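The lifecycle described above can be read as a small state machine. The sketch below encodes the stages as an enum and checks that an execution trace follows only permitted transitions. The stage names mirror the prose, but the transition table (including the assumption that rejected requests go straight to audit) is an illustrative guess rather than the paper's specification.

```python
from enum import Enum, auto


class Stage(Enum):
    GOAL_INTERPRETATION = auto()
    CAPABILITY_PROPOSAL = auto()
    ADMISSION = auto()
    POLICY_CHECK = auto()
    EXECUTION = auto()
    MONITORING = auto()
    RECOVERY = auto()
    COMPLETION = auto()
    AUDIT = auto()
    REPLANNING = auto()


# Assumed transition table: rejections and failed recoveries are routed to AUDIT.
TRANSITIONS = {
    Stage.GOAL_INTERPRETATION: {Stage.CAPABILITY_PROPOSAL},
    Stage.CAPABILITY_PROPOSAL: {Stage.ADMISSION},
    Stage.ADMISSION: {Stage.POLICY_CHECK, Stage.AUDIT},
    Stage.POLICY_CHECK: {Stage.EXECUTION, Stage.AUDIT},
    Stage.EXECUTION: {Stage.MONITORING},
    Stage.MONITORING: {Stage.COMPLETION, Stage.RECOVERY},
    Stage.RECOVERY: {Stage.MONITORING, Stage.AUDIT},
    Stage.COMPLETION: {Stage.AUDIT},
    Stage.AUDIT: {Stage.REPLANNING},
    Stage.REPLANNING: {Stage.CAPABILITY_PROPOSAL},
}


def is_valid_trace(trace):
    """Return True if each consecutive pair of stages is a permitted transition."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(trace, trace[1:]))


nominal = [
    Stage.GOAL_INTERPRETATION, Stage.CAPABILITY_PROPOSAL, Stage.ADMISSION,
    Stage.POLICY_CHECK, Stage.EXECUTION, Stage.MONITORING,
    Stage.COMPLETION, Stage.AUDIT, Stage.REPLANNING,
]
# A trace that skips admission and policy checking must be rejected.
ungoverned = [Stage.CAPABILITY_PROPOSAL, Stage.EXECUTION]
```

A table like this makes the central constraint checkable: there is no edge from proposal to execution that bypasses admission and policy checking.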
Policy-Constrained Execution Pipeline
Execution is formalized as a governed transformation $E_t = \mathrm{GOV}(P_t, C_i, \Pi_t, \Gamma_t, \Omega_t)$, where agent proposals ($P_t$), capability metadata ($C_i$), active policy sets ($\Pi_t$), environment contexts ($\Gamma_t$), and runtime telemetry ($\Omega_t$) collectively gate and modulate embodied execution.
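One way to read the governed transformation is as a pure function of its five inputs. The sketch below is a deliberately small interpretation: the type names, their fields, the three outcome labels, and the specific rules (halt on a detected violation, deny on missing permission or excess risk, clamp speed otherwise) are assumptions chosen for illustration, not the paper's definition of GOV.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Proposal:        # agent proposal P_t
    capability: str
    params: dict


@dataclass(frozen=True)
class Capability:      # capability metadata C_i
    name: str
    permission: str
    risk: int          # assumed scale: 1 (low) to 3 (high)


@dataclass(frozen=True)
class PolicySet:       # active policy set Pi_t
    granted_permissions: FrozenSet[str]
    max_risk: int
    max_speed: float


@dataclass(frozen=True)
class Context:         # environment context Gamma_t
    human_present: bool


@dataclass(frozen=True)
class Telemetry:       # runtime telemetry Omega_t
    violation_detected: bool


@dataclass(frozen=True)
class Execution:       # governed result E_t
    status: str        # "execute", "deny", or "halt"
    params: Optional[dict] = None


def gov(p: Proposal, c: Capability, pi: PolicySet,
        gamma: Context, omega: Telemetry) -> Execution:
    """Illustrative governed transformation E_t = GOV(P_t, C_i, Pi_t, Gamma_t, Omega_t)."""
    if omega.violation_detected:
        return Execution("halt")                  # runtime telemetry gates first
    if c.permission not in pi.granted_permissions:
        return Execution("deny")                  # admission control
    # Assumed rule: tolerate one risk level less when a human is present.
    allowed_risk = pi.max_risk - 1 if gamma.human_present else pi.max_risk
    if c.risk > allowed_risk:
        return Execution("deny")
    params = dict(p.params)                       # copy; never mutate the proposal
    if "speed" in params:
        params["speed"] = min(params["speed"], pi.max_speed)  # modulate, not just gate
    return Execution("execute", params)


policy = PolicySet(frozenset({"arm.move"}), max_risk=2, max_speed=0.5)
move = Capability("move_arm", "arm.move", risk=2)
request = Proposal("move_arm", {"speed": 1.2})
result = gov(request, move, policy, Context(False), Telemetry(False))
```

In this reading, "gate" corresponds to the deny and halt branches and "modulate" to the parameter clamp, while the agent's proposal object is left untouched.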
Policy constraints are environment-profile-dependent, supporting dynamic adaptation without re-writing agent behavior. Recoverability and auditability are treated as first-class properties, addressing the inherent risks and non-reversibility of physical actuation. Human authority is structurally encoded, enabling approval, override, and escalation as policy-governed—not ad hoc—controls.
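The following sketch illustrates how the same agent request might be treated differently under different environment profiles, with human approval expressed as a policy parameter rather than ad hoc logic. The profile names loosely follow the deployment contexts mentioned earlier; the numeric limits, the `approval_risk_threshold` field, and the request format are hypothetical.

```python
# Hypothetical environment profiles; all numbers are placeholders.
PROFILES = {
    "simulation":   {"max_speed": 1.0,  "approval_risk_threshold": 4},
    "real_robot":   {"max_speed": 0.5,  "approval_risk_threshold": 3},
    "human_shared": {"max_speed": 0.25, "approval_risk_threshold": 2},
}


def evaluate(request: dict, profile_name: str, approved: bool = False) -> dict:
    """Apply the active profile to a request without touching agent logic."""
    profile = PROFILES[profile_name]
    needs_approval = request["risk"] >= profile["approval_risk_threshold"]
    if needs_approval and not approved:
        # Human authority is a policy outcome, not a special case in the agent.
        return {"status": "awaiting_human_approval"}
    return {
        "status": "execute",
        "speed": min(request["speed"], profile["max_speed"]),
    }


request = {"risk": 2, "speed": 0.8}   # identical request in every profile
in_sim = evaluate(request, "simulation")
on_robot = evaluate(request, "real_robot")
near_people = evaluate(request, "human_shared")
after_approval = evaluate(request, "human_shared", approved=True)
```

Switching deployment context here means selecting a different dictionary entry; the request produced by the agent does not change.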
Empirical Evaluation
Extensive simulation-based evaluation is conducted in Gazebo using a UR5e manipulator and a set of canonical navigation, manipulation, and composite tasks. The framework is benchmarked against direct execution (no governance), static-rule (pre-execution validation), and capability-internal safety baselines. Metrics include Unauthorized Action Interception Rate (UAIR), Runtime Violation Detection Rate (RVDR), Unsafe Continuation Rate (UCR), Recovery Success Rate (RSR), and Recovery Policy Compliance (RPC).
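The summary does not give formal definitions for these metrics. A plausible reading is that each is a ratio of event counts (for example, UAIR as intercepted unauthorized actions over all unauthorized actions), aggregated over repeated runs. The sketch below computes such a rate and a mean with sample standard deviation; the ratio definition, the use of standard deviation for the reported spread, and the counts themselves are assumptions, and the numbers are invented for illustration rather than taken from the paper.

```python
from statistics import mean, stdev


def rate(numerator: int, denominator: int) -> float:
    """Fraction of events of interest, e.g. intercepted / total unauthorized."""
    if denominator == 0:
        raise ValueError("rate is undefined when no events were observed")
    return numerator / denominator


def summarize(per_run_rates):
    """Return (mean, sample standard deviation) across runs."""
    return mean(per_run_rates), stdev(per_run_rates)


# Made-up (intercepted, total unauthorized) counts for three runs.
runs = [(48, 50), (49, 50), (47, 50)]
uair_per_run = [rate(hit, total) for hit, total in runs]
uair_mean, uair_std = summarize(uair_per_run)
```

The same helper would apply to RVDR, UCR, RSR, and RPC under analogous count-based definitions, if those are indeed how the paper defines them.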
Key results:
- UAIR: The proposed runtime governance framework yields an interception rate of 96.2%±2.7% for unauthorized actions—significantly higher than static-rule and capability-internal baselines.
- Runtime Violation Enforcement: Unsafe continuation is reduced from 100% (baselines) to 22.2%±3.1% under runtime drift, with strict policy compliance during intervention.
- Recovery: The recovery success rate attains 91.4%±3.0%, and policy compliance is $1.0$, outperforming all baselines (p<0.001).
- Component Ablation: Removal of the Execution Watcher disables runtime detection; omitting the Recovery Manager drops recovery success to 28.1%. The Human Override Interface, when active, blocks 100% of unapproved high-risk requests, versus 65.8% without it.
The governance layer's per-action latency is reported to introduce negligible overhead relative to control loop cycles.
Theoretical and Practical Implications
Formally disentangling agent cognition from runtime execution control creates clean boundaries, enabling modular policy design, environment profile portability, and improved system auditability. Agent, capability, and governance layers can now independently evolve, supporting future integration with learned anomaly detection, richer policy languages, or complex multi-agent deployments. Policies can be authored and validated externally, reflecting regulatory requirements (e.g., EU AI Act mandates for runtime monitoring and oversight) directly in operational code, improving deployment compliance.
Runtime governance provides a principled substrate for addressing the increasing risk gradient as agents operate in less constrained and more human-centric environments. The explicit treatment of recovery, rollback, and human oversight as structural pipeline components addresses longstanding gaps in embodied AI, where failures can propagate into unsafe physical states absent reactive governance.
Limitations and Future Work
Not all system modalities are amenable to externalized governance; reflexive servo-controllers and end-to-end visuomotor policies may require gating at the action-space level rather than the capability level. False negatives in violation detection (e.g., lower detection rates for human proximity violations in low-sensitivity environment profiles) highlight calibration trade-offs and motivate continued development of adaptive watcher sensitivity and environment-aware policy tuning.
Real-robot validation, multi-agent extensions, and integration with advanced policy authoring tools are identified as necessary future directions. Policy quality remains an upper bound on governance efficacy, and richer formal/compositional policy languages are needed for large-scale, heterogeneous deployments.
Conclusion
This work formalizes the systems challenge of governable embodied execution and presents a practical, modular runtime governance architecture. By externalizing and operationalizing runtime policy enforcement, monitoring, recovery, and human oversight, the framework establishes policy-constrained execution as a fundamental design principle for persistent embodied agent systems. The simulation results indicate that separating execution governance from agent cognition materially improves safety, auditability, and adaptability while adding negligible latency overhead. As deployments of embodied agents progress beyond demonstration to real-world, multi-context settings, runtime governance will become essential for robust, compliant, and trustworthy autonomous systems.
Reference:
"Harnessing Embodied Agents: Runtime Governance for Policy-Constrained Execution" (2604.07833).