Safe Planning in Interactive Environments via Iterative Policy Updates and Adversarially Robust Conformal Prediction (2511.10586v1)

Published 13 Nov 2025 in eess.SY and cs.RO

Abstract: Safe planning of an autonomous agent in interactive environments -- such as the control of a self-driving vehicle among pedestrians and human-controlled vehicles -- poses a major challenge as the behavior of the environment is unknown and reactive to the behavior of the autonomous agent. This coupling gives rise to interaction-driven distribution shifts where the autonomous agent's control policy may change the environment's behavior, thereby invalidating safety guarantees in existing work. Indeed, recent works have used conformal prediction (CP) to generate distribution-free safety guarantees using observed data of the environment. However, CP's assumption on data exchangeability is violated in interactive settings due to a circular dependency where a control policy update changes the environment's behavior, and vice versa. To address this gap, we propose an iterative framework that robustly maintains safety guarantees across policy updates by quantifying the potential impact of a planned policy update on the environment's behavior. We realize this via adversarially robust CP where we perform a regular CP step in each episode using observed data under the current policy, but then transfer safety guarantees across policy updates by analytically adjusting the CP result to account for distribution shifts. This adjustment is performed based on a policy-to-trajectory sensitivity analysis, resulting in a safe, episodic open-loop planner. We further conduct a contraction analysis of the system providing conditions under which both the CP results and the policy updates are guaranteed to converge. We empirically demonstrate these safety and convergence guarantees on a two-dimensional car-pedestrian case study. To the best of our knowledge, these are the first results that provide valid safety guarantees in such interactive settings.

Summary

  • The paper introduces an iterative planning framework that updates policies and calibrates safety tubes via adversarially robust conformal prediction.
  • It leverages sensitivity analysis and Lipschitz continuity to derive closed-form tube updates that ensure distribution-free, per-episode safety.
  • Empirical tests in car-pedestrian scenarios demonstrate maintained safety coverage and performance improvements over episodes.

Safe Planning in Interactive Environments via Iterative Policy Updates and Adversarially Robust Conformal Prediction

Motivation and Technical Challenge

Planning robustly in interactive environments, where an autonomous agent (e.g., a self-driving vehicle) interacts with uncontrollable agents (pedestrians, human drivers), presents the distinctive technical challenge of interaction-driven distribution shift. The inherent coupling between the agent’s policy and the environment’s response causes the distribution of environment trajectories to change as the agent’s policy evolves, violating the exchangeability assumption foundational to classical conformal prediction (CP). Existing planning frameworks either ignore interaction—resulting in conservative, non-adaptive constraints—or they exploit it with improved empirical performance but without distribution-free safety guarantees. The paper introduces a model- and distribution-free safe planning framework with rigorous safety guarantees in interactive environments, resolving the “chicken-and-egg” coupling between policy updates and environment reaction through adversarially robust conformal prediction.

Iterative Framework and Formalization

The system dynamics are segmented into ego agent states and uncontrollable agent states, coupled via potentially unknown interactive dynamics. Safety constraints are encoded as trajectory-wide functions $H(x_{0:T}, y_{0:T}) \leq 0$. The ideal formulation seeks control input sequences $u_{0:T-1}$ that minimize a performance cost $J(x_{0:T}, u_{0:T-1})$ subject to chance constraints guaranteeing safety with probability $1 - \alpha$. The intractability arises because the uncontrollable agents' dynamics and intention distributions are typically unknown and policy-dependent, making even robust control (which hedges over all possible environment trajectories) overly conservative and statistical approaches (like CP) formally invalid.
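In the notation above, the idealized chance-constrained program can be restated as (a summary-level restatement, not the paper's exact formulation):

$$\min_{u_{0:T-1}} \; J(x_{0:T}, u_{0:T-1}) \quad \text{s.t.} \quad \mathbb{P}\big(H(x_{0:T}, y_{0:T}) \leq 0\big) \geq 1 - \alpha.$$

The chance constraint is what the CP-calibrated uncertainty tube is designed to enforce in a distribution-free manner.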

The core technical innovation is reframing planning in an iterative, episodic fashion. At each episode $j$, the system:

  1. Deploys the current policy $\pi_j$ and collects i.i.d. trajectories from the environment.
  2. Calibrates an empirical quantile of nonconformity scores, defining an uncertainty tube around nominal predictions via CP.
  3. Updates the uncertainty tube radius $r_{j+1}$ based on a sensitivity analysis that accounts for the policy-induced shift between episodes, using adversarially robust CP.
  4. Computes a policy $\pi_{j+1}$ that is feasible under the new tube and safety constraint, then repeats.
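The four-step loop above can be sketched in a few lines of Python. This is a minimal toy sketch, not the paper's implementation: the "policy" is a scalar gain, the environment scores are synthetic, and the planner and constants (`beta_T`, the 0.8 gain) are illustrative assumptions.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    # Regular CP calibration: the ceil((n+1)(1-alpha))-th order statistic.
    n = len(scores)
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    return np.sort(scores)[k - 1]

def episodic_planner(n_episodes=5, n_samples=200, alpha=0.1, beta_T=0.5, seed=0):
    rng = np.random.default_rng(seed)
    pi, r, radii = 0.0, 1.0, []
    for _ in range(n_episodes):
        # Step 1: deploy pi_j; observe nonconformity scores under the current policy.
        scores = np.abs(rng.normal(0.0, 0.3 + 0.1 * abs(pi), size=n_samples))
        # Step 2: calibrate the empirical CP quantile q_j from these scores.
        q = conformal_quantile(scores, alpha)
        # Step 4 (explicit variant): toy planner picks the next policy from the current tube.
        pi_next = 0.8 * r
        # Step 3: adversarial inflation r_{j+1} = q_j + beta_T * |pi_{j+1} - pi_j|.
        r = q + beta_T * abs(pi_next - pi)
        radii.append(r)
        pi = pi_next
    return radii
```

Note that in this explicit toy variant the next policy is planned against the current tube before the inflation step, sidestepping the implicit coupling the paper resolves more carefully.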

Adversarially Robust Conformal Prediction

Transfer of distribution-free coverage across policy updates is achieved through adversarially robust CP (ACP), which inflates the prediction set to accommodate bounded, policy-induced distribution shifts. Formally, for a nonconformity score $s(\hat{y}, y)$ and a perturbation budget $\rho$ quantifying the effect of the policy change, ACP ensures that the inflated prediction set $\mathcal{C}^{\text{adv}}_{r}(\hat{y}; \rho)$ covers the true behavior with the desired probability. The adversarial budget, derived via sensitivity analysis under Lipschitz assumptions on the dynamics, upper bounds the shift in the environment's trajectory distribution as a function of the change in the agent's policy.

For a policy update from $\pi_j$ to $\pi_{j+1}$:

$$M_{j+1} := \beta_T \|\pi_{j+1} - \pi_j\|_\infty,$$

where $\beta_T$ aggregates Lipschitz constants from the coupled dynamics. The next tube radius is set as

$$r_{j+1} := q_j + M_{j+1},$$

with $q_j$ the empirical quantile calibrated from data collected under $\pi_j$.
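The two-term update can be computed directly from calibration scores. The sketch below assumes a scalar nonconformity score and illustrative values for `beta_T` and the policy-shift magnitude; it is not the paper's code.

```python
import numpy as np

def inflated_radius(scores, alpha, policy_shift, beta_T):
    # Regular CP quantile q_j on scores observed under the current policy pi_j.
    n = len(scores)
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    q = np.sort(scores)[k - 1]
    # Adversarial inflation: r_{j+1} = q_j + M_{j+1},
    # with M_{j+1} = beta_T * ||pi_{j+1} - pi_j||_inf.
    return q + beta_T * policy_shift
```

For example, with 100 scores evenly spaced in (0, 1], `alpha=0.1` picks the 91st order statistic (0.91), and a shift of 0.2 with `beta_T=0.5` inflates the radius to 1.01.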

However, the tube update is implicit, since $r_{j+1}$ depends on $\pi_{j+1}$, which is itself computed from $r_{j+1}$. The authors provide two solvers:

  • Implicit numerical root-finding over $r_{j+1}$ via repeated planning queries.
  • An efficient explicit analytical update, leveraging Lipschitz continuity of the planning map, yielding closed-form tube updates.
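The first solver amounts to a fixed-point iteration on the implicit relation. The sketch below assumes the implicit equation takes the scalar form $r = q + \beta_T\,|\mathrm{plan}(r) - \pi_j|$, which is my reading of the coupling described above, not the paper's exact equation; `plan` is a hypothetical planner callback.

```python
def solve_tube_radius(q, pi_prev, plan, beta_T, tol=1e-8, max_iter=100):
    # Fixed-point iteration on r = q + beta_T * |plan(r) - pi_prev|.
    # Each iteration issues one planning query; the iteration converges
    # geometrically when kappa = beta_T * L_U < 1 (plan is L_U-Lipschitz).
    r = q
    for _ in range(max_iter):
        r_new = q + beta_T * abs(plan(r) - pi_prev)
        if abs(r_new - r) < tol:
            return r_new
        r = r_new
    return r
```

With a linear toy planner `plan = lambda r: 0.8 * r` (so $L_U = 0.8$) and `beta_T = 0.5`, the gain is $\kappa = 0.4 < 1$ and the iteration converges to the closed-form fixed point $q / (1 - \kappa)$, matching the contractivity condition in the theoretical analysis.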

Theoretical Guarantees

The framework rigorously establishes:

  • Per-Episode Safety: With high probability, safety constraints are satisfied episode-wise, not merely in the limit.
  • Tube and Cost Convergence: Under mild regularity conditions, tube radii and performance cost sequences converge.
  • Stability and Shrinkage: The analytic tube update is contractive and converges provided the "closed-loop gain" $\kappa = \beta_T L_U < 1$, where $L_U$ is the sensitivity of the planner map (derivable via perturbation analysis of the constrained optimizer).
  • Quantitative Error Bounds: The gap between empirical and population quantiles is bounded using the DKW inequality and local CDF regularity. The residual tube conservatism vanishes as sample size per episode increases, with explicit rates provided.
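The DKW-based quantile gap in the last bullet is directly computable: with probability at least $1 - \delta$, the empirical CDF deviates from the population CDF uniformly by at most $\sqrt{\ln(2/\delta)/(2n)}$. The sample size and $\delta$ below are illustrative, not values from the paper.

```python
import math

def dkw_epsilon(n, delta):
    # DKW inequality: sup_t |F_n(t) - F(t)| <= sqrt(ln(2/delta) / (2n))
    # with probability at least 1 - delta, for n i.i.d. samples.
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))
```

The bound decays at rate $O(1/\sqrt{n})$, which is the explicit rate at which the residual tube conservatism vanishes as the per-episode sample size grows (e.g., `dkw_epsilon(200, 0.05)` is roughly 0.096, quartering the uniform CDF error when the sample size grows sixteen-fold).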

Empirical Case Study

A car-pedestrian interaction scenario is instantiated under 2D planar dynamics, realistically simulating repulsive interaction. Planning is done over a finite horizon, with a safety tube guaranteeing minimum separation. Over episodes, the calibrated tube radius and performance cost demonstrably converge, and empirical coverage of tube and safety constraints is maintained at the target probability. Numerical results underscore the framework’s ability to balance safety and performance adaptively, without manual conservatism.

Strong empirical claims in the paper include:

  • Maintenance of empirical safety at the target level across all episodes
  • Convergence of the uncertainty radius
  • Performance improvement per episode without loss of safety guarantees

Practical and Theoretical Implications

Practically, the proposed methodology is directly deployable for autonomy in complex, human-centric environments, such as urban robotics, self-driving vehicles, and human-robot interaction. The explicit planner sensitivity analysis and closed-form tube updates render the algorithm computationally tractable for real-time control.

Theoretically, the work closes the formal gap in the use of CP for interactive systems where exchangeability is violated—establishing, for the first time, valid per-episode safety certificates. The adversarial CP approach opens pathways for combining statistical data-driven methods with robust control theory, and the iterative policy calibration paradigm can be extended to settings with learning-based predictors and multi-agent environments. Future development directions include integrating data-driven estimation of planner sensitivity, adaptive tube inflation under non-smooth dynamics, and multi-agent extensions.

Conclusion

The paper advances the field of safe planning by fusing adversarially robust conformal prediction with iterative policy updates, analytically quantifying and compensating for interaction-driven distribution shifts. The resulting framework delivers distribution-free safety guarantees in interactive environments—a significant technical resolution to a longstanding open problem. The approach is general, computationally efficient, and empirically validated, laying a foundation for broader adoption of distribution-free uncertainty quantification in closed-loop autonomous systems.
