- The paper introduces an iterative planning framework that updates policies and calibrates safety tubes via adversarially robust conformal prediction.
- It leverages sensitivity analysis and Lipschitz continuity to derive closed-form tube updates that ensure distribution-free, per-episode safety.
- Empirical tests in a car-pedestrian scenario show that safety coverage is maintained at the target level while performance improves over episodes.
Motivation and Technical Challenge
Planning robustly in interactive environments, where an autonomous agent (e.g., a self-driving vehicle) interacts with uncontrollable agents (pedestrians, human drivers), presents the distinctive technical challenge of interaction-driven distribution shift. The inherent coupling between the agent’s policy and the environment’s response causes the distribution of environment trajectories to change as the agent’s policy evolves, violating the exchangeability assumption foundational to classical conformal prediction (CP). Existing planning frameworks either ignore interaction—resulting in conservative, non-adaptive constraints—or they exploit it with improved empirical performance but without distribution-free safety guarantees. The paper introduces a model- and distribution-free safe planning framework with rigorous safety guarantees in interactive environments, resolving the “chicken-and-egg” coupling between policy updates and environment reaction through adversarially robust conformal prediction.
The system state is partitioned into ego-agent states and uncontrollable-agent states, coupled through potentially unknown interactive dynamics. Safety constraints are encoded as trajectory-wide functions H(x_{0:T}, y_{0:T}) ≤ 0. The ideal formulation seeks control sequences u_{0:T−1} that minimize a performance cost J(x_{0:T}, u_{0:T−1}) subject to chance constraints guaranteeing safety with probability at least 1 − α. Intractability arises because the uncontrollable agent's dynamics and intention distribution are typically unknown and policy-dependent: robust control, which hedges over all possible environment trajectories, becomes overly conservative, while statistical approaches such as CP are formally invalid under the policy-induced shift.
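Under the notation above, the ideal chance-constrained program can be written as follows; f and g are illustrative placeholders for the ego and environment dynamics, which the paper treats as potentially unknown:

```latex
\begin{aligned}
\min_{u_{0:T-1}} \quad & J(x_{0:T},\, u_{0:T-1}) \\
\text{s.t.} \quad & x_{t+1} = f(x_t, u_t), \qquad y_{t+1} \sim g(x_{0:t},\, y_{0:t}), \\
& \mathbb{P}\!\left[\, H(x_{0:T},\, y_{0:T}) \le 0 \,\right] \ge 1 - \alpha .
\end{aligned}
```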
The core technical innovation is to reframe planning in an iterative, episodic fashion. At each episode j, the system:
- Deploys the current policy π_j and collects i.i.d. trajectories from the environment.
- Calibrates via CP an empirical quantile of nonconformity scores, defining an uncertainty tube around the nominal predictions.
- Updates the uncertainty tube radius r_{j+1} via adversarially robust CP, using sensitivity analysis to account for the policy-induced shift between episodes.
- Computes a policy π_{j+1} that is feasible under the new tube and safety constraint, then repeats.
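The episodic loop above can be sketched as follows. `plan`, `rollout`, and `score` are hypothetical stand-ins for the planner, the environment interface, and the nonconformity score, none of which are specified at this level of detail; the sketch plans against the incoming radius and so sidesteps the implicit coupling between r_{j+1} and π_{j+1} that the paper resolves with dedicated solvers.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) empirical quantile (standard split CP)."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank with the (n + 1) correction
    return float(np.sort(scores)[min(k, n) - 1])

def episode(policy, r, alpha, beta_T, plan, rollout, score, n_cal=100):
    """One calibrate -> replan -> inflate iteration (illustrative sketch)."""
    # 1. Deploy the current policy and collect i.i.d. calibration scores.
    scores = np.array([score(*rollout(policy)) for _ in range(n_cal)])
    q = conformal_quantile(scores, alpha)
    # 2. Compute a policy feasible under the current tube and safety constraint.
    new_policy = plan(r)
    # 3. Inflate: M_{j+1} = beta_T * ||pi_{j+1} - pi_j||_inf, r_{j+1} = q_j + M_{j+1}.
    M = beta_T * float(np.max(np.abs(new_policy - policy)))
    return new_policy, q + M
```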
Transfer of distribution-free coverage across policy updates is achieved through adversarially robust conformal prediction (ACP), which inflates the prediction set to accommodate bounded, policy-induced distribution shifts. Formally, for a nonconformity score s(ŷ, y) and a perturbation budget ρ quantifying the effect of the policy change, ACP guarantees that the inflated prediction set C_r^adv(ŷ; ρ) covers the true behavior with the desired probability. The adversarial budget, derived via sensitivity analysis under Lipschitz assumptions on the dynamics, upper-bounds the shift in the environment trajectory as a function of the change in agent policy.
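A minimal illustration of the inflation step, assuming a scalar nonconformity score s(ŷ, y) = |ŷ − y| and a known budget ρ: the calibrated quantile is simply widened by ρ before forming the prediction set, so any shift whose effect on the scores is bounded by ρ cannot break the (1 − α) coverage target.

```python
import numpy as np

def acp_radius(cal_scores, alpha, rho):
    """Adversarially robust CP sketch: calibrate on clean scores, inflate by rho.

    The inflated set is C^adv = {y : s(yhat, y) <= q + rho}, where q is the
    standard split-CP quantile of the calibration scores.
    """
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = float(np.sort(cal_scores)[min(k, n) - 1])
    return q + rho

# Shifted test scores stay within budget rho of the calibration distribution,
# so empirical coverage under the shift still meets the 1 - alpha target.
rng = np.random.default_rng(1)
cal = np.abs(rng.normal(size=2000))
test = np.abs(rng.normal(size=2000)) + 0.3  # shift bounded by rho = 0.3
r = acp_radius(cal, alpha=0.1, rho=0.3)
coverage = float(np.mean(test <= r))
```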
For the policy update from π_j to π_{j+1}, the adversarial budget is

M_{j+1} := β_T ∥π_{j+1} − π_j∥_∞,

where β_T aggregates Lipschitz constants of the coupled dynamics over the horizon. The next tube radius is set as

r_{j+1} := q_j + M_{j+1},

with q_j the empirical quantile calibrated from data collected under π_j.
However, the tube update is implicit: r_{j+1} depends on π_{j+1}, which is itself computed under the tube of radius r_{j+1}. The authors provide two solvers:
- Implicit numerical root-finding over r_{j+1} via repeated planning queries.
- An efficient explicit analytical update that leverages Lipschitz continuity of the planning map to yield a closed-form tube radius.
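The implicit variant can be sketched as a fixed-point iteration on r ↦ q_j + β_T·∥plan(r) − π_j∥_∞, which converges under the contraction condition κ = β_T·L_U < 1 on the planner map's sensitivity. The `plan` function below is a hypothetical Lipschitz surrogate, not the paper's planner.

```python
import numpy as np

def solve_tube_radius(q, pi_prev, plan, beta_T, r0=0.0, tol=1e-8, max_iter=100):
    """Fixed-point iteration for the implicit update r = q + beta_T * ||plan(r) - pi_prev||_inf.

    Converges when the closed-loop gain kappa = beta_T * L_U < 1, with L_U the
    Lipschitz constant of the planner map r -> plan(r).
    """
    r = r0
    for _ in range(max_iter):
        r_new = q + beta_T * float(np.max(np.abs(plan(r) - pi_prev)))
        if abs(r_new - r) < tol:
            return r_new
        r = r_new
    return r

# Hypothetical planner surrogate with L_U = 0.4: a larger tube forces a
# larger deviation from the previous policy.
pi_prev = np.array([1.0, 1.0])
plan = lambda r: pi_prev + 0.4 * r
r_star = solve_tube_radius(q=0.5, pi_prev=pi_prev, plan=plan, beta_T=1.0)
```

Here κ = 1.0 · 0.4 < 1, so the iteration contracts to the unique fixed point r* = 0.5/(1 − 0.4).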
Theoretical Guarantees
The framework rigorously establishes:
- Per-Episode Safety: With high probability, safety constraints are satisfied episode-wise, not merely in the limit.
- Tube and Cost Convergence: Under mild regularity conditions, tube radii and performance cost sequences converge.
- Stability and Shrinkage: The analytic tube update is contractive and converges provided the "closed-loop gain" κ = β_T · L_U < 1, where L_U is the sensitivity of the planner map (derivable via perturbation analysis of the constrained optimizer).
- Quantitative Error Bounds: The gap between empirical and population quantiles is bounded using the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality and local CDF regularity. The residual tube conservatism vanishes as the per-episode sample size increases, with explicit rates provided.
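The DKW-based error term can be made concrete: with n calibration samples and failure probability δ, the empirical CDF deviates from the population CDF by at most ε(n, δ) = √(ln(2/δ)/(2n)) uniformly, so the quantile gap, and with it the residual tube conservatism, shrinks at the n^{−1/2} rate.

```python
import math

def dkw_epsilon(n, delta):
    """Uniform CDF deviation bound from the DKW inequality (Massart's constant)."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

# Increasing the per-episode sample size 100x shrinks the bound 10x:
eps_small = dkw_epsilon(n=100, delta=0.05)
eps_large = dkw_epsilon(n=10000, delta=0.05)
```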
Empirical Case Study
A car-pedestrian interaction scenario is instantiated with 2D planar dynamics and a repulsive interaction model for the pedestrian. Planning is performed over a finite horizon, with a safety tube guaranteeing a minimum separation. Over episodes, the calibrated tube radius and the performance cost converge, and empirical coverage of the tube and safety constraints is maintained at the target probability. The numerical results underscore the framework's ability to balance safety and performance adaptively, without manually tuned conservatism.
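The summary gives no model details, but as a rough illustration of the kind of repulsive interaction described, a pedestrian whose velocity combines goal-seeking with a push away from the car might look like the following; every constant and function here is an illustrative assumption, not the paper's model.

```python
import numpy as np

def pedestrian_step(ped, car, goal, dt=0.1, v_goal=1.0, k_rep=0.5):
    """Hypothetical 2D pedestrian update: goal-seeking plus car-repulsion.

    The pedestrian walks toward its goal at speed v_goal while being pushed
    away from the car with strength decaying in the squared distance.
    """
    to_goal = goal - ped
    away = ped - car
    dist = np.linalg.norm(away) + 1e-6
    vel = (v_goal * to_goal / (np.linalg.norm(to_goal) + 1e-6)
           + k_rep * away / dist**2)
    return ped + dt * vel

ped = np.array([0.0, 0.0])
car = np.array([0.5, 0.0])
goal = np.array([0.0, 5.0])
ped_next = pedestrian_step(ped, car, goal)
```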
Strong empirical claims in the paper include:
- Maintenance of empirical safety at the target level across all episodes
- Convergence of the uncertainty radius
- Performance improvement per episode without loss of safety guarantees
Practical and Theoretical Implications
Practically, the proposed methodology is directly deployable for autonomy in complex, human-centric environments, such as urban robotics, self-driving vehicles, and human-robot interaction. The explicit planner sensitivity analysis and closed-form tube updates render the algorithm computationally tractable for real-time control.
Theoretically, the work closes the formal gap in the use of CP for interactive systems where exchangeability is violated—establishing, for the first time, valid per-episode safety certificates. The adversarial CP approach opens pathways for combining statistical data-driven methods with robust control theory, and the iterative policy calibration paradigm can be extended to settings with learning-based predictors and multi-agent environments. Future development directions include integrating data-driven estimation of planner sensitivity, adaptive tube inflation under non-smooth dynamics, and multi-agent extensions.
Conclusion
The paper advances the field of safe planning by fusing adversarially robust conformal prediction with iterative policy updates, analytically quantifying and compensating for interaction-driven distribution shifts. The resulting framework delivers distribution-free safety guarantees in interactive environments—a significant technical resolution to a longstanding open problem. The approach is general, computationally efficient, and empirically validated, laying a foundation for broader adoption of distribution-free uncertainty quantification in closed-loop autonomous systems.