Policy-to-Trajectory Sensitivity Analysis
- Policy-to-trajectory sensitivity analysis is a framework that quantifies how changes in policy parameters and inputs alter system trajectories.
- It integrates diverse methods, including local closed-loop linearization, finite-change attribution, and Shapley-style techniques, to capture both instantaneous and cumulative effects.
- These approaches provide actionable insights for robust control design by linking policy tweaks to downstream performance, safety, and operational outcomes.
Policy-to-trajectory sensitivity analysis denotes a family of methods for characterizing how perturbations in policy parameters, policy outputs, upstream predictive modules, or exogenous/model inputs alter predicted or realized trajectories in dynamical systems. Across the literature, the object of sensitivity is not uniform: some works study local closed-loop deviation dynamics, some study event-time and jump sensitivity in hybrid systems, some measure forecast-induced changes in downstream plans, and some replace derivative-based notions with finite-change or Shapley-style attribution over time-indexed outputs (Kolaric et al., 2020, Saccon et al., 2014, Gibson et al., 2024, Fontana et al., 2020, Zhao et al., 2024). The common theme is that trajectory change is treated as a first-class systems quantity rather than as a by-product of scalar performance analysis.
1. Scope and formal objects
Taken together, the cited works use several non-equivalent sensitivity objects. In trajectory-centric model-based reinforcement learning, the central quantity is a local deviation map around a nominal trajectory, with sensitivity expressed through the closed-loop linearized operator and a worst-case one-step amplification metric over an uncertainty ellipsoid (Kolaric et al., 2020). In nonlinear safety analysis, the critical quantity is the parameter derivative of the disturbed trajectory and, more specifically, the reciprocal of its worst-case-over-time norm, which generically vanishes on the recovery boundary (Fisher, 13 Jan 2025). In functional sensitivity analysis for scenario interventions, the fundamental outputs are time-indexed finite-change effects (first-order, total-order, and interaction terms) rather than infinitesimal derivatives (Fontana et al., 2020). In stochastic agent-based policy design, the primary question can shift from “how does a trajectory change under a policy perturbation?” to “is the optimal policy sensitive to state variables?”, operationalized through additivity versus non-additivity of the objective surface (Munson et al., 19 Feb 2026).
| Setting | Sensitivity object | Primary interpretation |
|---|---|---|
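| Trajectory-centric feedback design | Squared spectral norm of the closed-loop linearized operator over an uncertainty ellipsoid | Worst-case one-step deviation gain |
| Hybrid jump systems | First-order jump-time shift and saltation/reset map | Event-time and jump propagation |
| Safety margins | Reciprocal of the worst-case-over-time norm of the trajectory's parameter derivative | Distance to recovery loss via trajectory blow-up |
| Functional/scenario analysis | Time-indexed first-order, total-order, and interaction effects | Time-local finite-change attribution |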
This heterogeneity is substantive rather than terminological. Some formulations are local and first-order, some are worst-case over bounded uncertainty sets, some are finite-change decompositions, and some are optimizer-level or planner-level. A plausible implication is that “policy-to-trajectory sensitivity” is best understood as a methodological umbrella rather than a single canonical estimand.
2. Local closed-loop and trajectory-centric formulations
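A canonical local formulation appears in trajectory-centric model-based reinforcement learning, where the actual control is written as the nominal control plus a feedback term acting on the deviation of the state from the nominal trajectory, so that the realized next state depends jointly on the nominal trajectory, the local deviation, and the controller parameters (Kolaric et al., 2020). After linearization about the nominal trajectory, the next-step deviation is linear in the current state and control deviations; under linear feedback, the control deviation is a gain applied to the state deviation, and the deviation dynamics collapse into a single closed-loop one-step operator. The corresponding worst-case local sensitivity metric is the largest squared norm of the next-step deviation over the uncertainty ellipsoid of current deviations, equivalently the squared spectral norm of the ellipsoid-scaled closed-loop operator (Kolaric et al., 2020). This formulation is explicitly local, one-step, and trajectory-centered: the controller is optimized jointly with the nominal state-control sequence to reduce the amplification of bounded deviations.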
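The worst-case one-step gain described above can be sketched numerically. This is a minimal illustration under stated assumptions, not the cited paper's implementation: the names `A`, `B`, `W`, and `S` (state Jacobian, control Jacobian, feedback gain, and ellipsoid shape matrix) and all numeric values are hypothetical, and the ellipsoid is assumed to have the form dx^T S dx <= 1 with plain Euclidean norms.

```python
import numpy as np

def inv_sqrt_spd(S):
    """Inverse square root of a symmetric positive-definite matrix."""
    evals, evecs = np.linalg.eigh(S)
    return evecs @ np.diag(evals ** -0.5) @ evecs.T

def worst_case_gain(A, B, W, S):
    """Max of ||(A + B W) dx||^2 over the ellipsoid dx^T S dx <= 1."""
    M = A + B @ W  # closed-loop one-step operator
    # Substituting dx = S^{-1/2} z maps the ellipsoid onto the unit ball,
    # so the maximum is the squared largest singular value of M S^{-1/2}.
    return np.linalg.norm(M @ inv_sqrt_spd(S), ord=2) ** 2

# Made-up numbers for one nominal step (assumptions, not from the source).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
W = np.array([[-3.0, -4.0]])   # linear feedback: du = W @ dx
S = np.diag([4.0, 1.0])

gain = worst_case_gain(A, B, W, S)

# Brute-force check: evaluate the deviation gain on the ellipsoid boundary.
angles = np.linspace(0.0, 2.0 * np.pi, 20001)
unit = np.stack([np.cos(angles), np.sin(angles)])
boundary = inv_sqrt_spd(S) @ unit
sampled = np.max(np.sum(((A + B @ W) @ boundary) ** 2, axis=0))
print(f"spectral-norm gain: {gain:.6f}, sampled maximum: {sampled:.6f}")
```

The brute-force sweep is only there to confirm the spectral-norm identity in two dimensions; in higher dimensions the singular-value computation is the practical route.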
A distinct but related policy-to-trajectory argument appears in model-free trajectory-based policy optimization. There, the policy is updated under an exact expected KL trust region, and the main theoretical result is not an explicit Jacobian of trajectory with respect to policy parameters but a bound linking small policy change to small state-distribution change. Under Gaussian state marginals and linear-Gaussian policies, if
2
then 3 as 4, and the policy-improvement theorem yields
5
Sensitivity is therefore mediated through state-distribution drift rather than through an explicit dynamics linearization (Akrour et al., 2016).
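For linear-Gaussian policies under a Gaussian state marginal, the expected KL divergence between two policies has a closed form, which is what makes an exact trust region tractable. The sketch below is an illustration under assumptions: the parameterization `a ~ N(K s + k, Sigma)`, the function name `expected_kl`, and every numeric value are hypothetical, and the Monte Carlo estimate is included only as a cross-check.

```python
import numpy as np

def expected_kl(K_new, k_new, Sig_new, K_old, k_old, Sig_old, m, P):
    """E_{s ~ N(m, P)} KL( N(K_new s + k_new, Sig_new) || N(K_old s + k_old, Sig_old) )."""
    d = Sig_old.shape[0]
    Sig_old_inv = np.linalg.inv(Sig_old)
    dK, dk = K_new - K_old, k_new - k_old
    mean_gap = dK @ m + dk
    # Expectation of the quadratic mean-gap term over the Gaussian state marginal.
    quad = mean_gap @ Sig_old_inv @ mean_gap + np.trace(dK.T @ Sig_old_inv @ dK @ P)
    logdet_old = np.linalg.slogdet(Sig_old)[1]
    logdet_new = np.linalg.slogdet(Sig_new)[1]
    const = np.trace(Sig_old_inv @ Sig_new) - d + logdet_old - logdet_new
    return 0.5 * (const + quad)

# Hypothetical state marginal and policies (not taken from the source).
m = np.array([0.5, -0.2])
P = np.array([[0.3, 0.05], [0.05, 0.2]])
K_old, k_old, Sig_old = np.array([[-1.0, -0.5]]), np.array([0.1]), np.array([[0.2]])
K_new, k_new, Sig_new = K_old + np.array([[0.05, -0.02]]), k_old + 0.01, np.array([[0.18]])

exact = expected_kl(K_new, k_new, Sig_new, K_old, k_old, Sig_old, m, P)

# Monte Carlo cross-check: average the per-state KL over sampled states.
rng = np.random.default_rng(0)
states = rng.multivariate_normal(m, P, size=200_000)
gaps = states @ (K_new - K_old).T + (k_new - k_old)
quad_samples = np.einsum("ni,ij,nj->n", gaps, np.linalg.inv(Sig_old), gaps)
covariance_part = expected_kl(K_old, k_old, Sig_new, K_old, k_old, Sig_old, m, P)
mc = covariance_part + 0.5 * quad_samples.mean()
print(f"closed form: {exact:.6f}, Monte Carlo: {mc:.6f}")
```

Shrinking the gap between the new and old gains drives this quantity to zero, which is the regime in which the cited bound links policy change to state-distribution change.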
Risk-sensitive exponential-cost MDPs provide a third trajectory-centric formulation. There the gradient of the long-run entropic cost is expressed over regeneration cycles, so policy sensitivity is explicitly a trajectory-level score-weighted quantity: 6 The weighting by the exponential of the whole cycle cost makes the relevant trajectory sensitivity sharply path-dependent and heavier-tailed than standard risk-neutral score-function gradients (Moharrami et al., 2022).
3. Hybrid events, jumps, and safety boundaries
In hybrid systems, the central difficulty is that a perturbation generally changes not only the state trajectory but also the event time. The one-jump analysis of state-triggered systems resolves this by introducing extended ante-event and post-event nominal trajectories and by comparing perturbed trajectories with the appropriate branch rather than with a single glued nominal path (Saccon et al., 2014). On smooth segments, the variational equations are standard: 7 At the event, the first-order jump-time shift is
8
and the post-event perturbation satisfies
9
with
0
The matrix 1 plays the role of a saltation/reset sensitivity map (Saccon et al., 2014).
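The role of a saltation/reset sensitivity map can be illustrated on a bouncing ball, using the textbook saltation-matrix formula for a state-triggered jump with a time-invariant guard. This is a sketch under assumptions rather than the construction of the cited paper: the system, the constants `GRAVITY` and `RESTITUTION`, and the function names are all hypothetical, and the check only covers a single transversal impact.

```python
import numpy as np

GRAVITY = 9.81
RESTITUTION = 0.8  # assumed coefficient of restitution

def simulate(x0, T):
    """Analytic state (height, velocity) at time T, assuming exactly one impact before T."""
    h0, v0 = x0
    r = np.sqrt(v0 ** 2 + 2.0 * GRAVITY * h0)
    tau = (v0 + r) / GRAVITY          # impact time
    v_minus = -r                      # velocity just before impact
    v_plus = -RESTITUTION * v_minus   # velocity just after impact
    s = T - tau
    x_T = np.array([v_plus * s - 0.5 * GRAVITY * s ** 2, v_plus - GRAVITY * s])
    return x_T, tau, v_minus

def saltation(v_minus):
    """Textbook saltation matrix: DR + (f_plus - DR f_minus) Dg / (Dg f_minus)."""
    Dg = np.array([1.0, 0.0])                       # guard g(x) = height
    DR = np.diag([1.0, -RESTITUTION])               # Jacobian of the reset map
    f_minus = np.array([v_minus, -GRAVITY])         # vector field before impact
    f_plus = np.array([-RESTITUTION * v_minus, -GRAVITY])
    return DR + np.outer(f_plus - DR @ f_minus, Dg) / (Dg @ f_minus)

def smooth_flow_jacobian(t):
    """State-transition matrix of the ballistic flow over a duration t."""
    return np.array([[1.0, t], [0.0, 1.0]])

x0, T = np.array([1.0, 0.0]), 0.8
_, tau, v_minus = simulate(x0, T)
analytic = smooth_flow_jacobian(T - tau) @ saltation(v_minus) @ smooth_flow_jacobian(tau)

# Central finite differences of the final state with respect to the initial state.
eps = 1e-6
numeric = np.zeros((2, 2))
for i in range(2):
    step = np.zeros(2)
    step[i] = eps
    numeric[:, i] = (simulate(x0 + step, T)[0] - simulate(x0 - step, T)[0]) / (2.0 * eps)
print(analytic)
print(numeric)
```

Composing only the reset Jacobian with the smooth flows, without the event-time correction term, would not reproduce the finite-difference result, which is the practical reason the jump-time shift has to be tracked.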
A power-system application pushes this logic into a hybrid DAE setting with switching events induced by faults and fault clearing. There the policy variables are the PSS parameters 2, embedded as constant augmented states, and the transient objective is
3
The gradient is computed directly from trajectory sensitivities: 4 Sensitivity propagation therefore combines DAE variational equations on smooth segments with event-time-corrected jump conditions at switching hypersurfaces (Zhang, 2013).
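Computing an objective gradient from trajectory sensitivities can be sketched on a smooth scalar system. Everything here is an assumption made for illustration: the dynamics x' = -theta x, the integral objective J = ∫ x² dt, and the function names are hypothetical, and the sketch omits the DAE structure and switching events of the power-system setting.

```python
import numpy as np

def objective_and_gradient(theta, x0=1.0, T=2.0, n=2000):
    """Integrate state x, sensitivity s = dx/dtheta, J, and dJ/dtheta with RK4."""
    def rhs(z):
        x, s, _, _ = z
        return np.array([
            -theta * x,        # state dynamics
            -x - theta * s,    # variational equation for s = dx/dtheta
            x ** 2,            # running cost
            2.0 * x * s,       # chain rule: d(x^2)/dtheta
        ])
    dt = T / n
    z = np.array([x0, 0.0, 0.0, 0.0])
    for _ in range(n):
        k1 = rhs(z)
        k2 = rhs(z + 0.5 * dt * k1)
        k3 = rhs(z + 0.5 * dt * k2)
        k4 = rhs(z + dt * k3)
        z = z + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return z[2], z[3]

theta, T = 1.5, 2.0
J, dJ = objective_and_gradient(theta, T=T)

# Closed-form reference for this toy problem.
decay = np.exp(-2.0 * theta * T)
J_exact = (1.0 - decay) / (2.0 * theta)
dJ_exact = (2.0 * theta * T * decay - (1.0 - decay)) / (2.0 * theta ** 2)
print(f"J = {J:.8f}, dJ/dtheta = {dJ:.8f}")
```

Treating the parameter as a constant augmented state, as the text describes, amounts to carrying the sensitivity `s` alongside `x` in the same integration.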
In nonlinear safety analysis, the same geometric structure is exploited differently. Instead of optimizing transient performance, the aim is to find the smallest parameter perturbation that moves the post-disturbance initial condition onto the region-of-attraction boundary. The key sensitivity functional, the reciprocal of the worst-case-over-time norm of the trajectory's parameter derivative, is finite and strictly positive in the recovery region and, generically, vanishes on the recovery boundary (Fisher, 13 Jan 2025). The underlying mechanism is sensitivity blow-up near the stable manifold of a controlling boundary critical element. This makes trajectory sensitivity itself a boundary oracle and leads to Newton, continuation, and SQP procedures for nearest-boundary computation.
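The vanishing of the reciprocal-sensitivity functional at a recovery boundary can be seen in a scalar toy problem. This is a sketch under assumptions and not the cited analysis: the system x' = x - x³, the choice of the post-disturbance initial condition `p` as the parameter, and the name `margin_functional` are all hypothetical. Here the recovery region for the equilibrium x = 1 is p > 0, and the boundary is the unstable equilibrium at p = 0.

```python
import numpy as np

def margin_functional(p, T=12.0, n=6000):
    """Reciprocal of sup_t |dx(t)/dp| for x' = x - x^3 with x(0) = p."""
    def rhs(z):
        x, s = z
        return np.array([x - x ** 3, (1.0 - 3.0 * x ** 2) * s])  # state and variational eq.
    dt = T / n
    z = np.array([p, 1.0])   # dx(0)/dp = 1
    sup_s = abs(z[1])
    for _ in range(n):
        k1 = rhs(z)
        k2 = rhs(z + 0.5 * dt * k1)
        k3 = rhs(z + 0.5 * dt * k2)
        k4 = rhs(z + dt * k3)
        z = z + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        sup_s = max(sup_s, abs(z[1]))
    return 1.0 / sup_s

params = [0.5, 0.1, 0.01]
values = [margin_functional(p) for p in params]
for p, g in zip(params, values):
    print(f"p = {p:5.2f}  ->  functional = {g:.5f}")

# For this toy system the functional has the closed form (3*sqrt(3)/2) * p * (1 - p^2)
# whenever 0 < p <= 1/sqrt(3); it is used here only as a cross-check.
closed_form = [1.5 * np.sqrt(3.0) * p * (1.0 - p ** 2) for p in params]
```

In this example the functional shrinks roughly linearly as `p` approaches the boundary, which is the kind of behavior a root-finding or continuation method for the nearest boundary point relies on.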
4. Learned predictors, planners, and differentiable policy-to-trajectory maps
A policy-to-trajectory pathway can also arise indirectly, through learned prediction modules placed upstream of planners. In autonomous driving, one study defines sensitivity not as a Jacobian norm but as the percent increase in average displacement error caused by perturbing one input feature at a time. For Trajectron++ and AgentFormer, almost all perturbation sensitivity was concentrated in the most recent position and velocity states, with all other state-history entries showing far lower median sensitivity in both models. The same work then propagated predictor perturbations into an optimization-based planner and traced the chain from input perturbation through prediction error to a changed plan, including an abrupt stop under an FGSM image perturbation or under occlusion of the most recent velocity state (Gibson et al., 2024).
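The one-feature-at-a-time, percent-increase-in-ADE measure can be sketched with a toy predictor. The constant-velocity predictor below is a stand-in chosen for illustration and is not Trajectron++ or AgentFormer; the function names, the trajectory, and the perturbation size are all assumptions.

```python
import numpy as np

def predict_constant_velocity(history, horizon=5):
    """Toy predictor: extrapolate from the last two observed positions only."""
    velocity = history[-1] - history[-2]
    steps = np.arange(1, horizon + 1)[:, None]
    return history[-1] + steps * velocity

def ade(pred, truth):
    """Average displacement error over the prediction horizon."""
    return float(np.mean(np.linalg.norm(pred - truth, axis=1)))

def percent_ade_increase(history, truth, index, delta):
    """Percent change in ADE when a single history entry is perturbed by delta."""
    base = ade(predict_constant_velocity(history), truth)
    perturbed = history.copy()
    perturbed[index] = perturbed[index] + delta
    return 100.0 * (ade(predict_constant_velocity(perturbed), truth) - base) / base

# Hypothetical gently curving path: 5 observed steps, 5 future steps.
times = np.arange(10, dtype=float)
path = np.stack([times, 0.02 * times ** 2], axis=1)
history, truth = path[:5], path[5:]

delta = np.array([0.1, 0.0])
sens_recent = percent_ade_increase(history, truth, index=-1, delta=delta)
sens_oldest = percent_ade_increase(history, truth, index=0, delta=delta)
print(f"most recent entry: {sens_recent:.1f}%, oldest entry: {sens_oldest:.1f}%")
```

Because this toy predictor ignores everything but the last two positions, its sensitivity to older entries is exactly zero; learned predictors would show a softer version of the same concentration, according to the results reported above.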
Differentiable trajectory-refinement layers make the policy-to-trajectory map explicit. In DiffOG, the policy produces an action sequence 3, which is refined by solving
4
subject to hard finite-difference bounds. Because 5, the optimizer is unique, continuous, and subdifferentiable everywhere, and differentiable except on a measure-zero set; 6 can be computed via KKT conditions (Xu et al., 18 Apr 2025). This turns post-processed action trajectories into almost-everywhere differentiable functions of policy outputs and, by composition, of policy parameters.
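Differentiating an optimization layer through its KKT conditions can be sketched for a quadratic program. The example below is an illustration under assumptions, not DiffOG itself: the objective (tracking plus a first-difference smoothness penalty), the names `refine`, `Q`, `C`, and `d`, and the single equality constraint used in place of the hard inequality bounds are all hypothetical simplifications.

```python
import numpy as np

def refine(a_hat, Q, C, d):
    """Solve min 0.5 a^T Q a - a_hat^T a  s.t.  C a = d; return (a*, d a*/d a_hat)."""
    n, m = Q.shape[0], C.shape[0]
    kkt = np.block([[Q, C.T], [C, np.zeros((m, m))]])
    solution = np.linalg.solve(kkt, np.concatenate([a_hat, d]))
    # Differentiating the KKT system with respect to a_hat gives
    # kkt @ [d a*/d a_hat; d nu/d a_hat] = [I; 0].
    rhs = np.vstack([np.eye(n), np.zeros((m, n))])
    jacobian = np.linalg.solve(kkt, rhs)[:n]
    return solution[:n], jacobian

horizon, smooth_weight = 6, 2.0
D = np.diff(np.eye(horizon), axis=0)               # first-difference operator
Q = np.eye(horizon) + smooth_weight * D.T @ D      # positive definite, so a* is unique
C = np.zeros((1, horizon))
C[0, 0] = 1.0                                      # pin the first action (assumed constraint)
d = np.array([0.0])

a_hat = np.array([0.3, 1.2, -0.4, 0.9, 0.1, 0.7])  # hypothetical raw policy output
a_star, jac = refine(a_hat, Q, C, d)

# Finite-difference check of the Jacobian.
eps = 1e-6
fd = np.zeros((horizon, horizon))
for i in range(horizon):
    step = np.zeros(horizon)
    step[i] = eps
    fd[:, i] = (refine(a_hat + step, Q, C, d)[0] - refine(a_hat - step, Q, C, d)[0]) / (2.0 * eps)
print(np.round(jac, 3))
```

With inequality bounds, the same linear system is solved over the currently active constraints, and the Jacobian changes abruptly when the active set changes; that is the source of the measure-zero set of non-differentiable points mentioned in the text.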
Whole-trajectory generative world models create yet another route. In policy-guided trajectory diffusion, the action sequence is updated by a policy score term,
7
so the trajectory generator is explicitly sensitive to the gradient field of the policy distribution. The guidance scale 8 becomes a direct sensitivity gain, and the action-distribution experiments showed accurate matching for 9 but degradation when the policy variance fell below 0 (Rigter et al., 2023).
Scene-consistent prediction extends the same idea to interacting multi-agent rollouts. ScePT does not decode future coordinates independently; it generates joint clique trajectories by repeatedly applying a learned interaction policy and explicit agent dynamics for vehicles and pedestrians. Because edge encoders, attention, latent modes, and autoregressive rollout are coupled across agents, perturbations in one agent’s state can affect neighboring actions and hence scene-level future trajectories (Chen et al., 2022).
A closely related optimizer-sensitive formulation appears in guided policy search for initialization of trajectory optimization. There, the learned policy is trained not only on nominal SCP iterates but also on neighboring trajectories generated by local LQR feedback around each iterate. These neighboring rollouts function as empirical local sensitivity tubes. In the powered-descent study, policy-generated warm starts reduced the mean number of PTR iterations and raised the success rate, directly exposing the sensitivity of the final optimizer to the policy-induced initial trajectory (Kim et al., 2021).
5. Statistical, functional, and graph-based attribution frameworks
When the output of interest is a function over time rather than a single pathwise realization, sensitivity analysis shifts toward time-indexed attribution. For functional-valued responses under finite input changes, the core decomposition is
5
with first-order, total-order, and interaction sensitivity indices
6
Interval-Wise Testing then supplies adjusted p-value functions
7
so one can identify the time intervals over which a scenario or policy input significantly affects the response trajectory while controlling interval-wise error (Fontana et al., 2020).
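The finite-change decomposition of a time-indexed response into first-order, interaction, and total-order effects can be sketched for two inputs. The response function `model`, the baseline and scenario values, and the variable names are hypothetical; the decomposition follows the standard finite-change construction, and the interval-wise testing step is not implemented here.

```python
import numpy as np

t = np.linspace(0.0, 10.0, 101)

def model(x1, x2):
    """Hypothetical time-indexed response with an interaction between inputs."""
    return x1 * t + x1 * x2 * np.sin(t)

base = {"x1": 1.0, "x2": 0.5}       # baseline scenario (assumed)
scenario = {"x1": 1.5, "x2": 2.0}   # intervention scenario (assumed)

f_base = model(**base)
f_only_1 = model(scenario["x1"], base["x2"])   # change x1 alone
f_only_2 = model(base["x1"], scenario["x2"])   # change x2 alone
f_both = model(**scenario)

total_change = f_both - f_base
first_1 = f_only_1 - f_base                    # first-order effect of x1 over time
first_2 = f_only_2 - f_base                    # first-order effect of x2 over time
interaction = total_change - first_1 - first_2
total_1 = first_1 + interaction                # total-order effect of x1
total_2 = first_2 + interaction                # total-order effect of x2

peak = t[np.argmax(np.abs(interaction))]
print(f"interaction effect is largest near t = {peak:.1f}")
```

Each effect is a function of time rather than a scalar, so the time intervals on which an input matters can be read off directly before any significance testing is applied.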
In stochastic ABMs, the sensitivity target can move from trajectories to optimal-policy mappings. One framework defines
8
and tests whether 9 is sensitive to 0 by comparing an additive null
1
against a non-additive alternative
2
using a GP-based likelihood-ratio statistic
3
This is not a direct trajectory sensitivity analysis; the paper states that its main contribution is a statistical framework for testing whether the optimal policy is sensitive to state variables, supplemented by a descriptive dynamic analysis of simulated time paths (Munson et al., 19 Feb 2026).
Policy-augmented graphical hybrid models replace derivative calculus with Shapley-value attribution over dynamic systems. The policy is
4
the transition is
5
and outputs of interest include future states 6 and cumulative reward
7
For policy parameters, coalitional value functions are formed by setting excluded coefficients to zero,
8
and Shapley values allocate output influence across policy parameters, random factors, and model parameters (Zhao et al., 2024). In the linear Gaussian approximation, policy sensitivity enters future states through the pathway matrices
9
which makes the closed-loop propagation channel explicit (Zhao et al., 2024).
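Shapley attribution over policy parameters, with coalition values formed by zeroing the excluded coefficients, can be sketched on a deterministic toy closed loop. The dynamics, reward, coefficient vector `theta`, and function names below are assumptions made for illustration; random factors and model parameters, which the cited framework also attributes, are left out.

```python
import itertools
import math
import numpy as np

theta = np.array([0.6, 0.3, -0.2])   # hypothetical policy coefficients

def cumulative_reward(coeffs, horizon=20):
    """Toy closed loop: x' = 1.1 x + a, with a linear policy on (x_t, x_{t-1}, 1)."""
    x_prev, x = 1.0, 1.0
    total = 0.0
    for _ in range(horizon):
        a = -coeffs[0] * x - coeffs[1] * x_prev + coeffs[2]
        total += -(x ** 2) - 0.1 * a ** 2
        x_prev, x = x, 1.1 * x + a
    return total

def coalition_value(members):
    """Output when only the coefficients in `members` are active; others are set to zero."""
    masked = np.zeros_like(theta)
    for j in members:
        masked[j] = theta[j]
    return cumulative_reward(masked)

def shapley_values(n):
    """Exact Shapley values by enumerating all coalitions (feasible only for small n)."""
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                phi[i] += weight * (coalition_value(subset + (i,)) - coalition_value(subset))
    return phi

phi = shapley_values(len(theta))
full_effect = coalition_value((0, 1, 2)) - coalition_value(())
print("Shapley values:", np.round(phi, 4))
print("sum:", round(float(phi.sum()), 4), "full effect:", round(full_effect, 4))
```

The attributions sum to the difference between the full-policy output and the all-zero baseline, so each coefficient's share of the cumulative reward can be compared on a common scale; exact enumeration scales exponentially in the number of coefficients, and larger problems would need sampling-based approximations.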
A final boundary case is weight-space modeling. Transformer-based implicit policy learning treats the sequence of policy parameters
0
as a trajectory and learns
1
This is relevant only in a restricted sense: the trajectory being modeled is the training-time weight path, not the state-action rollout trajectory induced by the policy in the environment (Tang, 6 Mar 2025).
6. Methodological implications and unresolved issues
Several recurring lessons emerge. First, the most operationally relevant sensitivity quantity is often not the raw trajectory derivative alone but a coupled downstream quantity. In control-aware experiment design, the main differentiated object is the map from model perturbations to the optimal tracking augmentation,
2
and the sensitivity of interest is
3
because the practically important question is how much controller effort is required to preserve the planned trajectory under model error (Hart et al., 2022). This suggests that policy-to-trajectory sensitivity is often more informative when embedded in a model-to-policy-to-trajectory chain than when studied as a purely kinematic perturbation.
Second, the literature repeatedly warns against conflating predictor sensitivity, policy sensitivity, and outcome sensitivity. In autonomous driving, a predictor can appear most sensitive to state-history perturbations while image perturbations remain dangerous because of dimensionality, stealth, or mode switching, and the final hazard appears only after planner integration (Gibson et al., 2024). In trajectory-centric MBRL, the dominant analytic object may be a one-step local gain, not a full-horizon derivative of the realized trajectory with respect to policy parameters (Kolaric et al., 2020). In differentiable trajectory optimization, almost-everywhere differentiability of the optimizer does not remove active-set kinks or guarantee benign global conditioning (Xu et al., 18 Apr 2025).
Third, several common misconceptions are explicitly corrected by the cited works. One is that local linearization automatically yields faithful policy-to-trajectory sensitivity; the model-free trajectory-optimization literature argues that linearizing dynamics around the mean trajectory can bias the inferred effect of policy updates on future trajectories in strongly nonlinear systems (Akrour et al., 2016). Another is that a single global scalar sensitivity is sufficient; functional and graph-based frameworks instead expose time-local or state-local sensitivity patterns, and ABM work shows that some methods are really testing sensitivity of the optimizer, not of the full trajectory (Fontana et al., 2020, Munson et al., 19 Feb 2026, Zhao et al., 2024).
The dominant limitations are equally consistent. Many methods are local and first-order; many require transversality, smoothness, or fixed event sequences; many operate open-loop or scenario-by-scenario rather than over repeated replanning; and several provide only partial theory for long-horizon or nonsmooth settings. The safety-margin analysis is local near generic codimension-one recovery boundaries and relies on hyperbolicity and nonresonance conditions (Fisher, 13 Jan 2025). The trajectory-centric feedback formulation is explicitly one-step and does not derive full recursive sensitivity of the entire realized trajectory with respect to policy parameters (Kolaric et al., 2020). The driving-security study does not provide an end-to-end closed-loop Lipschitz constant from sensed input to control (Gibson et al., 2024). The ABM framework states that its dynamic analysis is descriptive rather than a dedicated trajectory sensitivity methodology (Munson et al., 19 Feb 2026).
A plausible implication is that the field is converging on a layered view. At one layer lie local variational maps, saltation-like jump gains, KL-based distribution-shift bounds, and KKT sensitivities. At another lie task-level quantities such as reward, risk, safety margin, planner feasibility, or controller burden. Policy-to-trajectory sensitivity analysis becomes most informative when these layers are linked explicitly: perturbation → trajectory change → constraint change or cost change → control or planning consequence.