
Policy-to-Trajectory Sensitivity Analysis

Updated 3 July 2026
  • Policy-to-trajectory sensitivity analysis is a framework that quantifies how changes in policy parameters and inputs alter system trajectories.
  • It integrates diverse methods—such as local closed-loop, finite-change attribution, and Shapley-style techniques—to capture both instantaneous and cumulative effects.
  • These approaches provide actionable insights for robust control design by linking policy tweaks to downstream performance, safety, and operational outcomes.

Policy-to-trajectory sensitivity analysis denotes a family of methods for characterizing how perturbations in policy parameters, policy outputs, upstream predictive modules, or exogenous/model inputs alter predicted or realized trajectories in dynamical systems. Across the literature, the object of sensitivity is not uniform: some works study local closed-loop deviation dynamics, some study event-time and jump sensitivity in hybrid systems, some measure forecast-induced changes in downstream plans, and some replace derivative-based notions with finite-change or Shapley-style attribution over time-indexed outputs (Kolaric et al., 2020, Saccon et al., 2014, Gibson et al., 2024, Fontana et al., 2020, Zhao et al., 2024). The common theme is that trajectory change is treated as a first-class systems quantity rather than as a by-product of scalar performance analysis.

1. Scope and formal objects

Taken together, the cited works use several non-equivalent sensitivity objects. In trajectory-centric model-based reinforcement learning, the central quantity is a local deviation map around a nominal trajectory, with sensitivity expressed through the closed-loop linearized operator $A_k+B_kW$ and a worst-case one-step amplification metric over an uncertainty ellipsoid (Kolaric et al., 2020). In nonlinear safety analysis, the critical quantity is the parameter derivative of the disturbed trajectory,

$$\frac{d}{dp}\phi_{(t,p)}(y(p)),$$

and, more specifically, the reciprocal of its worst-case-over-time norm,

$$G(p)=\left(\sup_{t\ge 0}\left\|\frac{d}{dp}\phi_{(t,p)}(y(p))\right\|_1\right)^{-1},$$

which generically vanishes on the recovery boundary (Fisher, 13 Jan 2025). In functional sensitivity analysis for scenario interventions, the fundamental outputs are time-indexed finite-change effects such as $\phi_i^1(t)$, $\phi_i^T(t)$, and $\phi_i^{\mathcal I}(t)$, rather than infinitesimal derivatives (Fontana et al., 2020). In stochastic agent-based policy design, the primary question can shift from “how does a trajectory change under a policy perturbation?” to “is the optimal policy $x^*(\theta)$ sensitive to state variables $\theta$?”, operationalized through additivity versus non-additivity of the objective surface $f(x,\theta)$ (Munson et al., 19 Feb 2026).

| Setting | Sensitivity object | Primary interpretation |
| --- | --- | --- |
| Trajectory-centric feedback design | $d_{max,k}$ from $A_k+B_kW$ | Worst-case one-step deviation gain |
| Hybrid jump systems | [expressions missing] | Event-time and jump propagation |
| Safety margins | $G(p)$ | Distance to recovery loss via trajectory blow-up |
| Functional/scenario analysis | $\phi_i^1(t)$, $\phi_i^T(t)$, $\phi_i^{\mathcal I}(t)$ | Time-local finite-change attribution |

This heterogeneity is substantive rather than terminological. Some formulations are local and first-order, some are worst-case over bounded uncertainty sets, some are finite-change decompositions, and some are optimizer-level or planner-level. A plausible implication is that “policy-to-trajectory sensitivity” is best understood as a methodological umbrella rather than a single canonical estimand.

2. Local closed-loop and trajectory-centric formulations

A canonical local formulation appears in trajectory-centric model-based reinforcement learning, where the actual control is written as

[equation missing]

with deviation [expression missing], and the realized next state depends jointly on the nominal trajectory, local deviation, and controller parameters (Kolaric et al., 2020). After linearization,

[equation missing]

and, for linear feedback [expression missing],

[equation missing]

The corresponding worst-case local sensitivity metric is

[equation missing]

equivalently the squared spectral norm of [expression missing] (Kolaric et al., 2020). This formulation is explicitly local, one-step, and trajectory-centered: the controller is optimized jointly with the nominal state-control sequence to reduce the amplification of bounded deviations.
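The link between a worst-case gain over an ellipsoid and a squared spectral norm can be illustrated numerically. The following is a minimal sketch, not the formulation of Kolaric et al.: it assumes the uncertainty set is $\{\delta x : \delta x^\top S\,\delta x \le 1\}$ for a positive definite shape matrix `S`, and the function name, the matrix `S`, and the numerical values are all hypothetical.

```python
import numpy as np

def worst_case_one_step_gain(A, B, W, S):
    """Maximum of ||(A + B W) dx||^2 over the ellipsoid dx^T S dx <= 1.

    S is an assumed positive definite shape matrix (not taken from the source).
    """
    A_cl = A + B @ W                      # closed-loop linearized operator
    L = np.linalg.cholesky(S)             # S = L L^T
    # With z = L^T dx the ellipsoid becomes the unit ball, and
    # (A + B W) dx = (A + B W) L^{-T} z, so the maximum is a squared spectral norm.
    M = A_cl @ np.linalg.inv(L).T
    return float(np.linalg.norm(M, 2) ** 2)

# Illustrative double-integrator-like linearization (values are assumptions).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
W = np.array([[-2.0, -3.0]])
S = np.diag([4.0, 1.0])

gain = worst_case_one_step_gain(A, B, W, S)
print(f"worst-case one-step gain: {gain:.4f}")
```

Sampling points on the boundary of the ellipsoid and evaluating the squared deviation directly gives values that never exceed `gain`, which is one way to sanity-check the change of variables.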

A distinct but related policy-to-trajectory argument appears in model-free trajectory-based policy optimization. There, the policy is updated under an exact expected KL trust region, and the main theoretical result is not an explicit Jacobian of trajectory with respect to policy parameters but a bound linking small policy change to small state-distribution change. Under Gaussian state marginals and linear-Gaussian policies, if

[equation missing]

then [expression missing] as [expression missing], and the policy-improvement theorem yields

[equation missing]

Sensitivity is therefore mediated through state-distribution drift rather than through an explicit dynamics linearization (Akrour et al., 2016).

Risk-sensitive exponential-cost MDPs provide a third trajectory-centric formulation. There the gradient of the long-run entropic cost is expressed over regeneration cycles, so policy sensitivity is explicitly a trajectory-level score-weighted quantity: [equation missing]. The weighting by the exponential of the whole cycle cost makes the relevant trajectory sensitivity sharply path-dependent and heavier-tailed than standard risk-neutral score-function gradients (Moharrami et al., 2022).

3. Hybrid events, jumps, and safety boundaries

In hybrid systems, the central difficulty is that a perturbation generally changes not only the state trajectory but also the event time. The one-jump analysis of state-triggered systems resolves this by introducing extended ante-event and post-event nominal trajectories and by comparing perturbed trajectories with the appropriate branch rather than with a single glued nominal path (Saccon et al., 2014). On smooth segments, the variational equations are standard: [equation missing]. At the event, the first-order jump-time shift is

[equation missing]

and the post-event perturbation satisfies

[equation missing]

with

[equation missing]

The matrix [expression missing] plays the role of a saltation/reset sensitivity map (Saccon et al., 2014).
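The role of a saltation map can be checked on a textbook example. The sketch below is not taken from Saccon et al.; it assumes a bouncing ball with state (height, velocity), guard $g(x)=h$, reset $v^+=-e\,v^-$, and the standard time-invariant saltation formula $\Xi = D_xR + (f^+ - D_xR\,f^-)\,D_xg / (D_xg\,f^-)$. The constants `G_ACC` and `REST` and all function names are hypothetical.

```python
import numpy as np

G_ACC = 9.81   # gravitational acceleration (assumed value)
REST = 0.8     # coefficient of restitution (assumed value)

def flow_through_impact(x0, T):
    """State (height, velocity) at time T, assuming exactly one ground impact before T."""
    h0, v0 = x0
    t_hit = (v0 + np.sqrt(v0**2 + 2.0 * G_ACC * h0)) / G_ACC   # root of h(t) = 0
    v_minus = v0 - G_ACC * t_hit                                # pre-impact velocity
    v_plus = -REST * v_minus                                    # reset map
    dt = T - t_hit
    x_T = np.array([v_plus * dt - 0.5 * G_ACC * dt**2, v_plus - G_ACC * dt])
    return x_T, t_hit, v_minus

def saltation_matrix(v_minus):
    """First-order map from pre-impact to post-impact perturbations."""
    f_minus = np.array([v_minus, -G_ACC])           # vector field just before impact
    f_plus = np.array([-REST * v_minus, -G_ACC])    # vector field just after impact
    DR = np.diag([1.0, -REST])                      # Jacobian of the reset map
    Dg = np.array([1.0, 0.0])                       # gradient of the guard g(x) = h
    return DR + np.outer(f_plus - DR @ f_minus, Dg) / (Dg @ f_minus)

def ballistic_stm(t):
    """State-transition matrix of the smooth (free-fall) variational equation."""
    return np.array([[1.0, t], [0.0, 1.0]])

x0 = np.array([5.0, 0.0])
T = 1.5
x_T, t_hit, v_minus = flow_through_impact(x0, T)

# Predicted sensitivity: smooth flow, saltation at the event, smooth flow again.
J_pred = ballistic_stm(T - t_hit) @ saltation_matrix(v_minus) @ ballistic_stm(t_hit)

# Central finite differences through the event for comparison.
eps = 1e-6
J_fd = np.column_stack([
    (flow_through_impact(x0 + eps * e, T)[0] - flow_through_impact(x0 - eps * e, T)[0]) / (2 * eps)
    for e in np.eye(2)
])
print(np.max(np.abs(J_pred - J_fd)))
```

The off-diagonal entry of the saltation matrix carries the event-time correction: using the reset Jacobian alone would ignore the fact that a perturbed trajectory reaches the ground at a slightly different time.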

A power-system application pushes this logic into a hybrid DAE setting with switching events induced by faults and fault clearing. There the policy variables are the PSS parameters [expression missing], embedded as constant augmented states, and the transient objective is

[equation missing]

The gradient is computed directly from trajectory sensitivities: [equation missing]. Sensitivity propagation therefore combines DAE variational equations on smooth segments with event-time-corrected jump conditions at switching hypersurfaces (Zhang, 2013).

In nonlinear safety analysis, the same geometric structure is exploited differently. Instead of optimizing transient performance, the aim is to find the smallest parameter perturbation that moves the post-disturbance initial condition onto the region-of-attraction boundary. The key sensitivity functional,

$$G(p)=\left(\sup_{t\ge 0}\left\|\frac{d}{dp}\phi_{(t,p)}(y(p))\right\|_1\right)^{-1},$$

is finite and strictly positive in the recovery region and, generically, satisfies $G(p)=0$ on the recovery boundary (Fisher, 13 Jan 2025). The underlying mechanism is sensitivity blow-up near the stable manifold of a controlling boundary critical element. This makes trajectory sensitivity itself a boundary oracle and leads to Newton, continuation, and SQP procedures for nearest-boundary computation.
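The vanishing of the reciprocal sensitivity near a recovery boundary can be seen in a one-dimensional toy system. This is a sketch under stated assumptions, not the setting of the cited work: the dynamics $\dot x = x - x^3$, the choice $y(p)=p$, the finite integration horizon standing in for $t\ge 0$, and the function names are all hypothetical. Here the recovery region for the equilibrium $x=1$ is $x>0$, and the boundary is the unstable equilibrium $x=0$.

```python
import numpy as np

def f(x):
    return x - x**3                # assumed bistable vector field

def dfdx(x):
    return 1.0 - 3.0 * x**2        # its derivative, used in the variational equation

def sensitivity_margin(p, horizon=30.0, dt=0.01):
    """Approximate G(p) = 1 / sup_t |d phi_t / dp| for initial condition y(p) = p.

    The supremum is taken over a finite horizon on a fixed RK4 grid (an approximation).
    """
    def rhs(z):
        x, s = z
        return np.array([f(x), dfdx(x) * s])   # state and sensitivity dynamics

    z = np.array([p, 1.0])                     # s(0) = d y / d p = 1
    sup_s = abs(z[1])
    for _ in range(int(horizon / dt)):
        k1 = rhs(z)
        k2 = rhs(z + 0.5 * dt * k1)
        k3 = rhs(z + 0.5 * dt * k2)
        k4 = rhs(z + dt * k3)
        z = z + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        sup_s = max(sup_s, abs(z[1]))          # scalar state, so the norm is |s|
    return 1.0 / sup_s

margins = {p: sensitivity_margin(p) for p in (0.1, 0.01, 0.001)}
for p, g in margins.items():
    print(f"p = {p:<6} G(p) ~ {g:.5f}")
```

For a scalar autonomous system the sensitivity equals $f(x(t))/f(x(0))$, so in this toy case $G(p) = (p - p^3)\,/\,\max_x f(x)$ for small positive $p$, which tends to zero as $p$ approaches the boundary at $0$.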

4. Learned predictors, planners, and differentiable policy-to-trajectory maps

A policy-to-trajectory pathway can also arise indirectly, through learned prediction modules placed upstream of planners. In autonomous driving, one study defines sensitivity not as a Jacobian norm but as the percent increase in average displacement error caused by perturbing one input feature at a time: [equation missing]. For Trajectron++ and AgentFormer, almost all perturbation sensitivity was concentrated in the most recent position and velocity states, with all other state-history entries having median sensitivity below [value missing] for Trajectron++ and below [value missing] for AgentFormer. The same work then propagated predictor perturbations into an optimization-based planner and showed the chain

[equation missing]

including an abrupt stop from approximately [value missing] to [value missing] under an FGSM image perturbation or under occlusion of the most recent velocity state (Gibson et al., 2024).
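A percent-increase-in-ADE measure of this kind can be sketched in a few lines. The example below is illustrative only: it uses a hypothetical constant-velocity predictor in place of Trajectron++ or AgentFormer, an assumed history layout of rows `[x, y, vx, vy]`, and an assumed normalization by the unperturbed ADE. A constant-velocity predictor ignores older history by construction, so its zero sensitivity to early entries is a property of the toy, not evidence about the learned models.

```python
import numpy as np

def ade(pred, truth):
    """Average displacement error: mean Euclidean distance over the horizon."""
    return float(np.mean(np.linalg.norm(pred - truth, axis=-1)))

def constant_velocity_predictor(history, horizon, dt=0.5):
    """Toy stand-in predictor: extrapolates the last row [x, y, vx, vy] linearly."""
    pos, vel = history[-1, :2], history[-1, 2:]
    steps = np.arange(1, horizon + 1)[:, None] * dt
    return pos + steps * vel

def percent_ade_increase(predictor, history, truth, t_idx, f_idx, delta):
    """Percent change in ADE when a single history entry is shifted by delta."""
    base = ade(predictor(history, len(truth)), truth)
    perturbed = history.copy()
    perturbed[t_idx, f_idx] += delta            # one feature at one time step
    return 100.0 * (ade(predictor(perturbed, len(truth)), truth) - base) / base

# Synthetic scene: agent moving along x, ground truth drifting slightly in y.
dt = 0.5
k = np.arange(4)
history = np.column_stack([k * dt, np.zeros(4), np.ones(4), np.zeros(4)])
steps = np.arange(1, 7)
truth = np.column_stack([history[-1, 0] + steps * dt, 0.05 * steps])

old_vx = percent_ade_increase(constant_velocity_predictor, history, truth, 0, 2, 0.5)
last_vx = percent_ade_increase(constant_velocity_predictor, history, truth, -1, 2, 0.5)
print(f"oldest vx: {old_vx:.1f}%  most recent vx: {last_vx:.1f}%")
```

Sweeping `t_idx` and `f_idx` over all history entries yields a sensitivity table per feature and time step, which is the kind of object a median-sensitivity summary would be computed from.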

Differentiable trajectory-refinement layers make the policy-to-trajectory map explicit. In DiffOG, the policy produces an action sequence [expression missing], which is refined by solving

[equation missing]

subject to hard finite-difference bounds. Because [condition missing], the optimizer is unique, continuous, and subdifferentiable everywhere, and differentiable except on a measure-zero set; [expression missing] can be computed via KKT conditions (Xu et al., 18 Apr 2025). This turns post-processed action trajectories into almost-everywhere differentiable functions of policy outputs and, by composition, of policy parameters.
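How a refinement layer becomes differentiable through its optimality conditions can be shown on a simplified stand-in. The sketch below is not the DiffOG objective: it assumes a strongly convex quadratic smoothing problem $\min_a \tfrac12\|a-\hat a\|^2 + \tfrac{\lambda}{2}\|Da\|^2$ with a first-difference operator $D$ and treats the case in which no bound is active, so the KKT system reduces to the stationarity equation. The names `refine`, `lam`, and the sample data are hypothetical.

```python
import numpy as np

def refine(actions, lam):
    """Smooth a 1-D action sequence and return the refined sequence and its Jacobian.

    Solves min_a 0.5*||a - actions||^2 + 0.5*lam*||D a||^2 (assumed stand-in objective).
    """
    n = len(actions)
    D = np.diff(np.eye(n), axis=0)        # (n-1, n) first-difference operator
    H = np.eye(n) + lam * D.T @ D         # positive definite Hessian, so the minimizer is unique
    refined = np.linalg.solve(H, actions) # stationarity condition: H a* = actions
    jacobian = np.linalg.inv(H)           # implicit differentiation: d a* / d actions = H^{-1}
    return refined, jacobian

actions = np.array([0.0, 1.0, -0.5, 2.0, 0.2])
refined, jacobian = refine(actions, lam=2.0)
print(refined)
```

When a bound becomes active, the corresponding rows enter the KKT system and the Jacobian changes discontinuously, which is where the kinks of such a layer come from; this unconstrained sketch does not model that case.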

Whole-trajectory generative world models create yet another route. In policy-guided trajectory diffusion, the action sequence is updated by a policy score term,

[equation missing]

so the trajectory generator is explicitly sensitive to the gradient field of the policy distribution. The guidance scale [expression missing] becomes a direct sensitivity gain, and the action-distribution experiments showed accurate matching for [value missing] but degradation when the policy variance fell below [value missing] (Rigter et al., 2023).

Scene-consistent prediction extends the same idea to interacting multi-agent rollouts. ScePT does not decode future coordinates independently; it generates joint clique trajectories by repeatedly applying a learned interaction policy and explicit agent dynamics for vehicles and pedestrians. Because edge encoders, attention, latent modes, and autoregressive rollout are coupled across agents, perturbations in one agent’s state can affect neighboring actions and hence scene-level future trajectories (Chen et al., 2022).

A closely related optimizer-sensitive formulation appears in guided policy search for initialization of trajectory optimization. There, the learned policy is trained not only on nominal SCP iterates but also on neighboring trajectories generated by local LQR feedback around each iterate. These neighboring rollouts function as empirical local sensitivity tubes. In the powered-descent study, policy-generated warm starts reduced mean PTR iterations from [value missing] to [value missing] and raised success from [value missing] to [value missing], directly exposing the sensitivity of the final optimizer to the policy-induced initial trajectory (Kim et al., 2021).

5. Statistical, functional, and graph-based attribution frameworks

When the output of interest is a function over time rather than a single pathwise realization, sensitivity analysis shifts toward time-indexed attribution. For functional-valued responses under finite input changes, the core decomposition is

[equation missing]

with first-order, total-order, and interaction sensitivity indices

[equation missing]

Interval-Wise Testing then supplies adjusted p-value functions

[equation missing]

so one can identify the time intervals over which a scenario or policy input significantly affects the response trajectory while controlling interval-wise error (Fontana et al., 2020).
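Time-indexed finite-change effects can be computed by re-evaluating the model at mixed input configurations. The sketch below assumes the standard finite-change definitions (first-order: move input $i$ alone from the baseline; total-order: the full change minus the change obtained by moving every input except $i$; interaction: their difference) and assumes these correspond to $\phi_i^1(t)$, $\phi_i^T(t)$, and $\phi_i^{\mathcal I}(t)$. The toy `response` model and the scenario values are hypothetical, and the testing step is not covered.

```python
import numpy as np

def response(t, x):
    """Toy functional response y(t; a, b); the a*b term creates an interaction."""
    a, b = x
    return a * t + b * np.sin(t) + a * b * t**2

def finite_change_indices(model, t, x0, x1):
    """First-order, total-order, and interaction effects of moving inputs from x0 to x1."""
    y0, y1 = model(t, x0), model(t, x1)
    first, total = [], []
    for i in range(len(x0)):
        only_i = np.array(x0, dtype=float)
        only_i[i] = x1[i]                       # input i moved alone
        all_but_i = np.array(x1, dtype=float)
        all_but_i[i] = x0[i]                    # every input moved except i
        first.append(model(t, only_i) - y0)
        total.append(y1 - model(t, all_but_i))
    first, total = np.array(first), np.array(total)
    return first, total, total - first, y1 - y0

t = np.linspace(0.0, 5.0, 51)
baseline, scenario = [1.0, 0.5], [1.5, 1.0]
first, total, interaction, delta = finite_change_indices(response, t, baseline, scenario)
print(interaction[0][:3])
```

Each returned array has one row per input and one column per time point, so the indices can be plotted as curves or passed to a functional testing procedure.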

In stochastic ABMs, the sensitivity target can move from trajectories to optimal-policy mappings. One framework defines

[equation missing]

and tests whether $x^*(\theta)$ is sensitive to $\theta$ by comparing an additive null

[equation missing]

against a non-additive alternative

[equation missing]

using a GP-based likelihood-ratio statistic

[equation missing]

This is not a direct trajectory sensitivity analysis; the paper states that its main contribution is a statistical framework for testing whether the optimal policy is sensitive to state variables, supplemented by a descriptive dynamic analysis of simulated time paths (Munson et al., 19 Feb 2026).

Policy-augmented graphical hybrid models replace derivative calculus with Shapley-value attribution over dynamic systems. The policy is

[equation missing]

the transition is

[equation missing]

and outputs of interest include future states [expression missing] and cumulative reward

[equation missing]

For policy parameters, coalitional value functions are formed by setting excluded coefficients to zero,

[equation missing]

and Shapley values allocate output influence across policy parameters, random factors, and model parameters (Zhao et al., 2024). In the linear Gaussian approximation, policy sensitivity enters future states through the pathway matrices

[equation missing]

which makes the closed-loop propagation channel explicit (Zhao et al., 2024).
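Shapley attribution over policy coefficients, with excluded coefficients set to zero, can be sketched by exact enumeration on a small example. This is not the model of Zhao et al.: the deterministic linear rollout, the quadratic reward, the matrices, and the function names are all assumptions, and random factors and model parameters are left out of the coalition.

```python
import itertools
import math
import numpy as np

def cumulative_reward(theta, horizon=20):
    """Rollout of s_{t+1} = A s_t + B a_t with linear policy a_t = theta . s_t (toy model)."""
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([0.0, 0.1])
    s = np.array([1.0, 0.0])
    total = 0.0
    for _ in range(horizon):
        a = theta @ s
        total += -(s @ s) - 0.1 * a**2          # assumed quadratic reward
        s = A @ s + B * a
    return total

def shapley_values(value_fn, theta):
    """Exact Shapley values; a coalition keeps its coefficients and zeroes the rest."""
    n = len(theta)

    def v(coalition):
        masked = np.zeros(n)
        idx = np.array(coalition, dtype=int)
        masked[idx] = theta[idx]                # excluded coefficients stay at zero
        return value_fn(masked)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                phi[i] += weight * (v(subset + (i,)) - v(subset))
    return phi

theta = np.array([-1.0, -1.5])
phi = shapley_values(cumulative_reward, theta)
print(phi, phi.sum())
```

Exact enumeration costs $2^n$ rollouts per coefficient, so anything beyond a handful of parameters would need sampling-based approximations of the same quantity.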

A final boundary case is weight-space modeling. Transformer-based implicit policy learning treats the sequence of policy parameters

[equation missing]

as a trajectory and learns

[equation missing]

This is relevant only in a restricted sense: the trajectory being modeled is the training-time weight path, not the state-action rollout trajectory induced by the policy in the environment (Tang, 6 Mar 2025).

6. Methodological implications and unresolved issues

Several recurring lessons emerge. First, the most operationally relevant sensitivity quantity is often not the raw trajectory derivative alone but a coupled downstream quantity. In control-aware experiment design, the main differentiated object is the map from model perturbations to the optimal tracking augmentation,

[equation missing]

and the sensitivity of interest is

[equation missing]

because the practically important question is how much controller effort is required to preserve the planned trajectory under model error (Hart et al., 2022). This suggests that policy-to-trajectory sensitivity is often more informative when embedded in a model-to-policy-to-trajectory chain than when studied as a purely kinematic perturbation.

Second, the literature repeatedly warns against conflating predictor sensitivity, policy sensitivity, and outcome sensitivity. In autonomous driving, a predictor can appear most sensitive to state-history perturbations while image perturbations remain dangerous because of dimensionality, stealth, or mode switching, and the final hazard appears only after planner integration (Gibson et al., 2024). In trajectory-centric MBRL, the dominant analytic object may be a one-step local gain [expression missing], not a full-horizon derivative [expression missing] (Kolaric et al., 2020). In differentiable trajectory optimization, almost-everywhere differentiability of the optimizer does not remove active-set kinks or guarantee benign global conditioning (Xu et al., 18 Apr 2025).

Third, several common misconceptions are explicitly corrected by the cited works. One is that local linearization automatically yields faithful policy-to-trajectory sensitivity; the model-free trajectory-optimization literature argues that linearizing dynamics around the mean trajectory can bias the inferred effect of policy updates on future trajectories in strongly nonlinear systems (Akrour et al., 2016). Another is that a single global scalar sensitivity is sufficient; functional and graph-based frameworks instead expose time-local or state-local sensitivity patterns, and ABM work shows that some methods are really testing sensitivity of the optimizer $x^*(\theta)$, not of the full trajectory (Fontana et al., 2020, Munson et al., 19 Feb 2026, Zhao et al., 2024).

The dominant limitations are equally consistent. Many methods are local and first-order; many require transversality, smoothness, or fixed event sequences; many operate open-loop or scenario-by-scenario rather than over repeated replanning; and several provide only partial theory for long-horizon or nonsmooth settings. The safety-margin analysis is local near generic codimension-one recovery boundaries and relies on hyperbolicity and nonresonance conditions (Fisher, 13 Jan 2025). The trajectory-centric feedback formulation is explicitly one-step and does not derive full recursive sensitivity of the entire realized trajectory with respect to policy parameters (Kolaric et al., 2020). The driving-security study does not provide an end-to-end closed-loop Lipschitz constant from sensed input to control (Gibson et al., 2024). The ABM framework states that its dynamic analysis is descriptive rather than a dedicated trajectory sensitivity methodology (Munson et al., 19 Feb 2026).

A plausible implication is that the field is converging on a layered view. At one layer lie local variational maps, saltation-like jump gains, KL-based distribution-shift bounds, and KKT sensitivities. At another lie task-level quantities such as reward, risk, safety margin, planner feasibility, or controller burden. Policy-to-trajectory sensitivity analysis becomes most informative when these layers are linked explicitly: perturbation $\to$ trajectory change $\to$ constraint change or cost change $\to$ control or planning consequence.
