Trace-Conditioned Controls
- Trace-conditioned controls are policies that use a full or partial trajectory history instead of just current states, enabling informed decision-making in non-Markovian settings.
- In robotics and world modeling, these controls exploit memory architectures and path signatures, leading to significant improvements in long-horizon and multi-stage tasks.
- Applied in dynamic systems and economic risk management, trace-conditioning reduces tail losses and boosts performance under delayed evidence and ambiguous state conditions.
Trace-conditioned controls are a class of control policies in which the governing decision or dynamic law is explicitly conditioned on the history or geometry of a trajectory (the "trace") rather than solely on instantaneous observations or purely Markovian state descriptors. Trace-conditioning applies wherever information critical for optimal control or risk management is revealed, or can be reconstructed, only from the past evolution of the system, its tool-use history, or latent state observations. The paradigm is prominent in modern robotics, stochastic control, economic risk underwriting, and dynamical systems theory. Trace-conditioned controls handle delayed-evidence, ambiguous-state, and non-Markovian settings by leveraging structured memory or explicit constraints derived from full or partial trajectory data.
1. Formal Definitions and Principle
A trace-conditioned control policy acts on the full or partial history of realized states, actions, or action–observation pairs. Let a controlled episode be represented as a trajectory
$$\tau_{0:t} = (x_0, a_0, x_1, a_1, \dots, x_t)$$
or, in operational settings, as a sequence of tool calls, events, or system states. Unlike strictly Markovian controls, where the policy is a function of the current state $x_t$ (i.e., $\pi(a_t|x_t)$), a trace-conditioned control $\pi(a_t|\tau_{0:t})$, or in batch decisions a policy applied once to the completed trace, incorporates features that depend on the order and geometry of the trace. This approach generalizes to systems in which two histories arriving at the same current observed state call for different actions (Li et al., 12 Jun 2026).
In optimal control theory, trace-conditioning may manifest as constraints on the terminal state or the entire path of tracer particles within a flow (Eldesoukey et al., 4 May 2025), or as a batch policy in risk underwriting that maps the realized trace to an intervention or risk-adjusted action (Xu et al., 15 Jun 2026).
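The distinction between a Markovian and a trace-conditioned policy can be illustrated with a toy example. The sketch below is hypothetical (the cue task, the observation strings, and both policy functions are invented for illustration): a cue is visible only at the first step, so two histories end in the same observation yet call for different actions.

```python
from typing import Sequence

CUES = ("left", "right")  # hypothetical cue observations, visible only early on


def markov_policy(observation: str) -> str:
    """Acts on the current observation only, so it cannot recover the cue."""
    if observation in CUES:
        return "advance"
    return "turn_left"  # forced to guess once the cue is out of view


def trace_policy(trace: Sequence[str]) -> str:
    """Acts on the whole observation history tau_{0:t}."""
    cue = next((obs for obs in trace if obs in CUES), None)
    if trace[-1] in CUES or cue is None:
        return "advance"
    return "turn_" + cue


history_a = ["left", "corridor", "corridor"]
history_b = ["right", "corridor", "corridor"]

# Both histories end in the same observed state, so the Markov policy
# necessarily returns the same action for both of them.
markov_actions = (markov_policy(history_a[-1]), markov_policy(history_b[-1]))

# The trace-conditioned policy can separate the two histories.
trace_actions = (trace_policy(history_a), trace_policy(history_b))
```

Here `markov_actions` contains the same action twice, while `trace_actions` differs between the two histories, which is exactly the situation a trace-conditioned control is meant to handle.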
2. Trace-Conditioned Memory Architectures in Robotics
Trace-conditioning is implemented in robotic control where non-local and occluded evidence must be retained across long task horizons. The TRACE memory architecture (Li et al., 12 Jun 2026) exemplifies this approach for visuomotor imitation:
- Path Signatures: The robot's executed trajectory is encoded via truncated path signatures, which capture the order and geometry of the path and are invariant to reparameterization. For a $d$-dimensional state and truncation depth $m$, the signature has $\sum_{k=1}^{m} d^k$ coordinates, excluding the constant zeroth-order term (e.g., $d = 3$, $m = 2$ yields $3 + 9 = 12$).
- Signature-Keyed Memory: At early cue moments, task-relevant visual–state embeddings are written to a slot-based memory indexed by learned MLPs over the path signature. Reads and writes are managed via attention over trajectory-conditioned keys, not wall-clock time.
- Policy Conditioning: The memory output is concatenated or cross-attended via lightweight adapters to the policy's backbone, restoring the crucial but unobservable task context when ambiguities arise at decision points.
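To make the path-signature encoding concrete, the following is a minimal sketch of a depth-2 signature for a piecewise-linear path, built segment by segment with Chen's identity. It is an illustration under stated assumptions, not the TRACE implementation: the function names, the 2D example paths, and the choice of depth 2 are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Signature = Tuple[List[float], List[List[float]]]


def signature_depth2(path: Sequence[Point]) -> Signature:
    """Depth-2 signature of the piecewise-linear path through `path`.

    level1[i] is the total increment in coordinate i; level2[i][j] is the
    iterated integral of dx_i followed by dx_j.
    """
    d = len(path[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for start, end in zip(path, path[1:]):
        delta = [e - s for s, e in zip(start, end)]
        # Chen's identity for appending one straight segment.
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            level1[i] += delta[i]
    return level1, level2


def flatten(sig: Signature) -> List[float]:
    """Concatenate both levels into one feature vector of length d + d**2."""
    level1, level2 = sig
    return list(level1) + [v for row in level2 for v in row]


right_then_up = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
up_then_right = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# Same curve as right_then_up, sampled with an extra point on the first segment.
resampled = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.0, 1.0)]

sig_a = signature_depth2(right_then_up)
sig_b = signature_depth2(up_then_right)
sig_c = signature_depth2(resampled)
```

The two L-shaped paths share the same endpoints and therefore the same first level, but their second levels differ, so the signature distinguishes the order of the moves. Resampling the same curve leaves the signature unchanged, which is the reparameterization invariance mentioned above. For $d = 2$ and depth 2 the flattened feature has $2 + 4 = 6$ entries.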
Empirical evaluations show that trace-conditioned memory achieves large improvements over short-history, recurrent, and transformer policies in long-horizon, multi-stage real-robot tasks with delayed-evidence structure. For example, a baseline regression policy achieves 25.5% stage progress, which rises to 69.2% with TRACE memory, outperforming the best transformer baselines by over 18 points (Li et al., 12 Jun 2026).
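The signature-keyed read and write described in the list above can be sketched as attention over stored keys. This is a simplified illustration only: the `SlotMemory` class, the two-dimensional keys, and the temperature parameter are hypothetical, and the keys stand in for what would be MLP projections of path signatures in the cited architecture.

```python
import math
from typing import List, Sequence


class SlotMemory:
    """Toy slot memory whose keys are trajectory features, not timestamps."""

    def __init__(self) -> None:
        self.keys: List[List[float]] = []
        self.values: List[List[float]] = []

    def write(self, key: Sequence[float], value: Sequence[float]) -> None:
        """Store one embedding under a trajectory-derived key."""
        self.keys.append(list(key))
        self.values.append(list(value))

    def read(self, query: Sequence[float], temperature: float = 1.0) -> List[float]:
        """Softmax attention over stored keys; returns a weighted mix of values."""
        if not self.keys:
            raise ValueError("cannot read from an empty memory")
        scores = [
            sum(q * k for q, k in zip(query, key)) / temperature
            for key in self.keys
        ]
        top = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        dim = len(self.values[0])
        return [
            sum(w * v[i] for w, v in zip(weights, self.values)) / total
            for i in range(dim)
        ]


memory = SlotMemory()
# Hypothetical keys standing in for projected path signatures.
memory.write(key=[4.0, 0.0], value=[1.0, 0.0])  # cue embedding seen on route A
memory.write(key=[0.0, 4.0], value=[0.0, 1.0])  # cue embedding seen on route B

# A later query whose trajectory features resemble route A.
context = memory.read(query=[3.0, 0.5], temperature=0.5)
```

Because retrieval depends on similarity between trajectory features rather than on elapsed time, the read returns almost exactly the value written on route A, regardless of how many steps have passed since the write.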
3. Trace-Based Representation in World Modeling and Manipulation
In scalable robot world modeling, the model of Lee et al. (11 Jun 2026) predicts future 3D traces of interaction points (e.g., end-effectors, objects) via B-spline control points, entirely bypassing direct action labels. The TraceExtract pipeline automatically constructs these trace-based labels from video, enabling policy learning and imitation purely from large-scale, action-free data. Key details include:
- B-spline Representation: Each trace segment is parameterized by B-spline control points, i.e., as a curve $\mathbf{p}(u) = \sum_i B_i(u)\,\mathbf{c}_i$ with basis functions $B_i$ and 3D control points $\mathbf{c}_i$.
- Permutation-invariant Trace Expert: A transformer-based expert processes per-keypoint tokens, where each keypoint's history and predicted control points are inferred.
- Trace-Conditioned Policy: The downstream control policy is conditioned on hidden motion features from the trace model fused into the action expert.
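As an illustration of how a trace can be recovered from predicted control points, the sketch below evaluates a uniform cubic B-spline. The source does not state the spline degree or knot layout, so the cubic uniform form, the function names, and the example control points are all assumptions.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def cubic_bspline_basis(u: float) -> List[float]:
    """Uniform cubic B-spline blending weights for local parameter u in [0, 1]."""
    return [
        (1 - u) ** 3 / 6,
        (3 * u**3 - 6 * u**2 + 4) / 6,
        (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6,
        u**3 / 6,
    ]


def eval_segment(control: Sequence[Point3], u: float) -> Point3:
    """Point on the segment defined by four consecutive control points."""
    weights = cubic_bspline_basis(u)
    return tuple(
        sum(w * p[axis] for w, p in zip(weights, control)) for axis in range(3)
    )


def sample_trace(control: Sequence[Point3], per_segment: int = 5) -> List[Point3]:
    """Sample every segment of a trace given by at least four control points."""
    points = []
    for k in range(len(control) - 3):
        for s in range(per_segment):
            points.append(eval_segment(control[k:k + 4], s / per_segment))
    points.append(eval_segment(control[-4:], 1.0))  # close the final segment
    return points


# Hypothetical control points for one end-effector keypoint (metres).
control_points = [
    (0.0, 0.0, 0.0),
    (0.1, 0.0, 0.0),
    (0.2, 0.1, 0.0),
    (0.3, 0.1, 0.1),
    (0.4, 0.2, 0.1),
]
trace = sample_trace(control_points)
```

A handful of control points is enough to describe a smooth 3D trace, which is one reason such a representation is compact to predict. Note that a uniform B-spline does not generally pass through its control points.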
Trace-conditioned policies attain robot success rates competitive with or exceeding those of action-supervised VLA models and transfer well across embodiments, since traces abstract away raw kinematic details (Lee et al., 11 Jun 2026).
4. Planning and Long-Horizon Policy Design via Trace Prompts
LoHo-Manip (Liu et al., 23 Apr 2026) formalizes trace-conditioning in long-horizon planning by predicting explicit visual traces (e.g., 2D keypoint waypoints) for the remainder of a task at each decision point. The architecture comprises:
- Task Manager: A vision–language model predicts both a textual task plan (done and remaining steps) and a visual trace overlay for each control phase.
- Executor VLA Policy: The executor policy conditions on the rendered trace and the subtask language.
- Implicit Progress Tracking: The manager repeatedly replans from current state, automatically recovering from failures and obviating the need for explicit state memory or failure logic.
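The replanning loop described above can be sketched as follows. This is a toy stand-in rather than the LoHo-Manip implementation: `manager`, `run_episode`, the flaky executor, and the waypoint values are all hypothetical, and the set of completed steps stands in for what the manager would infer from the current image.

```python
from typing import Callable, List, Set, Tuple

Waypoint = Tuple[int, int]
Executor = Callable[[str, List[Waypoint]], bool]


def manager(all_steps: List[str], done: Set[str]) -> Tuple[List[str], List[Waypoint]]:
    """Stand-in for the task manager: re-derives the plan from the current state.

    Returns the remaining steps and a hypothetical 2D trace with one waypoint
    per remaining step, to be rendered as an overlay for the executor.
    """
    remaining = [s for s in all_steps if s not in done]
    trace = [(10 * (all_steps.index(s) + 1), 5) for s in remaining]
    return remaining, trace


def run_episode(all_steps: List[str], executor: Executor, max_iters: int = 20):
    done: Set[str] = set()  # stands in for the observable scene state
    log = []
    for _ in range(max_iters):
        remaining, trace = manager(all_steps, done)  # replan at every step
        if not remaining:
            break
        subtask = remaining[0]
        success = executor(subtask, trace)
        log.append((subtask, success))
        if success:
            done.add(subtask)
        # On failure nothing is recorded: the next replan sees the same scene
        # and proposes the same subtask again.
    return done, log


def make_flaky_executor() -> Executor:
    """Hypothetical executor that fails its first attempt at 'grasp cup'."""
    attempts = {}

    def executor(subtask: str, trace: List[Waypoint]) -> bool:
        attempts[subtask] = attempts.get(subtask, 0) + 1
        return not (subtask == "grasp cup" and attempts[subtask] == 1)

    return executor


steps = ["open drawer", "grasp cup", "place cup in drawer"]
done, log = run_episode(steps, make_flaky_executor())
```

The loop contains no failure-handling branch: recovery falls out of replanning from the observed state, which is the sense in which progress tracking is implicit.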
Closed-loop experiments demonstrate robust performance, with success-rate increases of 20–50% on out-of-distribution (OOD) and multi-step manipulation tasks (Liu et al., 23 Apr 2026).
5. Trace-Conditioned Controls in Risk, Insurance, and Economic Policy
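Trace-conditioned control in AI risk underwriting and automation management (Xu et al., 15 Jun 2026) applies economic minimization to the observed tool-use trace $\tau$. Formally, for each trace, the control policy can be written as
$$\pi(\tau) = \arg\min_{a \in \mathcal{A}} \big( \mathbb{E}[L \mid \tau, a] + c(a) \big),$$
where $\mathcal{A}$ is the bounded action set (e.g., allow, review, sandbox), $L$ is the realized loss, and $c(a)$ is the intervention cost. The policy selects an intervention only when the expected averted loss exceeds the intervention cost. The economic effectiveness, identifiability, and fairness of such policies are formally characterized by a Bayes-risk gap and a finite-sample scope theorem, both of which require a bounded role and action set.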
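The decision rule can be illustrated with a small cost comparison. In the sketch below all numbers, the cost table, and the function names are hypothetical; in practice the expected losses would come from a model over trace features rather than being hard-coded.

```python
from typing import Dict

# Hypothetical cost (in dollars) of applying each intervention to one trace.
INTERVENTION_COST: Dict[str, float] = {"allow": 0.0, "review": 40.0, "sandbox": 15.0}


def choose_intervention(expected_loss: Dict[str, float]) -> str:
    """Pick the action minimizing expected loss plus intervention cost."""
    return min(expected_loss, key=lambda a: expected_loss[a] + INTERVENTION_COST[a])


def averted_loss(expected_loss: Dict[str, float], action: str) -> float:
    """Expected loss avoided by taking `action` instead of simply allowing."""
    return expected_loss["allow"] - expected_loss[action]


# Hypothetical expected losses for two traces under each intervention.
benign_trace = {"allow": 5.0, "review": 1.0, "sandbox": 3.0}
risky_trace = {"allow": 900.0, "review": 50.0, "sandbox": 400.0}

benign_choice = choose_intervention(benign_trace)
risky_choice = choose_intervention(risky_trace)
```

For the benign trace, review would avert only 4 dollars of expected loss at a cost of 40, so the rule allows it. For the risky trace, review averts 850 dollars at the same cost, so the rule intervenes.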
Empirically, trace-conditioned controls reduce CVaR tail loss by 72% on 1,000 agent trajectories while halving the human review burden, outperforming both static rule-based and flat-pricing approaches (Xu et al., 15 Jun 2026).
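For readers unfamiliar with the tail-loss metric, conditional value at risk (CVaR) at level $\alpha$ is the mean of the worst $(1 - \alpha)$ fraction of losses. The sketch below computes an empirical CVaR and a relative reduction; the loss values and the level $\alpha = 0.9$ are hypothetical and are not the figures reported in the cited study.

```python
import math
from typing import Sequence


def cvar(losses: Sequence[float], alpha: float = 0.95) -> float:
    """Empirical CVaR: mean of the worst ceil((1 - alpha) * n) losses."""
    if not losses:
        raise ValueError("losses must be non-empty")
    # Round before ceil so floating-point noise cannot add an extra sample.
    tail = round((1 - alpha) * len(losses), 9)
    k = max(1, math.ceil(tail))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


# Hypothetical per-trajectory losses without and with trace-conditioned controls.
uncontrolled = [10.0] * 18 + [1000.0, 2000.0]
controlled = [10.0] * 18 + [300.0, 450.0]

cvar_before = cvar(uncontrolled, alpha=0.9)
cvar_after = cvar(controlled, alpha=0.9)
reduction = 1 - cvar_after / cvar_before
```

With these made-up numbers the two worst outcomes drive the metric entirely, which shows why interventions that target rare, high-loss traces can cut CVaR sharply while leaving typical trajectories untouched.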
6. Optimal Control with Tracer Constraints in Dynamical Systems
In the context of dynamical systems, trace-conditioned control laws are derived to satisfy constraints tied to specific tracer (particle) trajectories or endpoints (Eldesoukey et al., 4 May 2025). In a linear–Gaussian setting with state covariances and tracer constraints, the optimal control law is computed to drive the ensemble along prescribed covariance paths and individual tracers along exact target trajectories or to prescribed endpoints. Control costs are defined as action (kinetic energy) and attention (complexity) integrals, leading to closed-form feedback expressions via the Euler–Lagrange equations and two-point boundary-value problems. Inference analogs treat the tracer path as a measurement and the optimal solution as a maximum a posteriori estimator. Extensions include stochastic noise, nonlinear dynamics, and non-Gaussian ensembles (Eldesoukey et al., 4 May 2025).
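As a deliberately simplified illustration of how an endpoint constraint leads to a two-point boundary-value problem, consider a scalar, deterministic toy case (not the ensemble setting of the cited work). A single tracer with position $x(t)$ and velocity control $u(t)$ is steered from $x_0$ to $x_T$ while minimizing the action:

$$
\min_{u}\ \int_0^T \tfrac{1}{2}\,u(t)^2\,dt
\quad \text{subject to} \quad
\dot{x}(t) = u(t),\qquad x(0) = x_0,\qquad x(T) = x_T .
$$

Substituting $u = \dot{x}$ gives the Lagrangian $\tfrac{1}{2}\dot{x}^2$, whose Euler–Lagrange equation is $\ddot{x}(t) = 0$. Imposing the two boundary conditions yields

$$
x^\star(t) = x_0 + \frac{t}{T}\,(x_T - x_0),
\qquad
u^\star(t) = \frac{x_T - x_0}{T},
\qquad
\int_0^T \tfrac{1}{2}\,u^\star(t)^2\,dt = \frac{(x_T - x_0)^2}{2T}.
$$

The optimal control is constant and depends only on the prescribed endpoints and horizon, which is the simplest instance of a control law determined by a tracer constraint.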
7. Limitations, Scope, and Ongoing Research
Trace-conditioned policies require contractually or architecturally bounded roles (finite action vocabulary, maximum trace length, predefined tools) for economic identifiability and operational auditability (Xu et al., 15 Jun 2026). In robotic memory architectures, the expressiveness and slot design of trace-indexed memory are performance-critical; too low a signature order or too narrow a memory sharply degrades accuracy (Li et al., 12 Jun 2026). In dynamical systems, the approach relies on tractable state evolutions (linear, Gaussian) for analytic solutions; nonlinear and non-Gaussian settings require approximations or particle methods (Eldesoukey et al., 4 May 2025). Ongoing research addresses extensions to richer modalities, robustness under partial observability, scaling to internet-scale data, and integration with reinforcement learning or causal inference frameworks.