TOLD: Task-Oriented Latent Dynamics

Updated 17 November 2025

TOLD is a framework that learns low-dimensional, task-relevant latent representations from high-dimensional systems for effective control and decoding.
It integrates contrastive encoding and differentiable controller synthesis—using methods like the Koopman operator—to optimize performance.
The approach has shown superior decodability and robustness across benchmark control domains and neural data applications.

Task-Oriented Latent Dynamics (TOLD) is a methodological framework for learning low-dimensional representations of high-dimensional systems or processes, in which the latent state encoding and the latent dynamics are explicitly oriented towards capturing the underlying control or task-relevant structure. The framework formalizes and optimizes a latent space and dynamics that support effective control or decoding, as opposed to purely reconstructive or generative modeling. TOLD has been applied both to (a) control systems, using approaches such as the Koopman operator, and to (b) neural and behavioral data analysis, via sequential variational autoencoders with structured latent dynamics (Lyu et al., 2023, Geenjaar et al., 2023).

1. Mathematical Foundations

In the control-theoretic setting, TOLD addresses the discrete-time, nonlinear control problem:

$\min_u J(x_{0:T}, u_{0:T-1}) = \sum_{k=0}^{T-1} c(x_k, u_k) \qquad \text{subject to} \quad x_{k+1}=f(x_k,u_k)$

The strategy is to learn an encoder $\psi_\theta$ such that

$z_k = \psi_\theta(x_k) \in \mathbb{R}^d$

and to enforce approximately linear dynamics on the encoded state via a Koopman approximation:

$z_{k+1} = A z_k + B u_k$

Controller design then minimizes the task cost not in the original space but in the latent space, optimizing

$J = \mathbb{E}_{\tau\sim\pi} \left[\sum_{k=0}^\infty (z_k - z_\text{ref})^\top Q (z_k - z_\text{ref}) + u_k^\top R u_k\right]$

where $Q,R$ are positive-definite and $z_\text{ref}$ encodes the desired goal in latent space. A contrastive encoder regularization and one-step latent dynamics loss are combined in a single objective, supporting end-to-end learning (Lyu et al., 2023).
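The latent dynamics and quadratic cost above can be illustrated with a minimal NumPy sketch. The matrices `A`, `B`, the gain `G`, and the helper `rollout_cost` are hypothetical values chosen for illustration, not quantities from the cited work; the goal is placed at the latent origin so that the feedback law reduces to $u = -Gz$, and the infinite sum is truncated to a finite horizon.

```python
import numpy as np

def rollout_cost(A, B, G, z0, z_ref, Q, R, steps):
    """Simulate z_{k+1} = A z_k + B u_k with u_k = -G (z_k - z_ref)
    and accumulate the quadratic latent cost over a finite horizon."""
    z = np.asarray(z0, dtype=float).copy()
    total = 0.0
    for _ in range(steps):
        e = z - z_ref                      # latent tracking error
        u = -G @ e                         # linear state feedback
        total += float(e @ Q @ e + u @ R @ u)
        z = A @ z + B @ u                  # linear latent transition
    return total, z

# Hypothetical 2-D latent system (a discretised double integrator).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
G = np.array([[1.0, 1.5]])                 # hand-picked stabilising gain, not learned
Q, R = np.eye(2), np.eye(1)
z_ref = np.zeros(2)                        # goal encoded at the latent origin

cost, z_final = rollout_cost(A, B, G, np.array([1.0, 0.0]), z_ref, Q, R, steps=200)
print(cost, np.linalg.norm(z_final))
```

In TOLD the encoder, $A$, $B$, and the cost weights are learned jointly; the sketch only shows how a given set of latent parameters is scored.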

In whole-brain neuroscience data, TOLD is instantiated using a sequential variational autoencoder with latent dynamics parameterized either as a Neural Ordinary Differential Equation (NODE) or as a recurrent neural network (RNN). The generative model factors as:

$p(\mathbf{z}_0) = \mathcal{N}(\mathbf{z}_0; 0, I)$

$p(\mathbf{z}_t \mid \mathbf{z}_{t-1}) = \delta\left(\mathbf{z}_t - \left[\mathbf{z}_{t-1} + \int_{t-1}^t f_\theta(\mathbf{z}(\tau), \tau)\, d\tau\right]\right)$

$p(\mathbf{x}_t \mid \mathbf{z}_t) = \mathcal{N}(\mathbf{x}_t; \mathrm{Dec}_\psi(\mathbf{z}_t), \sigma_x^2 I)$

where $f_\theta$ describes the latent dynamics (NODE or RNN parameterizations) and $\mathrm{Dec}_\psi$ is the spatial decoder from latent to observed space (Geenjaar et al., 2023).

2. Encoder Architecture and Dynamics Learning

In control, TOLD employs a contrastive encoder scheme with two neural networks (a query encoder and a momentum key encoder) trained with the InfoNCE loss. For each sample, a positive pair is generated by augmentation, and the other samples in the batch serve as negatives. The batch-based contrastive loss is

$\mathcal{L}_\text{cst} = \mathbb{E}_B \left[-\log\frac{\exp(z_i^q{}^\top W z_i^+)}{\exp(z_i^q{}^\top W z_i^+) + \sum_{j\neq i} \exp(z_i^q{}^\top W z_j^-)}\right]$

This enforces invariance in the latent space to observation-level nuisances and enhances alignment of task-relevant state representations (Lyu et al., 2023).
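The batch contrastive loss can be written compactly as a row-wise softmax cross-entropy over bilinear similarities. The following NumPy sketch assumes that row $i$ of the key batch is the positive for row $i$ of the query batch and that all other rows act as negatives; the function name `info_nce` and the toy inputs are illustrative only.

```python
import numpy as np

def info_nce(zq, zk, W):
    """InfoNCE with bilinear similarity zq_i^T W zk_j.

    zq: (B, d) query embeddings; zk: (B, d) key embeddings.
    Row i of zk is the positive for row i of zq; other rows are negatives.
    """
    logits = zq @ W @ zk.T                                  # (B, B) similarities
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))               # positives on the diagonal

# Toy check: perfectly aligned pairs give a loss close to zero,
# while an uninformative similarity (W = 0) gives log(batch size).
zq = 10.0 * np.eye(4)
zk = np.eye(4)
aligned_loss = info_nce(zq, zk, np.eye(4))
uninformative_loss = info_nce(zq, zk, np.zeros((4, 4)))
print(aligned_loss, uninformative_loss)
```

The momentum update of the key encoder and the augmentation pipeline are omitted here; only the loss itself is shown.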

In neural data modeling, TOLD uses high-capacity spatial encoders (linear or nonlinear) and recurrent or continuous-time latent propagation. The initial-state encoder is a bidirectional GRU network. In NODE variants, latent states are propagated by integrating $f_\theta(z, t)$ with standard ODE solvers (e.g., Runge-Kutta). In RNN variants, propagation uses a GRU cell and an affine projection to the latent space. This structure supports explicit modeling of smooth, continuous latent dynamics that can be probed for fixed points and stability (Geenjaar et al., 2023).
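For the NODE variant, latent propagation between observation times amounts to numerically integrating $f_\theta$. The sketch below uses a hand-written classical fourth-order Runge-Kutta step and a hypothetical linear vector field in place of a learned network; the matrix `M`, the substep count, and the function names are assumptions made for illustration.

```python
import numpy as np

def rk4_step(f, z, t, h):
    """One classical fourth-order Runge-Kutta step for dz/dt = f(z, t)."""
    k1 = f(z, t)
    k2 = f(z + 0.5 * h * k1, t + 0.5 * h)
    k3 = f(z + 0.5 * h * k2, t + 0.5 * h)
    k4 = f(z + h * k3, t + h)
    return z + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def propagate(f, z0, n_steps, substeps=4):
    """Return the latent trajectory z_0, ..., z_n at unit-spaced times."""
    h = 1.0 / substeps
    z = np.asarray(z0, dtype=float)
    traj = [z]
    for t in range(n_steps):
        for s in range(substeps):
            z = rk4_step(f, z, t + s * h, h)
        traj.append(z)
    return np.stack(traj)

# Hypothetical stand-in for f_theta: a slowly decaying rotation.
M = np.array([[-0.1, -1.0], [1.0, -0.1]])
f = lambda z, t: M @ z

traj = propagate(f, np.array([1.0, 0.0]), n_steps=5)

# For this linear field the origin is a fixed point; it is stable because
# every eigenvalue of the Jacobian M has a negative real part.
origin_is_stable = bool(np.all(np.linalg.eigvals(M).real < 0))
print(traj[-1], origin_is_stable)
```

With a learned, nonlinear $f_\theta$ the same stability check would be applied to the Jacobian evaluated at each fixed point found numerically.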

3. Joint Optimization Objectives

For control, TOLD employs a composite loss:

$\mathcal{L}_\text{total} = \mathcal{L}_\text{sac} + \lambda_c\,\mathcal{L}_\text{cst} + \lambda_m\,\mathcal{L}_m$

where:

$\mathcal{L}_\text{sac}$ : Soft-actor-critic loss over the policy and value networks in latent space,
$\mathcal{L}_\text{cst}$ : Contrastive encoder loss,
$\mathcal{L}_m$ : One-step Koopman model prediction loss,
$\lambda_c, \lambda_m$ : weighting hyperparameters.

This allows all modules (encoder, linearized dynamics, and controller) to be learned in an end-to-end, task-driven fashion.
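As a rough illustration of how the terms combine, the sketch below computes a one-step prediction loss for the linear latent model and forms the weighted sum. The mean-squared-error form of $\mathcal{L}_m$ is an assumption, the SAC and contrastive terms are passed in as plain numbers, and all data are synthetic.

```python
import numpy as np

def one_step_loss(A, B, z, u, z_next):
    """Mean squared one-step error of z_{k+1} ~ A z_k + B u_k over a batch.

    z: (N, d) latents, u: (N, m) controls, z_next: (N, d) next latents.
    """
    pred = z @ A.T + u @ B.T
    return float(np.mean(np.sum((pred - z_next) ** 2, axis=1)))

def total_loss(l_sac, l_cst, l_m, lam_c, lam_m):
    """Weighted sum L_sac + lam_c * L_cst + lam_m * L_m."""
    return l_sac + lam_c * l_cst + lam_m * l_m

# Synthetic batch generated by a known (hypothetical) linear latent model.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[0.0], [0.5]])
z = rng.normal(size=(64, 2))
u = rng.normal(size=(64, 1))
z_next = z @ A_true.T + u @ B_true.T

loss_true = one_step_loss(A_true, B_true, z, u, z_next)          # exact model
loss_wrong = one_step_loss(A_true + 0.1, B_true, z, u, z_next)   # perturbed model
combined = total_loss(1.0, 2.0, 3.0, lam_c=0.5, lam_m=0.1)
print(loss_true, loss_wrong, combined)
```

In an end-to-end setting the three terms would be computed in an automatic-differentiation framework so that gradients reach the encoder, the latent model, and the policy together.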

For neural data, the canonical evidence lower bound (ELBO) is used:

$\mathcal{L}_\text{ELBO} = \mathbb{E}_{q(\mathbf{z}_0|\mathbf{x}_{1:T})}\left[\sum_{t=1}^T \log p(\mathbf{x}_t|\mathbf{z}_t)\right] - \mathrm{KL}\left(q(\mathbf{z}_0|\mathbf{x}_{1:T})\,\|\,p(\mathbf{z}_0)\right)$

No explicit task labels are incorporated during training; all task-related structure is recovered post hoc from the learned latent trajectories.
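A single-sample estimate of this objective can be sketched as follows. The sketch assumes a diagonal Gaussian approximate posterior over $\mathbf{z}_0$, which gives the KL term in closed form, and takes the decoder output `x_hat` as given; the function names and toy arrays are illustrative.

```python
import numpy as np

def gaussian_log_lik(x, x_hat, sigma_x):
    """Sum over time and features of log N(x_t; x_hat_t, sigma_x^2 I)."""
    n = x.size
    sq_err = np.sum((x - x_hat) ** 2)
    return float(-0.5 * sq_err / sigma_x**2 - 0.5 * n * np.log(2.0 * np.pi * sigma_x**2))

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return float(0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var))

def elbo_estimate(x, x_hat, sigma_x, mu0, log_var0):
    """One-sample ELBO: reconstruction log-likelihood minus KL on z_0."""
    return gaussian_log_lik(x, x_hat, sigma_x) - kl_to_standard_normal(mu0, log_var0)

# Toy example: T = 3 time points, 2 observed features, 2 latent dimensions.
x = np.zeros((3, 2))
x_hat = np.zeros((3, 2))                 # stand-in for Dec_psi(z_t)
mu0 = np.array([1.0, 0.0])               # posterior mean of z_0 (assumed)
log_var0 = np.zeros(2)                   # unit posterior variance (assumed)

value = elbo_estimate(x, x_hat, sigma_x=1.0, mu0=mu0, log_var0=log_var0)
print(value)
```

Because the latent transitions are deterministic given $\mathbf{z}_0$, only the initial state contributes a KL term, which is why the estimate needs a single closed-form penalty.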

4. Latent-Space Control and Linear Quadratic Regulation

TOLD control design in latent space focuses on infinite-horizon Linear Quadratic Regulation (LQR) using the learned linear dynamics:

$z_{k+1} = A z_k + B u_k$

The optimal gain $G$ is computed by iteratively solving the discrete algebraic Riccati equation (DARE) for $P$:

$P_m = A^\top P_{m+1} A - A^\top P_{m+1} B (R+B^\top P_{m+1} B)^{-1}B^\top P_{m+1} A + Q$

with controller $u = -G z$, where

$G = (R+B^\top P_1 B)^{-1} B^\top P_1 A$

Critically, all steps are differentiable in $\{A,B,Q,R,\psi_\theta\}$, enabling end-to-end backpropagation through the controller synthesis process (Lyu et al., 2023).
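The Riccati recursion and the resulting gain can be sketched in a few lines of NumPy. The system matrices, the iteration count, and the initialization of $P$ with $Q$ are assumptions chosen for illustration; NumPy itself does not provide gradients, so this shows only the forward computation.

```python
import numpy as np

def riccati_step(P, A, B, Q, R):
    """One backward Riccati update: P <- A'PA - A'PB (R + B'PB)^{-1} B'PA + Q."""
    BtP = B.T @ P
    K = np.linalg.solve(R + BtP @ B, BtP @ A)
    return A.T @ P @ A - A.T @ P @ B @ K + Q

def lqr_gain(A, B, Q, R, iters=500):
    """Iterate the Riccati update from P = Q, then form G = (R + B'PB)^{-1} B'PA."""
    P = Q.copy()
    for _ in range(iters):
        P = riccati_step(P, A, B, Q, R)
    G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return G, P

# Hypothetical learned latent model (a discretised double integrator).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)

G, P = lqr_gain(A, B, Q, R)
closed_loop_radius = np.max(np.abs(np.linalg.eigvals(A - B @ G)))
print(G, closed_loop_radius)
```

Writing the same fixed number of updates in an automatic-differentiation framework yields a computation graph through which gradients with respect to $A$, $B$, $Q$, and $R$ can flow, which is the property the end-to-end scheme relies on.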

5. Experimental Protocols and Evaluation Metrics

TOLD has been evaluated on benchmark control domains (CartPole Swing-up, Cheetah Running, pixel-based CartPole) and real-robot (TurtleBot3) LiDAR navigation. Metrics include:

Cumulative task cost/negative reward,
Latent model one-step prediction error (MSE in $z$ -space),
Qualitative assessment of latent trajectory alignment (t-SNE visualization),
Ablations over encoder type, robustness to model error, latent-space dimensionality, and LQR iteration count.

In brain data analysis, TOLD's evaluation includes:

Sub-task classification accuracy from latent trajectories,
Spatial specificity of extracted components,
Fixed-point stability analysis for the learned dynamical system,
Robustness to random initialization (across seeds) (Geenjaar et al., 2023).

A summary of benchmark classification accuracy for latent dimension $d=8$:

| Method | Hand vs. Foot | 5-Way Motor | 0-Back vs. 2-Back | Relational vs. Control |
|------------------------|:-------------:|:-----------:|:----------------:|:----------------------:|
| PCA | 0.72 | 0.60 | 0.68 | 0.65 |
| VAE (Linear) | 0.78 | 0.65 | 0.73 | 0.70 |
| VAE (Nonlinear) | 0.80 | 0.68 | 0.75 | 0.72 |
| TOLD-RNN (Nonlinear) | 0.88 | 0.74 | 0.82 | 0.78 |
| TOLD-NODE (Nonlinear) | 0.91 | 0.78 | 0.85 | 0.80 |

TOLD exhibits superior task-decodability and spatial resolution relative to standard techniques (Geenjaar et al., 2023).

6. Ablation Studies, Interpretability, and Limitations

Ablation experiments demonstrate:

Contrastive encoding is necessary for robustness when working with high-dimensional observables (pixel and sensor modalities).
TOLD's robustness to model error is superior to two-stage Koopman methods; latent cost degrades by $<8\%$ under moderate misspecification, versus $>40\%$ for model-oriented baselines.
Automated, end-to-end learning of controller weighting matrices (Q) outperforms hand-tuned versions by a wide margin.

Interpretability is enabled by analysis of the learned latent controller matrices (Q, R) and visualization (e.g., Q correlates with focus on specific object locations in observation space for CartPole). The method supports sim-to-real zero-shot transfer for navigation tasks (Lyu et al., 2023).

Limitations include:

Data/sample efficiency remains a bottleneck for end-to-end RL training—potentially addressable by warm-starting from system identification or offline data.
The approach requires further validation on diverse hardware platforms.
For extremely high-dimensional tasks, increasing latent dimension may lead to intractable or unstable controllers.

7. Significance and Future Directions

TOLD establishes a framework for learning task-oriented latent dynamics in both control and representation learning settings, uniting the benefits of structured latent embeddings and model-based control or decoding. It moves beyond reconstructive models by enforcing that latent structure serve explicit task- or control-based objectives, enabling scalability from low-dimensional systems to complex, high-dimensional scenarios such as vision-based control or large-scale neural recordings.

Possible extensions include incorporation of multi-task objectives, transfer or generalization across environments, and further architectural innovations in latent dynamics modeling. Adoption in RL, neuroscience, and robotics demonstrates broad applicability, while challenges remain in improving sample efficiency and furthering interpretability (Lyu et al., 2023, Geenjaar et al., 2023).