
Tiny Recursive Control (TRC)

Updated 25 December 2025
  • Tiny Recursive Control (TRC) is a neural architecture that uses recursive, weight-shared refinement to iteratively update control sequences for continuous optimal control.
  • It leverages a hierarchical latent structure with on-the-fly simulation and error correction to approach near-optimal solutions in nonlinear tasks such as the Van der Pol oscillator and powered descent.
  • TRC offers a scalable tradeoff between latency and control quality while maintaining a fixed, compact memory footprint, making it ideal for embedded aerospace applications.

Tiny Recursive Control (TRC) is a neural architecture for continuous optimal control that leverages recursive, weight-shared refinement operators to achieve high control quality while maintaining a compact memory and computational footprint. Departing from conventional feed-forward and large-scale LLM-based controllers, which require parameter counts in the millions or billions, TRC is designed to match or exceed their capacity through iteration depth rather than parameter count. By repeatedly applying the same compact network through a hierarchical latent structure, TRC efficiently refines candidate control sequences using on-the-fly simulation and tracking error correction, enabling deployment in resource-constrained aerospace environments (Jain et al., 18 Dec 2025).

1. Motivation and Foundational Principle

Conventional neural controllers—including standard feed-forward networks and transformer-based models—memorize mappings from state and goal to control. Realizing strong performance on diverse, high-dimensional tasks necessitates very large parameter counts, often exceeding millions or billions, which results in prohibitive memory and latency demands for embedded guidance computers typical in satellite, UAV, and launch vehicle applications. For example, a 7 billion-parameter LLM controller may require hundreds of MB of memory and inference times beyond 100 ms, which is infeasible for high-frequency (e.g., 100 Hz) control loops.

TRC is motivated by the insight from Tiny Recursive Models (TRM) in NLP, where model capacity is achieved by repeatedly reusing a single refinement operator, rather than increasing model width or depth. Each TRC iteration simulates a candidate trajectory under the current control sequence, measures the discrepancy at the goal (terminal tracking error), and updates the control using a weight-shared hierarchical network. This iterative process enables substantial expressivity with a fixed ~1.5M-parameter core, providing an adjustable compute knob (number of iterations) without increasing memory or model size (Jain et al., 18 Dec 2025).

2. Mathematical Formulation

TRC addresses the finite-horizon, discrete-time optimal control problem with horizon $T$:

  • State $x_t \in \mathbb{R}^{d_x}$, control $u_t \in \mathbb{R}^{d_u}$
  • System dynamics:

$$x_{t+1} = f(x_t, u_t), \quad t = 0, \dots, T-1$$

  • Cost function:

$$J(u_{0:T-1}) = \sum_{t=0}^{T-1} \ell(x_t, u_t) + \ell_f(x_T, x_{\rm target})$$

where $\ell$ and $\ell_f$ are quadratic stage and terminal costs. Given an initial state $x_0$ and target $x_{\rm target}$, the controller iteratively refines $u^{(k)}$ so that the terminal simulated state $\hat x_T^{(k)}$ approaches $x_{\rm target}$.
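This rollout-and-cost structure can be made concrete with a minimal sketch. The Euler discretization of the Van der Pol oscillator and the quadratic weights `Q`, `R`, `Qf` below are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def vdp_step(x, u, dt=0.05, mu=1.0):
    """One Euler step of a controlled Van der Pol oscillator (an assumed
    instance of the dynamics f; the paper's discretization is not given here)."""
    x1, x2 = x
    dx = np.array([x2, mu * (1 - x1**2) * x2 - x1 + u[0]])
    return x + dt * dx

def rollout_cost(x0, u_seq, x_target, Q=0.1, R=0.01, Qf=10.0):
    """Simulate x_{t+1} = f(x_t, u_t) and accumulate the quadratic cost J."""
    x, J = np.array(x0, dtype=float), 0.0
    for u in u_seq:                        # stage costs ell(x_t, u_t)
        J += Q * np.sum((x - x_target)**2) + R * np.sum(u**2)
        x = vdp_step(x, u)
    J += Qf * np.sum((x - x_target)**2)    # terminal cost ell_f(x_T, x_target)
    return J, x                            # total cost and terminal state x_T

J, xT = rollout_cost([1.0, 0.0], np.zeros((100, 1)), np.array([0.0, 0.0]))
```

This is exactly the quantity TRC's inner simulation evaluates at each refinement step: the rollout produces both the cost $J$ and the terminal state from which the tracking error is formed.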

Latent variables in TRC include:

  • $z_0$: initial context encoding of $(x_0, x_{\rm target}, t_{\rm remaining})$
  • $z_H^{(k)}$, $z_L^{(k)}$: high- and low-level latents at iteration $k$

The recursive refinement operator $\mathcal{R}_\theta$ applies

$$u^{(k)} = \mathcal{R}_\theta\big(u^{(k-1)}, x_0, x_{\rm target}, e^{(k-1)}\big)$$

with terminal tracking error

$$e^{(k)} = \hat x_T^{(k)} - x_{\rm target}$$

This framework supports multi-level latent reasoning, simulates new trajectories at each iteration, and updates control sequences based on feedback from previous iterations.

3. Network Architecture and Hierarchical Latent Structure

TRC comprises five main modules, sharing approximately 1.5M parameters:

  • StateEncoder: MLP($2d_x + 1 \to d_z$) with LayerNorm and GELU, producing $z_0$.
  • InitialDecoder: MLP($d_z \to T \cdot d_u$), generating the initial control $u^{(0)}$.
  • ErrorEmbed: MLP($d_x \to d_z$), embedding the terminal error.
  • ControlEmbed: linear layer (flattened $u^{(k-1)} \to d_z$), encoding candidate controls.
  • ResidualDecoder: MLP($[z_H;\, u^{(k-1)}] \to T \cdot d_u$), computing control refinements $\Delta u^{(k)}$.

The two-level hierarchical reasoning module $\mathcal{L}_\theta$ contains:

  • High-level latent $z_H \in \mathbb{R}^{d_h}$ (strategic/contextual)
  • Low-level latent $z_L \in \mathbb{R}^{d_h}$ (tactical/error correction)

Architectural hyperparameters include $d_z = 256$, $d_h = 512$, three transformer blocks per $\mathcal{L}_\theta$ module, eight attention heads, and GELU activations. All modules share weights across recursive iterations $k = 1, \dots, K$ and low-level cycles $i = 1, \dots, n$ within each iteration.

The computational workflow is:

  1. Encode state and target into $z_0$
  2. Initialize latent variables $z_H$, $z_L$
  3. Generate the initial control $u^{(0)}$
  4. For $k = 1$ to $K$:
    • Simulate trajectory $\hat x^{(k-1)}$ under $u^{(k-1)}$
    • Compute $e^{(k-1)} = \hat x_T^{(k-1)} - x_{\rm target}$
    • Update context $z_{\rm ctx} = z_0 + \text{ErrorEmbed}(e^{(k-1)}) + \text{ControlEmbed}(u^{(k-1)})$
    • Update the low-level latent $n$ times, then update the high-level latent
    • Decode the residual control $\Delta u^{(k)}$
    • Update $u^{(k)} = \mathrm{clip}\big(u^{(k-1)} + \Delta u^{(k)},\, u_{\min},\, u_{\max}\big)$
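The iteration described above can be sketched as a plain loop. The network modules are stubbed as placeholder callables passed in via a `modules` dictionary; these stand-ins, the dynamics `f`, the control bounds, and the default $K = 3$ are assumptions for illustration, not the paper's trained components:

```python
import numpy as np

def refine(u, x0, x_target, f, modules, K=3, u_min=-1.0, u_max=1.0):
    """TRC-style recursive refinement loop (sketch). `modules` supplies
    stand-ins for StateEncoder, ErrorEmbed, ControlEmbed, the shared
    reasoning core L_theta, and the ResidualDecoder."""
    z0 = modules["state_encoder"](x0, x_target)      # context encoding z_0
    zH = np.zeros_like(z0)                           # high-level latent
    zL = np.zeros_like(z0)                           # low-level latent
    for k in range(1, K + 1):
        x = x0
        for u_t in u:                                # simulate under u^(k-1)
            x = f(x, u_t)
        e = x - x_target                             # terminal tracking error
        z_ctx = z0 + modules["error_embed"](e) + modules["control_embed"](u)
        for _ in range(modules["n_low"]):            # n tactical updates
            zL = modules["core"](zL, zH, z_ctx)
        zH = modules["core"](zH, zL, z_ctx)          # one strategic update
        du = modules["residual_decoder"](zH, u)      # residual Delta u^(k)
        u = np.clip(u + du, u_min, u_max)            # box-constrained update
    return u
```

Because the same `modules["core"]` callable is applied at every iteration and cycle, the loop mirrors TRC's weight sharing: deeper recursion adds computation but no parameters.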

4. Recursive Reasoning and Control Refinement

TRC introduces a recursive reasoning loop where, at each iteration, the candidate control $u^{(k-1)}$ is evaluated via forward simulation, and the resulting terminal error $e^{(k-1)}$ is embedded and combined into a context vector. This context, together with the persistent high/low-level latents, is processed by the shared module $\mathcal{L}_\theta$ in a series of tactical and strategic updates, culminating in a refined control increment $\Delta u^{(k)}$.

The update can be interpreted as a form of learned gradient descent on the terminal cost:

$$J(u) = \tfrac{1}{2}\,\|\hat x_T(u) - x_{\rm target}\|_2^2, \qquad \nabla_u J = \left(\frac{\partial \hat x_T}{\partial u}\right)^{\!\top} e$$

where the residual decoder approximates $-\eta \nabla_u J$ at each step. This recursive structure facilitates progressive error minimization, analogous to iterative optimization in model-predictive control, but using a learned operator with fixed parameters.
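The gradient identity above can be checked numerically on a toy system. Everything here (the linear dynamics, horizon, and step size $\eta = 0.5$) is an illustrative assumption; the point is only that a small step along $-\nabla_u J$ reduces the terminal cost, which is the behavior the residual decoder is interpreted as learning:

```python
import numpy as np

def terminal_cost(u, x0, x_target, f):
    """J(u) = 0.5 * ||x_T(u) - x_target||^2 for a rollout under dynamics f."""
    x = x0
    for u_t in u:
        x = f(x, u_t)
    return 0.5 * np.sum((x - x_target) ** 2)

def fd_gradient(u, x0, x_target, f, eps=1e-5):
    """Central finite-difference estimate of grad_u J, i.e. (dx_T/du)^T e."""
    g = np.zeros_like(u)
    for i in np.ndindex(u.shape):
        up, um = u.copy(), u.copy()
        up[i] += eps
        um[i] -= eps
        g[i] = (terminal_cost(up, x0, x_target, f)
                - terminal_cost(um, x0, x_target, f)) / (2 * eps)
    return g

f = lambda x, u_t: x + 0.1 * u_t            # toy linear dynamics (assumed)
u0 = np.zeros((10, 1))                      # initial control sequence
x0, xt = np.array([1.0]), np.array([0.0])
u1 = u0 - 0.5 * fd_gradient(u0, x0, xt, f)  # one gradient step, eta = 0.5
```

For these linear dynamics, $x_T = x_0 + 0.1 \sum_t u_t$, so each component of the gradient equals $0.1\,e$, and the descent step demonstrably lowers $J$.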

5. Computational and Memory Characteristics

TRC is explicitly designed for efficiency in memory-constrained, low-latency settings. The model comprises approximately 1.5M parameters (sub-10 MB memory footprint), with inference latency measured at 5 ms (Van der Pol) to 8 ms (Powered Descent) for three refinement iterations on an NVIDIA RTX 3080. Additional iterations add only 2–3 ms per step, with memory usage constant since parameters and activations are shared and reused.

A comparative summary:

| Controller type    | Parameters | Memory       | Inference (GPU) | Iterative | Notes                  |
|--------------------|------------|--------------|-----------------|-----------|------------------------|
| TRC                | 1.5 M      | <10 MB       | 5–8 ms (K=3)    | Yes       | No explicit gradients  |
| Feedforward/MPC NN | 10–50 M    | 100–200 MB   | 10–20 ms        | No        | Fixed complexity       |
| LLM-based          | 100 M–7 B  | 400 MB–20 GB | 100–500 ms      | No        | Large memory footprint |

TRC’s memory use is two orders of magnitude smaller than LLM alternatives, making it suitable for on-board deployment where hardware resources are severely constrained (Jain et al., 18 Dec 2025).

6. Empirical Performance and Ablations

TRC has been evaluated on two nonlinear control tasks:

  • Van der Pol Oscillator: $(d_x = 2,\ d_u = 1,\ T = 100)$, trained on 10,000 initial states and evaluated on 1,000 test cases. With three recursion steps $(K = 3)$, TRC achieves a mean control cost of 79.6, exactly matching the optimal value computed by SQP. Cost reduction per iteration is approximately 32%, for a total 90% reduction from $k = 0$ to $k = 3$. Ablations show that $K = 1$ yields $4\times$ the optimal cost and $K = 2$ yields $1.2\times$, while $K = 3$ matches the optimum. Inference time is 5 ms; memory is 8 MB.
  • Powered Descent Task: $(d_x = 7,\ d_u = 3,\ T = 50)$, using 4,812 optimal trajectories computed by successive convexification. TRC achieves a mean fuel cost of $1.02\times$ optimal (2% above), with a per-iteration cost reduction of 32% and an inference time of 8 ms (9 MB memory). Thrust profiles match the bang-bang structure of fuel-optimal solutions. Ablation: $K = 1$ gives $2.5\times$ the optimal cost, $K = 2$ gives $1.3\times$, and $K = 3$ gives $1.02\times$.

These results indicate that, as recursion depth increases, TRC approaches optimal solution quality while remaining within tight resource budgets.

7. Limitations, Implications, and Future Directions

TRC empirically demonstrates that recursive, weight-shared reasoning transfers from discrete domains to continuous optimal control. A key implication is that iteration depth provides an adjustable tradeoff between latency and control quality: fewer iterations yield faster responses suitable for time-critical contexts, while more iterations can be used where higher optimality is required. Critically, the memory footprint remains fixed regardless of iteration count.
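The latency/quality knob can be made concrete with a back-of-the-envelope calculation from the figures in Section 5. The 2.5 ms per-iteration cost (midpoint of the reported 2–3 ms) and the 10 ms budget for a 100 Hz loop are assumptions layered on the reported numbers:

```python
# Illustrative latency model: ~5 ms at K=3 on the Van der Pol task, plus
# ~2.5 ms per additional refinement iteration (assumed midpoint of 2-3 ms).
def latency_ms(K, base_ms=5.0, base_K=3, per_iter_ms=2.5):
    return base_ms + per_iter_ms * (K - base_K)

budget_ms = 10.0   # assumed budget for a 100 Hz control loop
K_max = 3
while latency_ms(K_max + 1) <= budget_ms:
    K_max += 1     # deepest recursion that still fits the cycle budget
```

Under these assumptions, a controller could select recursion depth online: shallow when the loop deadline is tight, deeper when slack permits, with memory unchanged in either case.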

The approach also offers auxiliary benefits such as inspectability: intermediate control sequences $u^{(1)}, \dots, u^{(K-1)}$ are available for diagnostics, verification, or downstream safety planning. However, several limitations are noted:

  • No formal guarantees of stability or constraint satisfaction; Lyapunov/certificate-based approaches or differentiable barrier functions remain to be integrated.
  • Offline training requires large datasets of optimal trajectories, although future work could relax this with reinforcement learning or meta-learning.
  • Current constraint handling leverages per-step clipping; more sophisticated methods would improve explicit constraint satisfaction.
  • Full validation on actual flight hardware is pending.

In summary, TRC achieves near-optimal control on challenging nonlinear systems using up to two orders of magnitude fewer parameters and sub-10 MB memory, by trading model size for recursive computation. This establishes a path for deploying efficient, embedded neural optimal controllers in resource-constrained domains such as aerospace (Jain et al., 18 Dec 2025).
