Tiny Recursive Control (TRC)
- Tiny Recursive Control (TRC) is a neural architecture that uses recursive, weight-shared refinement to iteratively update control sequences for continuous optimal control.
- It leverages a hierarchical latent structure with on-the-fly simulation and error correction to approach near-optimal solutions in nonlinear tasks such as the Van der Pol oscillator and powered descent.
- TRC offers a scalable tradeoff between latency and control quality while maintaining a fixed, compact memory footprint, making it well suited to embedded aerospace applications.
Tiny Recursive Control (TRC) is a neural architecture for continuous optimal control that leverages recursive, weight-shared refinement operators to achieve high control quality while maintaining a compact memory and computational footprint. Departing from conventional feed-forward and large-scale LLM-based controllers, which require parameter counts in the millions or billions, TRC is designed to match or exceed their capacity through iteration depth rather than parameter count. By repeatedly applying the same compact network through a hierarchical latent structure, TRC efficiently refines candidate control sequences using on-the-fly simulation and tracking-error correction, enabling deployment in resource-constrained aerospace environments (Jain et al., 18 Dec 2025).
1. Motivation and Foundational Principle
Conventional neural controllers—including standard feed-forward networks and transformer-based models—memorize mappings from state and goal to control. Realizing strong performance on diverse, high-dimensional tasks requires very large parameter counts, often in the millions or billions, which results in prohibitive memory and latency demands for the embedded guidance computers typical of satellite, UAV, and launch vehicle applications. For example, a 7 billion-parameter LLM controller may require hundreds of MB of memory and inference times beyond 100 ms, which is infeasible for high-frequency (e.g., 100 Hz) control loops.
TRC is motivated by the insight from Tiny Recursive Models (TRM) in NLP, where model capacity is achieved by repeatedly reusing a single refinement operator, rather than increasing model width or depth. Each TRC iteration simulates a candidate trajectory under the current control sequence, measures the discrepancy at the goal (terminal tracking error), and updates the control using a weight-shared hierarchical network. This iterative process enables substantial expressivity with a fixed ~1.5M-parameter core, providing an adjustable compute knob (number of iterations) without increasing memory or model size (Jain et al., 18 Dec 2025).
2. Mathematical Formulation
TRC addresses the finite-horizon, discrete-time optimal control problem with horizon $N$:
- State $x_k \in \mathbb{R}^{n_x}$, control $u_k \in \mathbb{R}^{n_u}$
- System dynamics: $x_{k+1} = f(x_k, u_k)$
- Cost function: $J = \sum_{k=0}^{N-1} \ell(x_k, u_k) + \phi(x_N)$
where $\ell$ and $\phi$ are quadratic stage and terminal costs. Given initial state $x_0$ and target $x^\star$, the controller iteratively refines the control sequence $u_{0:N-1}$ so that the terminal simulated state $\hat{x}_N$ approaches $x^\star$.
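The rollout-and-cost computation above can be sketched concretely. The snippet below uses the paper's Van der Pol task as the example dynamics $f$; the Euler discretization, step size `dt`, and the weights `Q`, `R`, `QN` are illustrative assumptions, not the paper's actual values.

```python
import numpy as np

def vdp_step(x, u, mu=1.0, dt=0.05):
    """One Euler step of a controlled Van der Pol oscillator (illustrative dynamics f)."""
    x1, x2 = x
    dx = np.array([x2, mu * (1 - x1**2) * x2 - x1 + u])
    return x + dt * dx

def rollout_cost(x0, u_seq, x_target, Q=1.0, R=0.1, QN=10.0):
    """Simulate x_{k+1} = f(x_k, u_k) and accumulate quadratic stage + terminal cost J."""
    x, J = np.array(x0, dtype=float), 0.0
    for u in u_seq:
        J += Q * np.sum((x - x_target)**2) + R * u**2  # stage cost l(x_k, u_k)
        x = vdp_step(x, u)
    J += QN * np.sum((x - x_target)**2)                # terminal cost phi(x_N)
    return J, x
```

TRC evaluates exactly this kind of rollout at every refinement iteration to obtain the terminal tracking error.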
Latent variables in TRC include:
- $c$: initial context encoding of $(x_0, x^\star)$
- $z_H$, $z_L$: high- and low-level latents per iteration
The recursive refinement operator applies, at iteration $i$,
$$u^{(i+1)} = u^{(i)} + \Delta u^{(i)}, \qquad (z_H, z_L, \Delta u^{(i)}) \leftarrow \mathcal{R}\big(c,\, z_H,\, z_L,\, u^{(i)},\, e^{(i)}\big),$$
with terminal tracking error $e^{(i)} = \hat{x}_N^{(i)} - x^\star$.
This framework supports multi-level latent reasoning, simulates new trajectories at each iteration, and updates control sequences based on feedback from previous iterations.
3. Network Architecture and Hierarchical Latent Structure
TRC comprises five main modules, sharing approximately 1.5M parameters:
- StateEncoder: MLP over $[x_0; x^\star]$ with LayerNorm and GELU, producing the context encoding $c$.
- InitialDecoder: MLP mapping $c$ to the initial control sequence $u^{(0)}$.
- ErrorEmbed: MLP embedding the terminal tracking error $e$.
- ControlEmbed: linear layer over the flattened control sequence, encoding candidate controls.
- ResidualDecoder: MLP over the concatenated context and latents $[c; z_H; z_L]$, computing control refinements $\Delta u$.
The two-level hierarchical reasoning module contains:
- High-level latent $z_H$ (strategic/contextual)
- Low-level latent $z_L$ (tactical/error correction)
Architectural hyperparameters include three transformer blocks per module, eight attention heads, and GELU activations. All modules share weights across recursive iterations and across the low-level cycles within each iteration.
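The module signatures can be made concrete with random-weight stand-ins. The dimensions below (`d = 128` latent width, Van der Pol-sized state/control, horizon `N = 50`) are assumptions for illustration, not the paper's reported hyperparameters, and the `mlp` helper omits LayerNorm and the transformer blocks for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random-weight MLP with GELU hidden activations; stands in for trained TRC modules."""
    Ws = [rng.normal(0, 0.1, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
    def f(x):
        for W in Ws[:-1]:
            h = x @ W
            # tanh approximation of GELU
            x = 0.5 * h * (1 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))
        return x @ Ws[-1]
    return f

n_x, n_u, N, d = 2, 1, 50, 128            # state dim, control dim, horizon, latent width (assumed)
state_encoder    = mlp([2 * n_x, d, d])   # [x0; x*] -> context c
initial_decoder  = mlp([d, d, N * n_u])   # c -> flattened u^(0)
error_embed      = mlp([n_x, d, d])       # terminal error e -> embedding
control_embed    = mlp([N * n_u, d])      # flattened controls -> embedding (single linear layer)
residual_decoder = mlp([3 * d, d, N * n_u])  # [c; z_H; z_L] -> flattened Delta u
```

Because the same five modules are reused at every recursion step, total parameter count is fixed regardless of the number of iterations.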
The computational workflow at each iteration is:
- Encode state and target into the context $c$
- Initialize latent variables $z_H$, $z_L$
- Generate initial control $u^{(0)}$
- For $i = 1$ to $K$:
  - Simulate the trajectory under $u^{(i-1)}$
  - Compute the terminal error $e^{(i-1)} = \hat{x}_N^{(i-1)} - x^\star$
  - Update the context from $c$, the embedded error, and the embedded controls
  - Update the low-level latent $z_L$ several times; then update the high-level latent $z_H$
  - Decode the residual control $\Delta u^{(i)}$
  - Update $u^{(i)} = u^{(i-1)} + \Delta u^{(i)}$
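The workflow above can be sketched as a single inference function. The module interfaces, the additive context fusion, and the latent-update wiring are assumptions made for illustration; the paper's exact fusion and update rules may differ.

```python
import numpy as np

def trc_refine(x0, x_target, simulate, modules, K=3, n_low=3):
    """One TRC inference pass: K weight-shared refinement iterations (illustrative sketch).

    `simulate(x0, u)` returns the terminal state of a rollout; `modules` holds the five
    networks plus the low/high-level latent update maps.
    """
    enc, init_dec, err_emb, ctl_emb, res_dec, low_upd, high_upd = modules
    c = enc(np.concatenate([x0, x_target]))       # context encoding
    zH = np.zeros_like(c)                          # high-level (strategic) latent
    zL = np.zeros_like(c)                          # low-level (tactical) latent
    u = init_dec(c)                                # initial control sequence
    for _ in range(K):
        xN = simulate(x0, u)                       # forward-simulate candidate controls
        e = xN - x_target                          # terminal tracking error
        ctx = c + err_emb(e) + ctl_emb(u)          # fused context (additive fusion assumed)
        for _ in range(n_low):
            zL = low_upd(zL + zH + ctx)            # tactical low-level cycles
        zH = high_upd(zH + zL)                     # strategic high-level update
        u = u + res_dec(np.concatenate([ctx, zH, zL]))  # residual control update
    return u
```

Only the loop count `K` changes between fast and high-quality inference; the modules, and hence the memory footprint, are identical.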
4. Recursive Reasoning and Control Refinement
TRC introduces a recursive reasoning loop where, at each iteration, the candidate control is evaluated via forward simulation, and the resulting terminal error is embedded and combined into a context vector. This context, together with the persistent high/low-level latents, is processed by the shared module in a series of tactical and strategic updates, culminating in a refined control increment $\Delta u^{(i)}$.
The update can be interpreted as a form of learned gradient descent on the terminal cost:
$$u^{(i+1)} = u^{(i)} - \eta\, \widehat{\nabla_u J}\big(u^{(i)}\big),$$
where the residual decoder approximates the scaled negative gradient $-\eta\, \nabla_u J$ at each step. This recursive structure facilitates progressive error minimization, analogous to iterative optimization in model-predictive control, but using a learned operator with fixed parameters.
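The analogy can be made concrete with explicit gradient descent on a toy terminal cost. The scalar integrator dynamics, step size `eta`, and iteration count below are illustrative; the point is that each iteration shrinks the terminal error by a fixed factor, which is the behavior TRC's learned residual updates mimic.

```python
import numpy as np

def terminal_cost_grad(u, x0, x_star):
    """Terminal cost J(u) = (x_N - x*)^2 for x_{k+1} = x_k + u_k, and its gradient.

    The rollout is linear, so x_N = x0 + sum(u) and dJ/du_k = 2 e for every k.
    """
    e = x0 + u.sum() - x_star
    return e**2, 2 * e * np.ones_like(u)

u = np.zeros(5)
x0, x_star, eta = 0.0, 1.0, 0.05
for _ in range(30):                 # iterations play the role of TRC recursion steps
    J, g = terminal_cost_grad(u, x0, x_star)
    u -= eta * g                    # u^(i+1) = u^(i) - eta * grad J
```

Here each step multiplies the terminal error by $(1 - 2\eta N) = 0.5$, so the cost decays geometrically, mirroring the per-iteration cost reductions reported for TRC.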
5. Computational and Memory Characteristics
TRC is explicitly designed for efficiency in memory-constrained, low-latency settings. The model comprises approximately 1.5M parameters (sub-10 MB memory footprint), with inference latency measured at 5 ms (Van der Pol) to 8 ms (Powered Descent) for three refinement iterations on an NVIDIA RTX 3080. Additional iterations add only 2–3 ms per step, with memory usage constant since parameters and activations are shared and reused.
A comparative summary:
| Controller Type | Parameters | Memory | Inference (GPU) | Iterative | Notes |
|---|---|---|---|---|---|
| TRC | 1.5 M | <10 MB | 5–8 ms (K=3) | Yes | No explicit gradients |
| Feedforward/MPC NN | 10–50 M | 100–200 MB | 10–20 ms | No | Fixed complexity |
| LLM-based | 100 M–7 B | 400 MB–20 GB | 100–500 ms | No | Large memory footprint |
TRC’s memory use is two orders of magnitude smaller than LLM alternatives, making it suitable for on-board deployment where hardware resources are severely constrained (Jain et al., 18 Dec 2025).
6. Empirical Performance and Ablations
TRC has been evaluated on two nonlinear control tasks:
- Van der Pol Oscillator: trained on 10,000 initial states, evaluated on 1,000 test cases. With three recursion steps ($K=3$), TRC achieves a mean control cost of $79.6$, exactly matching the optimal value computed by SQP, with the cost decreasing substantially at each refinement iteration. Ablation shows the cost approaching the optimum as $K$ grows: $K=1$ and $K=2$ remain slightly above the SQP optimum, while $K=3$ matches it. Inference time is 5 ms; memory is 8 MB.
- Powered Descent Task: trained on 4,812 optimal trajectories computed by successive convexification. TRC achieves a mean fuel cost within 2% of optimal, with a per-iteration cost reduction of 32%, an inference time of 8 ms, and 9 MB memory. Thrust profiles match the bang-bang structure of fuel-optimal solutions. Ablation shows that smaller $K$ leaves the cost further from optimal, with $K=3$ reaching within 2% of the optimum.
These results indicate that, as recursion depth increases, TRC approaches optimal solution quality while remaining within tight resource budgets.
7. Limitations, Implications, and Future Directions
TRC empirically demonstrates that recursive, weight-shared reasoning transfers from discrete domains to continuous optimal control. A key implication is that iteration depth provides an adjustable tradeoff between latency and control quality: fewer iterations yield faster responses suitable for time-critical contexts, while more iterations can be used where higher optimality is required. Critically, the memory footprint remains fixed regardless of iteration count.
The approach also offers auxiliary benefits such as inspectability—intermediate control sequences are available for diagnostics, verification, or downstream safety planning. However, several limitations are noted:
- No formal guarantees of stability or constraint satisfaction; Lyapunov/certificate-based approaches or differentiable barrier functions remain to be integrated.
- Offline training requires large datasets of optimal trajectories, although future work could relax this with reinforcement learning or meta-learning.
- Current constraint handling leverages per-step clipping; more sophisticated methods would improve explicit constraint satisfaction.
- Full validation on actual flight hardware is pending.
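The per-step clipping mentioned above is the simplest form of constraint handling: after each residual update, the candidate controls are projected onto box bounds. The bound values below are illustrative assumptions.

```python
import numpy as np

def clip_controls(u_seq, u_min=-1.0, u_max=1.0):
    """Project each control in the sequence onto [u_min, u_max] (per-step box clipping)."""
    return np.clip(u_seq, u_min, u_max)
```

This enforces actuator limits exactly but ignores state constraints, which is why the authors point to barrier functions and certificate-based methods as future work.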
In summary, TRC achieves near-optimal control on challenging nonlinear systems using up to two orders of magnitude fewer parameters and sub-10 MB memory, by trading model size for recursive computation. This establishes a path for deploying efficient, embedded neural optimal controllers in resource-constrained domains such as aerospace (Jain et al., 18 Dec 2025).