
Tiny Recursive Control (TRC)

Updated 25 December 2025
  • Tiny Recursive Control (TRC) is a neural architecture that uses recursive, weight-shared refinement to iteratively update control sequences for continuous optimal control.
  • It leverages a hierarchical latent structure with on-the-fly simulation and error correction to approach near-optimal solutions in nonlinear tasks such as the Van der Pol oscillator and powered descent.
  • TRC offers a scalable tradeoff between latency and control quality while maintaining a fixed, compact memory footprint, making it ideal for embedded aerospace applications.

Tiny Recursive Control (TRC) is a neural architecture for continuous optimal control that leverages recursive, weight-shared refinement operators to achieve high control quality while maintaining a compact memory and computational footprint. Departing from conventional feed-forward and large-scale LLM-based controllers, which require parameter counts in the millions or billions, TRC is designed to match or exceed their capacity through iteration depth rather than parameter count. By repeatedly applying the same compact network through a hierarchical latent structure, TRC efficiently refines candidate control sequences using on-the-fly simulation and tracking error correction, enabling deployment in resource-constrained aerospace environments (Jain et al., 18 Dec 2025).

1. Motivation and Foundational Principle

Conventional neural controllers—including standard feed-forward networks and transformer-based models—memorize mappings from state and goal to control. Realizing strong performance on diverse, high-dimensional tasks necessitates very large parameter counts, often exceeding millions or billions, which results in prohibitive memory and latency demands for embedded guidance computers typical in satellite, UAV, and launch vehicle applications. For example, a 7 billion-parameter LLM controller may require hundreds of MB of memory and inference times beyond 100 ms, which is infeasible for high-frequency (e.g., 100 Hz) control loops.

TRC is motivated by the insight from Tiny Recursive Models (TRM) in NLP, where model capacity is achieved by repeatedly reusing a single refinement operator, rather than increasing model width or depth. Each TRC iteration simulates a candidate trajectory under the current control sequence, measures the discrepancy at the goal (terminal tracking error), and updates the control using a weight-shared hierarchical network. This iterative process enables substantial expressivity with a fixed ~1.5M-parameter core, providing an adjustable compute knob (number of iterations) without increasing memory or model size (Jain et al., 18 Dec 2025).

2. Mathematical Formulation

TRC addresses the finite-horizon, discrete-time optimal control problem with horizon $T$:

  • State $x_t \in \mathbb{R}^{d_x}$, control $u_t \in \mathbb{R}^{d_u}$
  • System dynamics:

$$x_{t+1} = f(x_t, u_t), \quad t = 0, \dots, T-1$$

  • Cost function:

$$J(u_{0:T-1}) = \sum_{t=0}^{T-1} \ell(x_t, u_t) + \ell_f(x_T, x_{\rm target})$$

where $\ell$ and $\ell_f$ are quadratic stage and terminal costs. Given an initial state $x_0$ and target $x_{\rm target}$, the controller iteratively refines $u^{(k)}$ so that the terminal simulated state $\hat x_T^{(k)}$ approaches $x_{\rm target}$.
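This rollout-and-cost structure can be made concrete with a minimal sketch. The Euler discretization of the Van der Pol oscillator and the quadratic weights `Q`, `R`, `Qf` below are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def vdp_step(x, u, dt=0.05, mu=1.0):
    """One Euler step of a controlled Van der Pol oscillator (an assumed
    instance of the dynamics f; the paper's discretization is not given here)."""
    x1, x2 = x
    dx = np.array([x2, mu * (1 - x1**2) * x2 - x1 + u[0]])
    return x + dt * dx

def rollout_cost(x0, u_seq, x_target, Q=0.1, R=0.01, Qf=10.0):
    """Simulate x_{t+1} = f(x_t, u_t) and accumulate the quadratic cost J."""
    x, J = np.array(x0, dtype=float), 0.0
    for u in u_seq:                        # stage costs ell(x_t, u_t)
        J += Q * np.sum((x - x_target)**2) + R * np.sum(u**2)
        x = vdp_step(x, u)
    J += Qf * np.sum((x - x_target)**2)    # terminal cost ell_f(x_T, x_target)
    return J, x                            # total cost and terminal state x_T

J, xT = rollout_cost([1.0, 0.0], np.zeros((100, 1)), np.array([0.0, 0.0]))
```

This is exactly the quantity TRC's inner simulation evaluates at each refinement step: the rollout produces both the cost $J$ and the terminal state from which the tracking error is formed.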

Latent variables in TRC include:

  • $z_0$: initial context encoding of $(x_0, x_{\rm target}, t_{\rm remaining})$
  • $z_H^{(k)}$, $z_L^{(k)}$: high- and low-level latents at iteration $k$

The recursive refinement operator $\mathcal{R}_\theta$ applies

$$u^{(k)} = \mathcal{R}_\theta\big(u^{(k-1)}, x_0, x_{\rm target}, e^{(k-1)}\big)$$

with terminal tracking error

$$e^{(k)} = \hat x_T^{(k)} - x_{\rm target}$$

This framework supports multi-level latent reasoning, simulates new trajectories at each iteration, and updates control sequences based on feedback from previous iterations.

3. Network Architecture and Hierarchical Latent Structure

TRC comprises five main modules, sharing approximately 1.5M parameters:

  • StateEncoder: MLP($2d_x + 1 \to d_z$) with LayerNorm and GELU, producing $z_0$.
  • InitialDecoder: MLP($d_z \to T \cdot d_u$), generating the initial control $u^{(0)}$.
  • ErrorEmbed: MLP($d_x \to d_z$), embedding the terminal error.
  • ControlEmbed: linear layer (flattened $u^{(k-1)} \to d_z$), encoding candidate controls.
  • ResidualDecoder: MLP($[z_H;\, u^{(k-1)}] \to T \cdot d_u$), computing control refinements $\Delta u^{(k)}$.

The two-level hierarchical reasoning module $\mathcal{L}_\theta$ contains:

  • High-level latent $z_H \in \mathbb{R}^{d_h}$ (strategic/contextual)
  • Low-level latent $z_L \in \mathbb{R}^{d_h}$ (tactical/error correction)

Architectural hyperparameters include $d_z = 256$, $d_h = 512$, three transformer blocks per $\mathcal{L}_\theta$ module, eight attention heads, and GELU activations. All modules share weights across recursive iterations $k = 1, \dots, K$ and low-level cycles $i = 1, \dots, n$ within each iteration.

The computational workflow is:

  1. Encode state and target into $z_0$
  2. Initialize latent variables $z_H$, $z_L$
  3. Generate the initial control $u^{(0)}$
  4. For $k = 1$ to $K$:
    • Simulate trajectory $\hat x^{(k-1)}$ under $u^{(k-1)}$
    • Compute $e^{(k-1)} = \hat x_T^{(k-1)} - x_{\rm target}$
    • Update context $z_{\rm ctx} = z_0 + \text{ErrorEmbed}(e^{(k-1)}) + \text{ControlEmbed}(u^{(k-1)})$
    • Update the low-level latent $n$ times, then update the high-level latent
    • Decode the residual control $\Delta u^{(k)}$
    • Update $u^{(k)} = \mathrm{clip}\big(u^{(k-1)} + \Delta u^{(k)},\, u_{\min},\, u_{\max}\big)$
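The iteration described above can be sketched as a plain loop. The network modules are stubbed as placeholder callables passed in via a `modules` dictionary; these stand-ins, the dynamics `f`, the control bounds, and the default $K = 3$ are assumptions for illustration, not the paper's trained components:

```python
import numpy as np

def refine(u, x0, x_target, f, modules, K=3, u_min=-1.0, u_max=1.0):
    """TRC-style recursive refinement loop (sketch). `modules` supplies
    stand-ins for StateEncoder, ErrorEmbed, ControlEmbed, the shared
    reasoning core L_theta, and the ResidualDecoder."""
    z0 = modules["state_encoder"](x0, x_target)      # context encoding z_0
    zH = np.zeros_like(z0)                           # high-level latent
    zL = np.zeros_like(z0)                           # low-level latent
    for k in range(1, K + 1):
        x = x0
        for u_t in u:                                # simulate under u^(k-1)
            x = f(x, u_t)
        e = x - x_target                             # terminal tracking error
        z_ctx = z0 + modules["error_embed"](e) + modules["control_embed"](u)
        for _ in range(modules["n_low"]):            # n tactical updates
            zL = modules["core"](zL, zH, z_ctx)
        zH = modules["core"](zH, zL, z_ctx)          # one strategic update
        du = modules["residual_decoder"](zH, u)      # residual Delta u^(k)
        u = np.clip(u + du, u_min, u_max)            # box-constrained update
    return u
```

Because the same `modules["core"]` callable is applied at every iteration and cycle, the loop mirrors TRC's weight sharing: deeper recursion adds computation but no parameters.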

4. Recursive Reasoning and Control Refinement

TRC introduces a recursive reasoning loop where, at each iteration, the candidate control $u^{(k-1)}$ is evaluated via forward simulation, and the resulting terminal error $e^{(k-1)}$ is embedded and combined into a context vector. This context, together with the persistent high/low-level latents, is processed by the shared module $\mathcal{L}_\theta$ in a series of tactical and strategic updates, culminating in a refined control increment $\Delta u^{(k)}$.

The update can be interpreted as a form of learned gradient descent on the terminal cost:

$$J(u) = \tfrac{1}{2}\,\|\hat x_T(u) - x_{\rm target}\|_2^2, \qquad \nabla_u J = \left(\frac{\partial \hat x_T}{\partial u}\right)^{\!\top} e$$

where the residual decoder approximates $-\eta \nabla_u J$ at each step. This recursive structure facilitates progressive error minimization, analogous to iterative optimization in model-predictive control, but using a learned operator with fixed parameters.
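The gradient identity above can be checked numerically on a toy system. Everything here (the linear dynamics, horizon, and step size $\eta = 0.5$) is an illustrative assumption; the point is only that a small step along $-\nabla_u J$ reduces the terminal cost, which is the behavior the residual decoder is interpreted as learning:

```python
import numpy as np

def terminal_cost(u, x0, x_target, f):
    """J(u) = 0.5 * ||x_T(u) - x_target||^2 for a rollout under dynamics f."""
    x = x0
    for u_t in u:
        x = f(x, u_t)
    return 0.5 * np.sum((x - x_target) ** 2)

def fd_gradient(u, x0, x_target, f, eps=1e-5):
    """Central finite-difference estimate of grad_u J, i.e. (dx_T/du)^T e."""
    g = np.zeros_like(u)
    for i in np.ndindex(u.shape):
        up, um = u.copy(), u.copy()
        up[i] += eps
        um[i] -= eps
        g[i] = (terminal_cost(up, x0, x_target, f)
                - terminal_cost(um, x0, x_target, f)) / (2 * eps)
    return g

f = lambda x, u_t: x + 0.1 * u_t            # toy linear dynamics (assumed)
u0 = np.zeros((10, 1))                      # initial control sequence
x0, xt = np.array([1.0]), np.array([0.0])
u1 = u0 - 0.5 * fd_gradient(u0, x0, xt, f)  # one gradient step, eta = 0.5
```

For these linear dynamics, $x_T = x_0 + 0.1 \sum_t u_t$, so each component of the gradient equals $0.1\,e$, and the descent step demonstrably lowers $J$.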

5. Computational and Memory Characteristics

TRC is explicitly designed for efficiency in memory-constrained, low-latency settings. The model comprises approximately 1.5M parameters (sub-10 MB memory footprint), with inference latency measured at 5 ms (Van der Pol) to 8 ms (Powered Descent) for three refinement iterations on an NVIDIA RTX 3080. Additional iterations add only 2–3 ms per step, with memory usage constant since parameters and activations are shared and reused.

A comparative summary:

| Controller type    | Parameters | Memory       | Inference (GPU) | Iterative | Notes                  |
|--------------------|------------|--------------|-----------------|-----------|------------------------|
| TRC                | 1.5 M      | <10 MB       | 5–8 ms (K=3)    | Yes       | No explicit gradients  |
| Feedforward/MPC NN | 10–50 M    | 100–200 MB   | 10–20 ms        | No        | Fixed complexity       |
| LLM-based          | 100 M–7 B  | 400 MB–20 GB | 100–500 ms      | No        | Large memory footprint |

TRC’s memory use is two orders of magnitude smaller than LLM alternatives, making it suitable for on-board deployment where hardware resources are severely constrained (Jain et al., 18 Dec 2025).

6. Empirical Performance and Ablations

TRC has been evaluated on two nonlinear control tasks:

  • Van der Pol Oscillator: $(d_x = 2,\ d_u = 1,\ T = 100)$, trained on 10,000 initial states and evaluated on 1,000 test cases. With three recursion steps $(K = 3)$, TRC achieves a mean control cost of 79.6, exactly matching the optimal value computed by SQP. Cost reduction per iteration is approximately 32%, for a total 90% reduction from $k = 0$ to $k = 3$. Ablations show that $K = 1$ yields $4\times$ the optimal cost and $K = 2$ yields $1.2\times$, while $K = 3$ matches the optimum. Inference time is 5 ms; memory is 8 MB.
  • Powered Descent Task: $(d_x = 7,\ d_u = 3,\ T = 50)$, using 4,812 optimal trajectories computed by successive convexification. TRC achieves a mean fuel cost of $1.02\times$ optimal (2% above), with a per-iteration cost reduction of 32% and an inference time of 8 ms (9 MB memory). Thrust profiles match the bang-bang structure of fuel-optimal solutions. Ablation: $K = 1$ gives $2.5\times$ the optimal cost, $K = 2$ gives $1.3\times$, and $K = 3$ gives $1.02\times$.

These results indicate that, as recursion depth increases, TRC approaches optimal solution quality while remaining within tight resource budgets.

7. Limitations, Implications, and Future Directions

TRC empirically demonstrates that recursive, weight-shared reasoning transfers from discrete domains to continuous optimal control. A key implication is that iteration depth provides an adjustable tradeoff between latency and control quality: fewer iterations yield faster responses suitable for time-critical contexts, while more iterations can be used where higher optimality is required. Critically, the memory footprint remains fixed regardless of iteration count.
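The latency/quality knob can be made concrete with a back-of-the-envelope calculation from the figures in Section 5. The 2.5 ms per-iteration cost (midpoint of the reported 2–3 ms) and the 10 ms budget for a 100 Hz loop are assumptions layered on the reported numbers:

```python
# Illustrative latency model: ~5 ms at K=3 on the Van der Pol task, plus
# ~2.5 ms per additional refinement iteration (assumed midpoint of 2-3 ms).
def latency_ms(K, base_ms=5.0, base_K=3, per_iter_ms=2.5):
    return base_ms + per_iter_ms * (K - base_K)

budget_ms = 10.0   # assumed budget for a 100 Hz control loop
K_max = 3
while latency_ms(K_max + 1) <= budget_ms:
    K_max += 1     # deepest recursion that still fits the cycle budget
```

Under these assumptions, a controller could select recursion depth online: shallow when the loop deadline is tight, deeper when slack permits, with memory unchanged in either case.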

The approach also offers auxiliary benefits such as inspectability: intermediate control sequences $u^{(1)}, \dots, u^{(K-1)}$ are available for diagnostics, verification, or downstream safety planning. However, several limitations are noted:

  • No formal guarantees of stability or constraint satisfaction; Lyapunov/certificate-based approaches or differentiable barrier functions remain to be integrated.
  • Offline training requires large datasets of optimal trajectories, although future work could relax this with reinforcement learning or meta-learning.
  • Current constraint handling leverages per-step clipping; more sophisticated methods would improve explicit constraint satisfaction.
  • Full validation on actual flight hardware is pending.

In summary, TRC achieves near-optimal control on challenging nonlinear systems using up to two orders of magnitude fewer parameters and sub-10 MB memory, by trading model size for recursive computation. This establishes a path for deploying efficient, embedded neural optimal controllers in resource-constrained domains such as aerospace (Jain et al., 18 Dec 2025).
