Zero-Shot Adaptive Control

Updated 4 June 2026

Zero-shot adaptive control is a paradigm that designs controllers to operate instantly in new environments by leveraging offline learned structures.
It employs methodologies such as function encoder-based policies, cascaded architectures, and closed-form inference without online gradient tuning.
Empirical evaluations show that zero-shot controllers achieve near-expert performance with rapid deployment and enhanced safety across diverse tasks.

Zero-shot adaptive control refers to the design of control policies capable of immediately producing effective and reliable behavior in novel environments or under new system parameters, without requiring additional data collection, gradient-based adaptation, or any task-specific fine-tuning at deployment. The central challenge is to generalize across parametric families of dynamics, objectives, or environments by leveraging structure learned during an offline phase. This paradigm stands in contrast to classical robust control, which focuses on worst-case designs, and to adaptive control, which typically requires online parameter estimation or iterative retuning. Zero-shot adaptive control integrates and extends concepts from reinforcement learning, function approximation, and model-based optimal control, and has recently seen accelerated progress through neural network-based architectures, function encoder methods, and advances in differentiable predictive control.

1. Core Problem Setting and Formalization

Zero-shot adaptive control is fundamentally concerned with families of control systems parameterized by latent or explicit variables (such as mass, friction, controller objectives, or actuator configurations), with the aim of producing policies or controllers that are instantly performant on any valid parameterization.

Mathematically, the setting assumes a family of nonlinear or stochastic systems:

$\dot{x}(t) = f\big(x(t), u(t); \theta\big), \quad \theta \in \Theta \subseteq \mathbb{R}^p,$

where $x(t) \in \mathbb{R}^n$ is the system state, $u(t) \in \mathbb{R}^m$ the control input, and $\theta$ parametrizes the system family (e.g., physical constants, morphology descriptors, environmental variables) (Iqbal et al., 7 Nov 2025). Control objectives are typically formulated either as maximizing expected cumulative reward in an MDP setting (Malik, 2019, Liu et al., 2024), or as minimizing a cost functional

$\mathcal{J}(u; s) = \int_0^T L\big(x(t), u(t), t; s\big) dt + G\big(x(T); s\big)$

for some objective parameter $s$ (which could encode target locations, obstacle descriptions, or desired trajectories) (Li et al., 22 Sep 2025).
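As a rough illustration of the two formulas above, the following sketch rolls out one fixed policy on two members of a parameterized family and approximates the cost functional numerically. The damped point-mass dynamics, the quadratic costs, the PD gains, and all function names are assumptions made for this example only; none of them come from the cited works.

```python
import numpy as np

def f(x, u, theta):
    """Hypothetical dynamics family: damped point mass, theta = (mass, friction)."""
    mass, friction = theta
    pos, vel = x
    return np.array([vel, (u - friction * vel) / mass])

def running_cost(x, u, s):
    """Hypothetical L: squared distance to the target position s plus control effort."""
    return (x[0] - s) ** 2 + 0.01 * u ** 2

def terminal_cost(x, s):
    """Hypothetical G: weighted terminal squared position error."""
    return 10.0 * (x[0] - s) ** 2

def rollout_cost(policy, theta, s, x0, T=5.0, dt=0.01):
    """Approximate J(u; s) with forward Euler and a left Riemann sum."""
    x = np.array(x0, dtype=float)
    J = 0.0
    for k in range(int(round(T / dt))):
        u = policy(x, k * dt, s)
        J += running_cost(x, u, s) * dt
        x = x + dt * f(x, u, theta)
    return J + terminal_cost(x, s), x

def pd_policy(x, t, s):
    """Fixed PD law (assumed gains), used unchanged for every theta."""
    return 4.0 * (s - x[0]) - 3.0 * x[1]

# Evaluate the same policy on two parameterizations of the family.
costs = {theta: rollout_cost(pd_policy, theta, s=1.0, x0=[0.0, 0.0])[0]
         for theta in [(1.0, 0.1), (2.0, 0.5)]}
```

Comparing the entries of `costs` across values of `theta` is one simple way to see how much a single fixed controller degrades over the family, which is the gap zero-shot methods aim to close.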

Zero-shot adaptation is defined as the ability to deploy a policy $\pi$, precomputed or trained offline, in a new system or task parameterization $(\theta^*, s^*)$ with no further policy optimization, retraining, or online gradient-based adaptation. Only lightweight “meta-inference” is permitted, such as estimating coefficients from a short rollout or an explicit parameter specification, e.g., by closed-form projection or regression (Ingebrand et al., 2024, Iqbal et al., 7 Nov 2025). This is distinct from few-shot adaptation or transfer learning, which may require a limited online update after deployment.

2. Representative Methodologies

A variety of architectures and algorithms realize zero-shot adaptive control, unified by a two-stage “offline–online decomposition”:

Offline: Representation Learning and Policy/Model Training.
- Learn a general representation—either of policy space, value functions, or system dynamics—by exposing the learner to a wide range of task/system parameterizations.
- This includes cascaded or modular architectures (Malik, 2019), function encoder bases (Ingebrand et al., 2024, Li et al., 22 Sep 2025, Iqbal et al., 7 Nov 2025), or goal-conditioned imitation via GAIL with structured latent spaces (Wu et al., 20 Oct 2025).
- Optimization may use RL (e.g., PPO, SAC), imitation, or self-supervised policy rollout.
Online: Parameter Identification and Policy Execution.
- Quickly infer the relevant coefficients or latent codes for the new environment, via closed-form projection, operator networks, or latent extraction from short trajectory windows (Li et al., 22 Sep 2025, Iqbal et al., 7 Nov 2025, Liu et al., 2024).
- Deploy the policy conditioned on these inferred codes, achieving immediate adaptation without task-specific gradient steps.
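
The online stage can be sketched on a toy problem. The example below assumes a scalar linear system $x_{k+1} = a x_k + b u_k$ and a pole-placement policy family, both chosen purely for illustration; the cited methods use learned encoders and neural policies instead. The names `infer_code` and `conditioned_policy` are hypothetical.

```python
import numpy as np

def infer_code(xs, us):
    """Online step: least-squares estimate of theta = (a, b) in x+ = a x + b u."""
    X = np.column_stack([xs[:-1], us])
    theta, *_ = np.linalg.lstsq(X, xs[1:], rcond=None)
    return theta

def conditioned_policy(x, theta, target_pole=0.5):
    """Policy family fixed offline; theta only selects the member (pole placement)."""
    a, b = theta
    return (target_pole - a) / b * x

# Short excitation rollout on an unseen system (true parameters hidden from the policy).
rng = np.random.default_rng(0)
a_true, b_true = 1.1, 0.7
us = rng.uniform(-1.0, 1.0, size=20)
xs = np.empty(21)
xs[0] = 0.5
for k in range(20):
    xs[k + 1] = a_true * xs[k] + b_true * us[k]

theta_hat = infer_code(xs, us)

# Deploy with the inferred code; no gradient step is taken at any point.
x = xs[-1]
for _ in range(30):
    x = a_true * x + b_true * conditioned_policy(x, theta_hat)
```

The only computation at deployment is one small least-squares solve, which is the sense in which adaptation is "closed-form" in this sketch.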

Key representative methods include:

a) Function Encoder-Based Policy and Model Representations

Policies and/or dynamics are represented as linear combinations of globally trained neural basis functions $\{\phi_j\}$:

$u_{\rm FE}(x, t; s) = \sum_{j=1}^p c_j(s) \phi_j(x, t; \omega_j),$

where $c_j(s)$ are task-dependent coefficients and $\omega_j$ are the parameters of the $j$-th basis function (Li et al., 22 Sep 2025). The dynamics family can likewise be approximated via neural ODE bases (Ingebrand et al., 2024, Iqbal et al., 7 Nov 2025).

Online, new task or system parameters are encoded as coefficient vectors ($c = (c_1, \dots, c_p)$), found from explicit task specifications or from short observed rollouts using ridge regression.
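
A minimal sketch of the ridge-regression step, assuming a small hand-picked basis in place of trained neural basis functions and a scalar state for readability. The basis choice, the noise level, and the names `basis`, `infer_coefficients`, and `u_fe` are illustrative assumptions, not part of the cited methods.

```python
import numpy as np

def basis(x):
    """Hypothetical fixed basis phi_1..phi_4 standing in for trained neural bases."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, np.sin(x), np.cos(x)], axis=-1)

def infer_coefficients(x_obs, u_obs, lam=1e-6):
    """Ridge regression: c = (Phi^T Phi + lam I)^{-1} Phi^T u."""
    Phi = basis(x_obs)                                  # shape (N, p)
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ u_obs)

def u_fe(x, c):
    """Evaluate u_FE(x) = sum_j c_j phi_j(x) for given coefficients."""
    return basis(x) @ c

rng = np.random.default_rng(0)
c_true = np.array([0.5, -1.0, 2.0, 0.3])               # coefficients of the unseen task
x_obs = rng.uniform(-2.0, 2.0, size=100)               # short rollout of 100 samples
u_obs = basis(x_obs) @ c_true + 0.01 * rng.standard_normal(100)

c_hat = infer_coefficients(x_obs, u_obs)               # closed-form, no gradient descent
```

Because the basis is frozen after offline training, the online cost is a single $p \times p$ linear solve, independent of the size of the networks that define the basis.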

b) Cascaded and Modular Architectures

CASNET leverages cascaded recurrent encoders and decoders over system components (links, nodes) to produce latent, morphology-agnostic embeddings for robot morphologies. An abstract action policy operates in a shared latent space, with decoders re-instantiating system-specific outputs (Malik, 2019).

c) Successor Feature and Latent-State Approaches

Adaptive agents trained in simulation with domain randomization extract a latent environment representation from recent trajectories, which then conditions the modular policy at deployment for immediate adaptation in the real world. Successor feature heads and an arbiter enable the zero-shot composition of primitive policies for new tasks (Liu et al., 2024).
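
One common way to compose primitive policies through successor features is generalized policy improvement (GPI): evaluate every primitive on the new task's reward weights and act greedily on the best estimate. The sketch below shows that rule on hand-written numbers; it is an assumption that this matches the spirit of the arbiter in the cited work, which may use a different selection rule. The arrays `psi` and `w_new` are made-up values.

```python
import numpy as np

def gpi_action(psi, w):
    """Generalized policy improvement over primitive policies.

    psi: array (n_policies, n_actions, d) of successor features psi_i(s, a)
         at the current state, assumed already predicted by trained heads.
    w:   array (d,) of reward weights describing the new task.
    """
    q = psi @ w                              # Q_i(s, a) = psi_i(s, a) . w
    return int(np.argmax(q.max(axis=0)))     # best action under the best primitive

# Two primitives, two actions, two reward features (illustrative values).
psi = np.array([
    [[1.0, 0.0], [0.2, 0.1]],   # primitive 0
    [[0.0, 0.3], [0.1, 0.9]],   # primitive 1
])
w_new = np.array([0.2, 1.0])
action = gpi_action(psi, w_new)
```

Changing the task only changes `w`; the successor features themselves are reused, which is what makes the composition zero-shot.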

d) Correction Policies for Sim2Real Transfer

The Reverse Action Transformation (RAT) approach learns an auxiliary policy on top of a universal policy network (UPN), correcting simulator-derived actions to better track state evolution in perturbed real environments. Once trained, it can be composed in zero-shot fashion with new real-world environments without task-specific finetuning (Semage et al., 2023).

e) Goal-Conditioned Imitation with Structured Latent Spaces

Zero-shot generative RL for plasma control combines GAIL-style adversarial imitation with a Hilbert-latent space structured via Bellman-consistent encoding, enabling universal goal-conditioned policy deployment across unseen geometric targets without fine-tuning (Wu et al., 20 Oct 2025).
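
The idea that latent distance can stand in for time-to-goal can be illustrated on a toy chain of states. In the sketch below the encoder `phi` is the identity, which happens to be exactly distance-preserving on a 1-D chain; in the cited work the encoder is learned and the policy is trained adversarially, so this is only an assumption-laden illustration of how such a latent space can be used for goal reaching.

```python
import numpy as np

def phi(state):
    """Hypothetical encoder; on a 1-D chain the identity already preserves distances."""
    return np.array([float(state)])

def latent_value(state, goal):
    """V(s, g) = -||phi(s) - phi(g)||, read as negative time-to-goal."""
    return -np.linalg.norm(phi(state) - phi(goal))

def greedy_step(state, goal, n_states=10):
    """Move to the neighbor (or stay) with the highest latent value."""
    candidates = [s for s in (state - 1, state, state + 1) if 0 <= s < n_states]
    return max(candidates, key=lambda s: latent_value(s, goal))

def reach(state, goal, max_steps=20):
    """Follow the greedy rule until the goal is reached or the step budget runs out."""
    path = [state]
    while state != goal and len(path) <= max_steps:
        state = greedy_step(state, goal)
        path.append(state)
    return path

path = reach(2, 7)
```

A new goal only changes the argument `goal`; nothing is retrained, which mirrors the goal-conditioned deployment described above.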

f) Tube-Based MPC for Robust Transfer

ADAPT integrates offline RL with online tube-based robust MPC to transfer a learned policy to any target within a known bounded model discrepancy, guaranteeing safety and bounded reward loss under Lipschitz dynamic divergence (Harrison et al., 2017).
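
The tube idea can be sketched on a scalar system: a nominal controller plans on a disturbance-free model, and an ancillary feedback term keeps the real state within a bounded distance of the nominal one. The plant, gains, and disturbance bound below are assumptions for illustration, and the nominal MPC optimization with tightened constraints used in practice is replaced here by a simple stand-in policy.

```python
import numpy as np

a, b = 1.2, 1.0          # hypothetical unstable scalar plant x+ = a x + b u + w
K = -0.9                 # ancillary gain; error dynamics factor a + b K = 0.3
w_max = 0.1              # assumed disturbance bound |w| <= w_max
tube_radius = w_max / (1.0 - abs(a + b * K))   # invariant bound on |x - z|

def nominal_policy(z):
    """Stand-in for the learned policy, acting on the nominal state z."""
    return -0.7 * z

rng = np.random.default_rng(1)
x = z = 1.0              # real and nominal states start together
max_err = 0.0
for _ in range(200):
    v = nominal_policy(z)                 # nominal input
    u = v + K * (x - z)                   # tube controller: nominal + error feedback
    w = rng.uniform(-w_max, w_max)
    x = a * x + b * u + w                 # real system with disturbance
    z = a * z + b * v                     # disturbance-free nominal model
    max_err = max(max_err, abs(x - z))
```

Here the error obeys $e_{k+1} = (a + bK)\,e_k + w_k$, so starting from $e_0 = 0$ it stays below `tube_radius` for every admissible disturbance sequence; this is the scalar analogue of the invariance guarantee mentioned above.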

3. Mathematical and Algorithmic Foundations

The mathematical backbone of zero-shot adaptive control is the decoupling of two stages: the offline acquisition of a universal or basis policy/model, and efficient online adaptation through explicit, often closed-form, inference for new parameterizations.

Function Encoders and Basis Learning: Training yields a finite set of neural basis functions in a Hilbert space of dynamics or policy functions (often neural networks). Any instance in this family can then be efficiently encoded by projecting observed rollouts or task specifications onto the span of these bases (Li et al., 22 Sep 2025, Ingebrand et al., 2024, Iqbal et al., 7 Nov 2025). The computation of the coefficients $c_j$ frequently reduces to solving a regularized linear system.
Latent Representation Extraction: Modular, graph, or sequence-based encoders extract system-invariant features (e.g., CASNET’s morphology embeddings) which inform the abstract action policy, decoupling morphology or parameter specifics from low-level control (Malik, 2019).
Closed-Form On-the-Fly Inference: For both value function approximation and system identification, estimation is conducted via regression or inner-product projections—no gradient descent—using as few as 100 observations (Ingebrand et al., 2024, Li et al., 22 Sep 2025, Iqbal et al., 7 Nov 2025).
Joint Embedding and Policy Training: In goal-conditioned tasks, Bellman-consistent latent spaces are constructed so that Euclidean distances in latent space encode time-to-goal or transition utility, allowing value shaping without task-specific RL or explicit dynamic identification (Wu et al., 20 Oct 2025).
Safety and Theoretical Guarantees: Tube-based robust MPC provides invariance and performance gap bounds between the learned policy and real-system performance, under convex, Lipschitz-bounded model mismatch (Harrison et al., 2017).
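
For the regularized linear system mentioned under function encoders, a short derivation may help. It assumes scalar-valued targets $y_i$ observed at inputs $z_i$, $i = 1, \dots, N$, and a ridge weight $\lambda > 0$; the symbols $y_i$, $z_i$, $N$, $\lambda$, and $\Phi$ are notation introduced here for illustration and do not come from the cited works.

```latex
\begin{aligned}
c^\star &= \arg\min_{c \in \mathbb{R}^p} \sum_{i=1}^{N} \Big( y_i - \sum_{j=1}^{p} c_j\, \phi_j(z_i) \Big)^2 + \lambda \|c\|^2, \\
0 &= -2\,\Phi^\top \big( y - \Phi c^\star \big) + 2 \lambda c^\star, \qquad \Phi_{ij} = \phi_j(z_i), \\
c^\star &= \big( \Phi^\top \Phi + \lambda I \big)^{-1} \Phi^\top y .
\end{aligned}
```

The matrix $\Phi^\top \Phi$ is, up to a factor $1/N$, the empirical Gram matrix of the basis functions, and $\Phi^\top y$ collects the empirical inner products of each basis function with the observed target. When the Gram matrix has full rank, letting $\lambda \to 0$ recovers ordinary least-squares projection.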

4. Empirical Performance and Benchmarks

Extensive empirical evaluation across robotic manipulation, locomotion, path planning, vehicle control, and plasma regulation indicates that state-of-the-art zero-shot adaptive controllers routinely achieve performance within 1–12% of expertly tuned per-task baselines, with adaptation latency limited to a few milliseconds.

For example, in robot control:

Domain	Zero-Shot Return (% of Expert)	Success Rate
Planar Manipulators (Malik, 2019)	92 ± 3	0.90 / 1.00
Crawlers	88 ± 4	0.85 / 0.95

In zero-shot function-encoder-based predictive control, closed-loop MSE on nonlinear tasks remains within 2× of classical MPC with inference time accelerated by 2–70× (Iqbal et al., 7 Nov 2025). In sim-to-real transfer tasks, correction policies substantially exceed the performance of naive transfer, and remain effective over adjacent parameterizations without explicit re-tuning (Semage et al., 2023).

Qualitative results underline the sample efficiency and instant deployment capabilities of zero-shot adaptive controllers, evidenced by consistent performance across unseen tasks (e.g., real-world blimp tracking, plasma scenario tracking, high-DoF robot morphologies) (Liu et al., 2024, Wu et al., 20 Oct 2025).

5. Theoretical Analysis, Limitations, and Extensions

Universal approximation theorems guarantee that, with sufficiently many basis functions (or sufficiently expressive latent architectures), the learned policy/model class can approximate any continuous feedback policy in the relevant Hilbert space to arbitrary precision (Li et al., 22 Sep 2025). The finite-sample least-squares error decays as the number of samples grows, given sufficient diversity of training tasks and full-rank Gram matrices.
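
The effect of the number of basis functions can be seen in a small numerical experiment. The sketch below uses Chebyshev polynomials as a stand-in for learned neural bases and a saturating feedback law `tanh(3x)` as a hypothetical target policy; both are assumptions made for illustration and say nothing about the rates obtained in the cited work.

```python
import numpy as np

def fit_error(n_basis, x, y):
    """Least-squares fit with the first n_basis Chebyshev polynomials; returns RMS error."""
    Phi = np.polynomial.chebyshev.chebvander(x, n_basis - 1)   # shape (N, n_basis)
    c, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return float(np.sqrt(np.mean((Phi @ c - y) ** 2)))

x = np.linspace(-1.0, 1.0, 200)
y = np.tanh(3.0 * x)          # hypothetical target feedback law u*(x)

# RMS approximation error for increasingly rich (nested) basis sets.
errors = {p: fit_error(p, x, y) for p in (2, 4, 8, 16)}
```

Because the basis sets are nested, the least-squares error on the same data cannot increase as `p` grows, which is the finite-dimensional counterpart of the approximation argument above.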

Limitations primarily arise for environments far outside the convex hull of training parameter distributions, in high-dimensional or hierarchical control scenarios, or where task parameters are implicit and the mapping to latent codes is ill-posed (Li et al., 22 Sep 2025, Malik, 2019, Iqbal et al., 7 Nov 2025). Stability and constraint satisfaction are not universally guaranteed, and in certain approaches, explicit safety tubes or additional barrier constraints are required (Harrison et al., 2017). Extension to broader classes of tasks (e.g., multi-agent, highly stochastic, or vision-based control) and integration with fast online identification or safe learning remain active research directions.

6. Relation to Broader Control and Learning Paradigms

Zero-shot adaptive control bridges concepts from robust control, adaptive control, meta-learning, and transfer learning, but is distinguished by its explicit emphasis on “instant deployment” with no on-the-fly gradient updates. Unlike robust control, which plans for worst-case but fixed uncertainties, and classical adaptive schemes, which require ongoing estimation, zero-shot controllers exploit structure learned from large-scale, diverse training or simulation to achieve both efficiency and safety in novel contexts. Recent hybridizations with imitation learning (e.g., GAIL), representation learning (e.g., Hilbert-latent spaces), and differentiable programming are particularly notable for enabling foundation models for control with broad applicability (Wu et al., 20 Oct 2025, Iqbal et al., 7 Nov 2025).

The central research trajectory is toward universal controllers—capable of generalizing across tasks, morphologies, and environments—delivered with real-time performance and without the overheads of per-task or per-environment optimization (Malik, 2019, Wu et al., 20 Oct 2025).