Adaptive Re-Acquisition Controller
- Adaptive Re-Acquisition Controllers are systems that dynamically adjust control or sensing policies to re-establish target objectives after perturbations.
- They integrate meta-learning, reinforcement learning, and passivity-based methods to enable rapid adaptation in robotics, process control, and active sensing.
- Experimental evaluations demonstrate improved stability, trajectory recovery, and measurement accuracy even under severe disruptions and system faults.
An Adaptive Re-Acquisition Controller is a control mechanism—spanning robotic systems, cyber-physical process control, and sequential data acquisition—whose core function is to dynamically adjust online behavior or measurement strategies in response to unexpected task variations, distributional or dynamical changes, faults, or information deficits. These controllers typically exploit meta-learned priors, end-to-end differentiable world models, reinforcement learning, or passivity-based mechanisms to "re-acquire" previously established behaviors or performance properties, even after the underlying system or environment has substantially shifted. This class of controllers provides a general framework for re-establishing desired operation in adaptive motor control, fault-tolerant automation, and active sensing.
1. Foundational Problem Definitions
Adaptive re-acquisition controllers are formalized in diverse but related paradigms: (1) meta-reinforcement learning over Markov Decision Processes (MDPs) (Daaboul et al., 2022), (2) input-output passivity-based feedback reconfiguration for unknown plants (Zakeri et al., 2019), and (3) Partially Observable Markov Decision Processes (POMDPs) for adaptive signal measurement (Silvestri et al., 10 Jul 2024).
- Meta-RL and Trajectory Re-Acquisition: A distribution of related MDPs is assumed, each with potentially unknown and shifting transition dynamics. The controller must, given a reference trajectory or behavior from a prior task, compute action sequences in the current task such that the executed state transitions match the reference as closely as possible under the new dynamics.
- Passivity-Based System Interface: The system is an unknown, input-output system satisfying passivity or dissipativity bounds. Tasked with maintaining closed-loop stability, the controller identifies deficits in passivity indices online and reconfigures feedback gains to "re-acquire" robust performance, even post-fault or under attack.
- Adaptive Sequential Measurement: For adaptive acquisition in inverse problems, the target is the optimal selection (policy) of measurement parameters at each step, such that the cumulative information gain (as measured by signal reconstruction quality or reward) is maximized over a short sequential acquisition horizon, even under unobserved or changing signal priors.
Formally, an adaptive re-acquisition controller dynamically modifies the control or sensing policy to recover a target objective (trajectory, passivity, or estimation accuracy) after a perturbation; a minimal interface sketch capturing this shared structure is given below.
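The three settings above share a monitor-adapt-act structure. The following is a minimal, hypothetical Python interface sketch (not taken from any of the cited works) that makes this shared structure explicit; all class and method names are illustrative assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any

class ReAcquisitionController(ABC):
    """Hypothetical interface capturing the shared monitor-adapt-act structure."""

    @abstractmethod
    def monitor(self, observation: Any) -> float:
        """Return a scalar deficit: distance to the reference trajectory,
        estimated passivity shortfall, or expected reconstruction error."""

    @abstractmethod
    def adapt(self, recent_data: Any) -> None:
        """Update the world model, feedback interface, or acquisition policy
        from recent data once the deficit exceeds a tolerance."""

    @abstractmethod
    def act(self, observation: Any) -> Any:
        """Emit the next control action or measurement parameter."""

    def step(self, observation: Any, recent_data: Any, tol: float) -> Any:
        # Generic loop: re-adapt only when the monitored deficit is too large.
        if self.monitor(observation) > tol:
            self.adapt(recent_data)
        return self.act(observation)
```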
2. Mathematical Architectures and Algorithms
Adaptive re-acquisition controllers utilize a variety of algorithmic constructs, each tailored to their operational setting.
- Meta-Adaptation Controller (MAC) (Daaboul et al., 2022): Employs a probabilistic world model with an adaptable task embedding. Meta-learning uses a bi-level MAML/REPTILE approach: after meta-training, online adaptation is performed via inner-loop gradient descent on both the model parameters and the task embedding using recent trajectory data. MAC plans actions that maximize both task reward and trajectory similarity to a reference via constrained optimization (a schematic form is given after this list); the resulting problem is solved with the Cross-Entropy Method (CEM), as sketched at the end of this section.
- Passivity Re-Acquisition via M-matrix (Zakeri et al., 2019): No explicit state model is required; instead, online integral estimates are maintained for the input feedforward passivity (IFP) and output feedback passivity (OFP) indices. Upon detection of a passivity index dropping below calibrated thresholds, a static feedback interface (a partitioned block M-matrix) is synthesized to inject the "missing" passivity into the feedback loop, restoring L2-gain stability conditions.
- RL-based Adaptive Acquisition (Silvestri et al., 10 Jul 2024): Treats measurement selection as a sequential policy over latent states (via an RNN encoder), with a reconstruction network and an acquisition policy. Training merges policy-gradient reinforcement learning with a supervised reconstruction loss; in the probabilistic variant, joint end-to-end ELBO optimization over latent variables regularizes the policy for distributional robustness.
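The exact objective from (Daaboul et al., 2022) is not reproduced here; the following schematic reconstruction uses assumed notation, with $p_\theta$ the learned world model (parameters $\theta$), $z$ the task embedding, $s_t^{\mathrm{ref}}$ the stored reference trajectory, $\mathrm{sim}(\cdot,\cdot)$ the state-similarity function, $H$ the planning horizon, and $\epsilon$ a similarity tolerance:

$$
\max_{a_{1:H}} \;\; \mathbb{E}_{\hat{s}_{t+1} \sim p_\theta(\cdot \mid \hat{s}_t, a_t, z)} \left[ \sum_{t=1}^{H} r(\hat{s}_t, a_t) \right]
\quad \text{s.t.} \quad \mathrm{sim}\!\left(\hat{s}_t, s_t^{\mathrm{ref}}\right) \ge \epsilon \;\; \forall t .
$$

In practice the constraint is relaxed to a weighted-sum objective with a trade-off coefficient $\lambda$, and the problem is solved by sampling action sequences with CEM.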
These architectures prioritize sample efficiency, robust online adaptation, and constraint satisfaction under dynamic and model uncertainty.
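As a concrete illustration of the CEM planning step, the sketch below samples action sequences, scores them with predicted task reward plus a similarity bonus to the reference (the weighted-sum relaxation of the objective above), and refits the sampling distribution to the elites. It is a minimal sketch, not the authors' implementation; `world_model`, `reward_fn`, and `similarity` are assumed callables.

```python
import numpy as np

def cem_plan(world_model, reward_fn, similarity, s0, ref_traj, z,
             horizon=10, pop=500, elites=50, iters=5, lam=1.0, act_dim=6):
    """Cross-Entropy Method over action sequences: keep the elite samples
    that maximize predicted reward plus lam * similarity to the reference."""
    mu = np.zeros((horizon, act_dim))
    sigma = np.ones((horizon, act_dim))
    for _ in range(iters):
        actions = mu + sigma * np.random.randn(pop, horizon, act_dim)
        scores = np.empty(pop)
        for i in range(pop):
            s, score = s0, 0.0
            for t in range(horizon):
                s = world_model(s, actions[i, t], z)          # predicted next state
                score += reward_fn(s, actions[i, t])          # task reward
                score += lam * similarity(s, ref_traj[t])     # re-acquisition bonus
            scores[i] = score
        elite = actions[np.argsort(scores)[-elites:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu[0]  # execute only the first action (MPC-style)
```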
3. Training Objectives and Online Adaptation Strategies
Training and adaptation protocols differ by regime but are unified by a feedback-with-reference or recovery-of-property aim.
- Meta-RL MAC Training: Alternates inner-loop task adaptation (gradient descent on local data) with outer-loop meta-updates, targeting fast adaptation to unseen tasks at meta-test time. At test time, adaptation on a buffer of recent transitions and retrieval of the most probable task embedding allow rapid world-model and policy updates.
- Passivity Re-Acquisition Routine: Continuously monitors integral-based passivity indices during operation. A fault is diagnosed when the online estimates breach their thresholds; reconfiguration is triggered only upon persistent, monotonic deficit detection for statistical robustness, followed by an M-matrix update and instantaneous injection of the feedback correction. No plant identification or time-consuming redesign is required; a minimal monitoring sketch follows this list.
- Adaptive Acquisition Policy Learning: The reconstruction network is optimized on the trajectory-accumulated mean squared error or SSIM, while the acquisition policy is updated with policy gradients to maximize incremental improvements in reconstruction quality (the reward); a minimal reward-and-update sketch follows the closing paragraph below. The probabilistic variant maximizes the ELBO, controlling overfitting and capturing uncertainty in the belief over signal structure.
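To make the passivity re-acquisition routine concrete, the sketch below maintains running integral estimates of the IFP and OFP indices from measured input/output pairs and flags a reconfiguration only after a persistent deficit. It is a minimal sketch under assumed estimator definitions; the M-matrix synthesis itself is omitted, and all names and thresholds are illustrative.

```python
import numpy as np

class PassivityMonitor:
    """Running integral estimates of the IFP (nu) and OFP (rho) indices;
    a reconfiguration is flagged only after a persistent deficit."""

    def __init__(self, nu_min, rho_min, window=50, eps=1e-9):
        self.nu_min, self.rho_min = nu_min, rho_min
        self.window, self.eps = window, eps
        self.uy = self.uu = self.yy = 0.0   # running inner-product integrals
        self.deficit_count = 0

    def update(self, u, y, dt):
        self.uy += float(np.dot(u, y)) * dt
        self.uu += float(np.dot(u, u)) * dt
        self.yy += float(np.dot(y, y)) * dt
        nu_hat = self.uy / (self.uu + self.eps)    # assumed IFP estimate
        rho_hat = self.uy / (self.yy + self.eps)   # assumed OFP estimate
        if nu_hat < self.nu_min or rho_hat < self.rho_min:
            self.deficit_count += 1                # deficit persists
        else:
            self.deficit_count = 0                 # deficit cleared
        reconfigure = self.deficit_count >= self.window
        return reconfigure, nu_hat, rho_hat        # caller updates the M-matrix
```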
This spectrum of strategies allows for real-time or near-real-time re-acquisition of desired objectives in nonstationary environments.
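For the adaptive-acquisition case, a minimal sketch of the reward construction and a plain REINFORCE-style surrogate is given below; it assumes per-step rewards equal to the incremental decrease in reconstruction error and omits the supervised reconstruction loss and the ELBO term of the probabilistic variant.

```python
import torch

def acquisition_rewards(recon_errors: torch.Tensor) -> torch.Tensor:
    """Per-step reward = incremental decrease in reconstruction error;
    recon_errors has shape [T + 1], with the initial error at index 0."""
    return recon_errors[:-1] - recon_errors[1:]

def reinforce_loss(log_probs: torch.Tensor, rewards: torch.Tensor,
                   gamma: float = 0.99) -> torch.Tensor:
    """Plain REINFORCE surrogate over an acquisition episode of length T."""
    rewards = rewards.detach()
    returns = torch.zeros_like(rewards)
    running = torch.tensor(0.0)
    for t in reversed(range(rewards.shape[0])):
        running = rewards[t] + gamma * running     # discounted return-to-go
        returns[t] = running
    return -(log_probs * returns).sum()            # minimize to ascend reward
```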
4. Applications and Experimental Evaluation
Adaptive re-acquisition controllers have been validated in a range of domains:
- Robotic Locomotion and Manipulation: MAC has been deployed in MuJoCo environments—"HalfCheetah-disabled," "Ant-disabled," "Ant-gravity"—where the dynamics are abruptly shifted via disabling joints or altering gravity. MAC consistently enables the robot to re-acquire desired walking or movement patterns by planning for similarity to a stored reference, outperforming REPTILE+MPC and FAMLE+MPC baselines in cumulative reward and behavior shape (Daaboul et al., 2022).
- Physical Process Control under Faults: The passivity-based controller has been demonstrated on linear systems with actuator time delays, nonlinear system drift, and parameter degradation (e.g., softening spring), showing immediate restoration of stability without redesigning the nominal controller or learning a new plant model. The M-matrix adaptation occurs only upon persistent passivity deficit, ensuring low computational overhead (Zakeri et al., 2019).
- Adaptive Sensing and Inverse Problems: RL-trained acquisition controllers select measurement parameters for Gaussian and Radon-type systems, achieving lower average-case reconstruction error and higher perceptual quality in image and CT reconstruction, particularly in the low-measurement regime. Both deterministic and probabilistic (VAE/β-VAE) end-to-end training approaches were successful, with analysis indicating that average-case gains are practically achievable even when theoretical worst-case results suggest otherwise (Silvestri et al., 10 Jul 2024).
These results substantiate the generality and domain-agnostic character of the adaptive re-acquisition paradigm.
5. Theoretical Considerations and Limitations
- Similarity Metrics and Trade-off Parameters: Trajectory re-acquisition via MAC requires careful selection of the state similarity function (e.g., cosine similarity) and the reward trade-off coefficients. Poor choices may cause divergence from the truly preferred behavior or suboptimal exploitation.
- Complexity and Computational Overhead: CEM-based planning for MAC introduces substantial per-step computational costs. Passivity adaptation is computationally light—requiring only integral updates and rare algebraic corrections.
- Model Expressivity and Task Coverage: For meta-learning and RL controllers, success is contingent on the expressivity of the learned world model and the comprehensiveness of meta-training task distributions. Limited coverage or under-parameterized models can result in failure to adapt or recapture target behavior.
- Worst-Case Adaptive Sensing Limits: For inverse problems with deterministic adaptive designs, classical Gel'fand-width theory states that the worst-case error cannot be improved by more than a constant factor over non-adaptive random measurements (a schematic statement follows this list); however, average-case improvements are realized for data-driven, probabilistic, or RL-based acquisition strategies (Silvestri et al., 10 Jul 2024).
- Reliance on Measurable Output: Passivity-based controllers require reliable measurements of the plant input and output. While the scheme is inherently robust to moderate noise (due to integral smoothing), severe sensor failures may still escape detection or defeat the adaptation logic.
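For reference, the worst-case limitation admits a schematic statement (constants omitted): for a convex, symmetric signal class $K$, with $E^{\mathrm{ada}}_m(K)$ and $E^{\mathrm{non}}_m(K)$ the best worst-case reconstruction errors achievable from $m$ adaptive and non-adaptive linear measurements, respectively, and $d^m(K)$ the Gel'fand width,

$$
d^m(K) \;\lesssim\; E^{\mathrm{ada}}_m(K) \;\le\; E^{\mathrm{non}}_m(K) \;\lesssim\; d^m(K),
$$

so adaptivity can improve the worst case by at most a constant factor, leaving average-case gains as the relevant target.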
Limitations highlight the importance of metric selection, prior coverage, and domain knowledge for controller configuration.
6. Practical Implementation and Hyperparameter Choices
Table: Hyperparameter and Implementation Features Across Domains
| Setting | Key Steps / Components | Typical Hyperparameters |
|---|---|---|
| Robotic MAC | Meta-train world model, online CEM planning | Planning horizon, CEM sample/elite counts, adaptation buffer length, similarity trade-off coefficient |
| Passivity | Online integral update, M-matrix update | Detection window length, IFP/OFP index thresholds |
| RL Acquisition | GRU encoder/decoder, PG or PPO | Learning rates, batch size (64–128), latent dimension (64–256), discount factor |
Careful selection of trade-off coefficients, adaptation buffer length, and detection thresholds enables correct, robust responses across domains. In process control, integral windowing mitigates measurement drift; in RL acquisition, normalizing measurement actions and tuning the β coefficient (in the β-VAE variant) improve reconstruction quality and disentanglement.
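As an illustration only, a hypothetical configuration grouping the quantities discussed above might look as follows; every value shown is a placeholder, not one reported in the cited works, except where the table above gives a range.

```python
from dataclasses import dataclass

@dataclass
class ReAcquisitionConfig:
    # Robotic MAC (planning / online adaptation) -- placeholder values
    horizon: int = 10                 # CEM planning horizon
    cem_samples: int = 500            # action sequences sampled per iteration
    similarity_weight: float = 1.0    # reward vs. reference-similarity trade-off
    adapt_buffer: int = 200           # recent transitions used for adaptation
    # Passivity monitoring -- placeholder values
    detection_window: int = 50        # consecutive samples before reconfiguration
    nu_min: float = 0.0               # IFP index threshold
    rho_min: float = 0.0              # OFP index threshold
    # RL acquisition -- ranges from the table above, otherwise placeholders
    batch_size: int = 128             # table: 64-128
    latent_dim: int = 128             # table: 64-256
    gamma: float = 0.99               # discount factor
    beta: float = 1.0                 # beta-VAE weight (probabilistic variant)
```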
7. Significance and Future Directions
Adaptive re-acquisition controllers offer a unified solution to the longstanding challenge of regaining desired operational efficacy after system or task disruption, without full retraining, model identification, or reward reengineering. Their ability to transfer reference behaviors, actively restore passivity, or select maximally informative measurements online, solely from data and meta-learned priors, positions them at the intersection of robust control and learning.
Open research questions include: extension to hierarchical controllers, formalizing trade-offs between reference-trajectory fidelity and practical constraint satisfaction, distributed and multi-agent generalizations, and improved sample efficiency under extreme nonstationarity. A plausible implication is that future adaptive re-acquisition controllers will integrate aspects of meta-learning, system identification, and safety-critical verification to deliver seamless and robust recovery in cyber-physical, robotic, and information-theoretic tasks (Daaboul et al., 2022, Zakeri et al., 2019, Silvestri et al., 10 Jul 2024).