Generalized Information Gathering Under Dynamics Uncertainty

Published 29 Jan 2026 in cs.LG, cs.AI, cs.MA, cs.RO, and eess.SY | (2601.21988v1)

Abstract: An agent operating in an unknown dynamical system must learn its dynamics from observations. Active information gathering accelerates this learning, but existing methods derive bespoke costs for specific modeling choices: dynamics models, belief update procedures, observation models, and planners. We present a unifying framework that decouples these choices from the information-gathering cost by explicitly exposing the causal dependencies between parameters, beliefs, and controls. Using this framework, we derive a general information-gathering cost based on Massey's directed information that assumes only Markov dynamics with additive noise and is otherwise agnostic to modeling choices. We prove that the mutual information cost used in existing literature is a special case of our cost. Then, we leverage our framework to establish an explicit connection between the mutual information cost and information gain in linearized Bayesian estimation, thereby providing theoretical justification for mutual information-based active learning approaches. Finally, we illustrate the practical utility of our framework through experiments spanning linear, nonlinear, and multi-agent systems.


Authors (4)

Summary

The paper introduces a causal graphical model that decouples dynamics, belief updates, and planning objectives to derive a unified directed information-based cost.
It demonstrates accelerated parameter learning and reduced model generalization error through extensive validation in both single- and multi-agent systems.
The approach bridges active information gathering with Bayesian estimation, showing equivalence to mutual information criteria under linear-Gaussian conditions.


Introduction and Motivation

The paper "Generalized Information Gathering Under Dynamics Uncertainty" (2601.21988) presents a modular framework for active information gathering in dynamical systems with parametric uncertainty. Its central contribution is an explicit causal graphical model that decouples the agent's modeling choices (e.g., dynamics class, belief update, observation model, planner architecture) from the information-gathering objective. This formalism yields a unified, model-agnostic information-gathering cost based on Massey’s directed information. The authors also prove that the widely used mutual information cost is a special case of it. The theory is complemented by experiments in single- and multi-agent scenarios, where the cost accelerates parameter learning and reduces model generalization error.

Modular Causal Framework for Belief Evolution and Planning

The authors formalize agent-environment interaction under uncertainty as iterations of planning and model adaptation within a stochastic, Markovian system. At each timestep, the agent selects controls to minimize a generic cost function, incorporating both task objectives and information-gathering terms. Parameter uncertainty is maintained in beliefs, which are recursively updated based on observations and executed controls, using a generic belief updater (e.g., EKF, particle filter, gradient descent).

The framework’s essential structure is made explicit by a causal graphical model:

Figure 1: Causal graph illustrating the relationships among true parameters, states, observations, controls, beliefs, and noise during planning and filtering. Because the ground-truth parameters are unobservable, predictions use the latest available parameter beliefs.
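To make the dependencies in Figure 1 concrete, the sketch below encodes one time slice as a parent map and checks two properties the caption describes. The node names and edge set are illustrative assumptions, not copied from the paper:

- `theta` is the true parameter and `d_t` the belief.
- `o_hat_t1` is a planner's predicted observation.
- `w_t`, `v_t` and `v_t1` are noise terms.

It uses Python's standard `graphlib` module (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical one-step slice of a Figure-1-style causal graph.
# Each node maps to the set of its parents; the edges are illustrative assumptions.
PARENTS = {
    "theta": set(),                             # true parameters (never observed)
    "x_t": set(),                               # current state
    "d_t": set(),                               # current belief over theta
    "w_t": set(), "v_t": set(), "v_t1": set(),  # process / observation noise
    "o_t": {"x_t", "v_t"},                      # current observation
    "u_t": {"o_t", "d_t"},                      # planner acts on observation and belief
    "x_t1": {"x_t", "u_t", "theta", "w_t"},     # Markov dynamics with additive noise
    "o_t1": {"x_t1", "v_t1"},                   # next observation
    "d_t1": {"d_t", "u_t", "o_t1"},             # belief update (filtering)
    "o_hat_t1": {"o_t", "u_t", "d_t"},          # planner's prediction uses the belief
}


def ancestors(node: str, parents: dict[str, set[str]]) -> set[str]:
    """All nodes with a directed path into `node`."""
    seen: set[str] = set()
    stack = list(parents[node])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents[p])
    return seen


# static_order() raises graphlib.CycleError if the graph is not a DAG.
ORDER = list(TopologicalSorter(PARENTS).static_order())
```

The ancestry check mirrors the caption:

- The filtered belief `d_t1` is influenced by `theta` only through the observation `o_t1`.
- The predicted observation `o_hat_t1` depends on the belief `d_t` and, within this slice, not on `theta` at all.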

The separation of model dynamics, belief update rules, and planning objectives allows for arbitrary choices of modeling architecture without altering the causal semantics or requiring problem-specific derivations of information criteria.
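The plan, act, observe, and update cycle described above can be sketched with the belief updater hidden behind a common interface. Everything here is a hypothetical toy, not the paper's implementation:

- The system is scalar, $x_{t+1} = x_t + \theta u_t + w_t$, with an unknown control gain $\theta$.
- The planner is a one-step greedy search over a small control grid.
- The information term is the one-step Gaussian information gain $\tfrac{1}{2}\log(1 + u^2 P/\sigma^2)$, where $P$ is the belief variance and $\sigma^2$ the noise variance.

Two interchangeable updaters are provided: an exact Kalman filter and a grid-based Bayesian filter. The names `BeliefUpdater`, `KalmanBelief`, `GridBelief`, `plan` and `run` are illustrative.

```python
import math
import random
from typing import Protocol


class BeliefUpdater(Protocol):
    """What the planner needs from any belief: a mean, a variance, and an update."""

    def mean_var(self) -> tuple[float, float]: ...

    def update(self, x: float, u: float, x_next: float) -> None: ...


class KalmanBelief:
    """Exact Gaussian posterior over theta (the toy model is linear in theta)."""

    def __init__(self, mean: float, var: float, noise_var: float):
        self.m, self.p, self.r = mean, var, noise_var

    def mean_var(self) -> tuple[float, float]:
        return self.m, self.p

    def update(self, x: float, u: float, x_next: float) -> None:
        s = u * u * self.p + self.r                # innovation variance
        k = self.p * u / s                         # Kalman gain
        self.m += k * (x_next - x - self.m * u)    # correct with the innovation
        self.p -= k * u * self.p


class GridBelief:
    """Bayesian posterior over theta evaluated on a fixed grid (log-weights)."""

    def __init__(self, lo, hi, n, prior_mean, prior_var, noise_var):
        self.grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
        self.logw = [-(g - prior_mean) ** 2 / (2 * prior_var) for g in self.grid]
        self.r = noise_var

    def mean_var(self) -> tuple[float, float]:
        top = max(self.logw)                       # subtract max to avoid overflow
        w = [math.exp(lw - top) for lw in self.logw]
        z = sum(w)
        m = sum(wi * g for wi, g in zip(w, self.grid)) / z
        v = sum(wi * (g - m) ** 2 for wi, g in zip(w, self.grid)) / z
        return m, v

    def update(self, x: float, u: float, x_next: float) -> None:
        self.logw = [lw - (x_next - x - g * u) ** 2 / (2 * self.r)
                     for lw, g in zip(self.logw, self.grid)]


CONTROLS = [i / 4 for i in range(-4, 5)]  # candidate controls in [-1, 1]


def plan(x: float, belief: BeliefUpdater, beta: float, noise_var: float) -> float:
    """Greedy one-step planner: task cost minus beta times information gain."""
    m, v = belief.mean_var()

    def cost(u: float) -> float:
        task = (x + m * u) ** 2 + 0.1 * u * u      # drive predicted state to 0
        info = 0.5 * math.log(1.0 + u * u * v / noise_var)
        return task - beta * info

    return min(CONTROLS, key=cost)


def run(belief: BeliefUpdater, theta=2.0, beta=1.0, noise_var=0.01, steps=20, seed=0):
    rng = random.Random(seed)
    x = 1.0
    for _ in range(steps):
        u = plan(x, belief, beta, noise_var)                            # plan with current belief
        x_next = x + theta * u + rng.gauss(0.0, math.sqrt(noise_var))  # true system
        belief.update(x, u, x_next)                                     # model adaptation
        x = x_next
    return belief.mean_var()
```

Because `plan` only asks the updater for a mean and a variance, either belief can be passed to `run` unchanged. For example, use `run(KalmanBelief(0.0, 1.0, 0.01))` or `run(GridBelief(-4.0, 4.0, 801, 0.0, 1.0, 0.01))`. Another updater would plug in the same way, provided it can report some measure of uncertainty for the information term.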

Directed Information as a Unified Information-Gathering Objective

Existing formulations typically encourage exploration by maximizing the future mutual information (MI) between the parameters and observation sequences. The authors argue that these methods ignore the causal dependencies intrinsic to sequential decision making, where each control influences later observations and beliefs. Instead, they adopt a causally conditioned measure, Massey’s directed information. It quantifies the expected information transfer from parameters (or beliefs) to future observations, aggregated over causal trajectories governed by the agent’s actions.

The directed information-based cost is given as:

$J_{\text{info}}(\hat{o}, u, \hat{d}) = -I(\hat{d} \rightarrow \hat{o} \parallel u) = - \sum_{i=t}^{t+T-1} \left[ h(\hat{o}_{i+1} \mid \hat{o}_{t:i}, u_{t:i}) - h(\hat{o}_{i+1} \mid \hat{o}_i, u_i, \hat{d}_i) \right],$

where:

- $\hat{d}$ is the sequence of predicted beliefs.
- $\hat{o}$ is the sequence of predicted observations.
- $u$ is the control sequence over a horizon of $T$ steps.
- $h(\cdot \mid \cdot)$ is conditional differential entropy.

The objective assumes only Markov dynamics with additive noise. It is otherwise agnostic to the dynamics model, belief updater, and observation model.
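As a worked instance of this cost, consider a hypothetical scalar toy:

- Observations are states, $\hat{o}_{i+1} = \hat{o}_i + \theta u_i + w_i$ with $w_i \sim \mathcal{N}(0, \sigma^2)$.
- The predicted belief over $\theta$ is Gaussian with variance $P_i$.

Assume further that conditioning on $\hat{d}_i$ leaves only the additive noise; this reading of the conditioning is an assumption, not the paper's derivation. Each bracketed term then becomes a difference of two Gaussian entropies, which equals $\tfrac12\log(1 + u_i^2 P_i/\sigma^2)$. The sketch below computes $J_{\text{info}}$ this way. The function names are illustrative.

```python
import math


def gaussian_entropy(var: float) -> float:
    """Differential entropy (nats) of a scalar Gaussian with variance `var`."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)


def directed_info_cost(controls, prior_var, noise_var):
    """J_info = -sum_i [h(o_{i+1} | o_{t:i}, u_{t:i}) - h(o_{i+1} | o_i, u_i, d_i)].

    Assumed toy model: o_{i+1} = o_i + theta * u_i + w_i, w_i ~ N(0, noise_var),
    with a Gaussian predicted belief over theta. Returns (J_info, final variance).
    """
    p, total = prior_var, 0.0
    for u in controls:
        marginal = gaussian_entropy(u * u * p + noise_var)  # theta integrated out
        conditional = gaussian_entropy(noise_var)           # only additive noise left
        total += marginal - conditional
        p = p * noise_var / (u * u * p + noise_var)         # predicted belief update
    return -total, p


print(directed_info_cost([1.0, -0.5, 0.25], prior_var=1.0, noise_var=0.01))
```

The variance recursion gives $P_i / P_{i+1} = 1 + u_i^2 P_i / \sigma^2$, so the sum telescopes. In this toy, $-J_{\text{info}} = \tfrac12 \log(P_t / P_{t+T})$, the total log-variance reduction of the predicted belief.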

A formal theorem establishes that under common simplifying assumptions (full observability, linearized Gaussian beliefs, static parameters), the directed information cost reduces to the sum of per-timestep conditional mutual information terms widely used in the literature.
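One way the reduction can go is sketched below. The assumptions are made explicit here rather than taken from the paper's proof, which may proceed differently:

- Observations equal states (full observability).
- Directed information is expanded as Massey's sum of per-step conditional mutual informations.
- In the last step, the predicted belief $\hat{d}_i$ is replaced by a static parameter vector $\theta$ (a symbol introduced here).

By the Markov property, $h(\hat{o}_{i+1} \mid \hat{o}_i, u_i, \theta) = h(\hat{o}_{i+1} \mid \hat{o}_{t:i}, u_{t:i}, \theta)$. The third line matches the sum in the definition of $J_{\text{info}}$, up to sign.

```latex
\begin{aligned}
I(\hat{d} \rightarrow \hat{o} \parallel u)
  &= \sum_{i=t}^{t+T-1} I(\hat{d}_{t:i};\, \hat{o}_{i+1} \mid \hat{o}_{t:i}, u_{t:i}) \\
  &= \sum_{i=t}^{t+T-1} \left[ h(\hat{o}_{i+1} \mid \hat{o}_{t:i}, u_{t:i}) - h(\hat{o}_{i+1} \mid \hat{o}_{t:i}, u_{t:i}, \hat{d}_{t:i}) \right] \\
  &= \sum_{i=t}^{t+T-1} \left[ h(\hat{o}_{i+1} \mid \hat{o}_{t:i}, u_{t:i}) - h(\hat{o}_{i+1} \mid \hat{o}_i, u_i, \hat{d}_i) \right]
     \quad \text{(Markov; predictions use the latest belief)} \\
  &= \sum_{i=t}^{t+T-1} I(\theta;\, \hat{o}_{i+1} \mid \hat{o}_{t:i}, u_{t:i})
     \quad \text{(static } \theta \text{ in place of } \hat{d}_i \text{)}
\end{aligned}
```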

Explicit Theoretical Connection: Information-Gathering and Bayesian Estimation

The paper makes the link to sequential Bayesian estimation explicit by analyzing the Extended Kalman Filter (EKF) as the belief updater. In the linearized regime with Gaussian noise, maximizing directed (and hence mutual) information is provably equivalent to maximizing the information gain of the parameter estimates, i.e., the log-determinant reduction of their covariance. This justifies mutual information-based exploration when beliefs are maintained by a well-characterized online estimator such as the EKF. It also motivates its use with other belief updaters, such as gradient-based or sample-based Bayesian filters.
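The EKF connection can be checked numerically in a small case. The sketch assumes a hypothetical model that is linear in two unknown parameters, $x_{i+1} = a x_i + b u_i + w_i$, so the EKF linearization $H = [x_i,\ u_i]$ is exact.

It compares two quantities for one EKF measurement update:

- the information gain $\tfrac12 \log(\det P / \det P^+)$;
- the Gaussian mutual information $\tfrac12 \log(S/\sigma^2)$, where $S = H P H^\top + \sigma^2$ is the innovation variance.

By the matrix determinant lemma the two coincide. For a genuinely nonlinear model, $H$ would be the Jacobian at the current estimate, and the equality would hold only for the linearized model. The prior covariance and linearization point are made-up values.

```python
import math

Mat = list[list[float]]


def det2(m: Mat) -> float:
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def ekf_param_update(p: Mat, h: list[float], noise_var: float) -> tuple[Mat, float]:
    """Covariance update for a scalar observation o = h . theta + noise.

    `p` must be symmetric. Returns the posterior covariance and the innovation variance S.
    """
    ph = [p[0][0] * h[0] + p[0][1] * h[1],
          p[1][0] * h[0] + p[1][1] * h[1]]            # P h^T
    s = h[0] * ph[0] + h[1] * ph[1] + noise_var       # S = h P h^T + sigma^2
    post = [[p[r][c] - ph[r] * ph[c] / s for c in range(2)] for r in range(2)]  # P - K h P
    return post, s


def information_gain(prior: Mat, post: Mat) -> float:
    """Log-determinant reduction: 0.5 * log(det P / det P+)."""
    return 0.5 * math.log(det2(prior) / det2(post))


def mutual_information(s: float, noise_var: float) -> float:
    """I(theta; o) for a linear-Gaussian observation: 0.5 * log(S / sigma^2)."""
    return 0.5 * math.log(s / noise_var)


# Hypothetical prior covariance over (a, b) and linearization point h = [x_i, u_i].
P0 = [[0.5, 0.1], [0.1, 0.8]]
post, s = ekf_param_update(P0, [1.2, -0.7], 0.05)
print(information_gain(P0, post), mutual_information(s, 0.05))
```

In a planner, `P0` would be the belief covariance at planning time and `h` would be built from the predicted state and a candidate control.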

Practical Instantiation in Single- and Multi-Agent Systems

Empirical evaluation leverages the modular framework to instantiate a variety of scenarios:

Single-Agent, Linear System: Double integrator dynamics with unknown transition matrices; parameter error and covariance trace decrease markedly faster when the information-gathering term is weighted more heavily.
Single-Agent, Nonlinear System: A damped pendulum with unknown parameters and partial/nonlinear observations; after information-driven exploration, the learned model generalizes robustly to held-out trajectories.
Multi-Agent, Linear/Nonlinear Adversaries: Two-agent pursuit-evasion games, with the unknown opponent’s policy specified either as an LQR controller with unknown weights or as a differentiable MPC with unknown cost parameters. In all cases, the same framework and cost function apply without modification, underscoring its expressive generality.
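The mechanism behind the first result, where a heavier information weight speeds up covariance reduction, can be illustrated with a toy far simpler than the paper's double integrator. It does not reproduce the paper's numbers. The sketch assumes:

- a hypothetical scalar system $x_{i+1} = a x_i + b u_i + w_i$ with unknown $(a, b)$;
- a greedy one-step planner over a control grid;
- a noise-free nominal trajectory;
- a task term evaluated with the true parameters, for brevity.

In a linear-Gaussian model the covariance update does not depend on the observed values, so no noise needs to be sampled. The function names are illustrative.

```python
import math

A_TRUE, B_TRUE = 0.9, 0.5                 # hypothetical true parameters (a, b)
NOISE_VAR = 0.01                          # hypothetical noise variance
CONTROLS = [i / 4 for i in range(-4, 5)]  # candidate controls in [-1, 1]


def cov_update(p, h):
    """Kalman covariance update for o = h . theta + noise; returns (P+, S)."""
    ph = [p[0][0] * h[0] + p[0][1] * h[1], p[1][0] * h[0] + p[1][1] * h[1]]
    s = h[0] * ph[0] + h[1] * ph[1] + NOISE_VAR
    return [[p[r][c] - ph[r] * ph[c] / s for c in range(2)] for r in range(2)], s


def step_cost(p, x, u, beta):
    """Task cost (true parameters, for brevity) minus beta times information gain."""
    task = (A_TRUE * x + B_TRUE * u) ** 2 + 0.1 * u * u
    _, s = cov_update(p, [x, u])
    return task - beta * 0.5 * math.log(s / NOISE_VAR)


def final_trace(beta, steps=10):
    """Trace of the parameter covariance after `steps` greedy planning steps."""
    p, x = [[1.0, 0.0], [0.0, 1.0]], 1.0
    for _ in range(steps):
        u = min(CONTROLS, key=lambda c: step_cost(p, x, c, beta))
        p, _ = cov_update(p, [x, u])      # h = [x_i, u_i] for theta = (a, b)
        x = A_TRUE * x + B_TRUE * u       # noise-free nominal trajectory
    return p[0][0] + p[1][1]


for beta in (0.0, 1.0, 5.0):
    print(beta, final_trace(beta))
```

With `beta = 0.0`, the planner drives the state toward zero within a few steps. After that, $H = [x_i, u_i]$ is nearly zero and learning stalls. A positive `beta` keeps the input exciting.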

Figure 2: Schematic of the double integrator system used in experimental validation, facilitating direct examination of parameter learning under linear dynamics uncertainty.

Across all cases, the numerical results indicate that active information gathering with the proposed directed information cost substantially accelerates parameter identification and reduces model generalization error, compared to random or task-only policy baselines. Notably, the framework applies directly even to complex or black-box opponent strategies encoded as differentiable optimization procedures.

Theoretical and Practical Implications

The framework’s decoupling of modeling and planning choices yields both theoretical and algorithmic flexibility. Practitioners and theorists can deploy arbitrary differentiable or probabilistic models and belief update rules, from classical Bayesian updates and kernel-based methods to neural parameterizations. They can do so without deriving new exploration costs specific to their model class. The formal equivalence between directed information and mutual information-based exploration in the linear-Gaussian setting provides both a justification for existing methods and an analytic baseline.

Future extensions may include:

Direct instantiation for high-dimensional or nonparametric model classes, with sampling-based or variational approximations of the directed information objective.
Integration into hierarchical or deep reinforcement learning settings, where the agent’s policy and belief models can be co-optimized for generalization and rapid adaptation in nonstationary environments.
Broader application to multi-agent competitive/cooperative scenarios, leveraging the direct modeling of causal dependencies for policy inference and opponent modeling.
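Here is a minimal example of the kind of sampling-based approximation listed in the first item. The parameter-induced spread of one-step predictions across an ensemble of parameter samples is moment-matched to a Gaussian. This gives a per-step term $\tfrac12 \log\big((\mathrm{Var}_\theta[\hat o_{i+1}] + \sigma^2)/\sigma^2\big)$.

The following are hypothetical choices for illustration:

- the pendulum-like model;
- the step size;
- the noise variance;
- the sample distribution.

For nonlinear models the Gaussian moment-matching is only an approximation.

```python
import math
import random
import statistics

DT = 0.1          # hypothetical integration step
NOISE_VAR = 1e-4  # hypothetical additive observation-noise variance


def step(x: float, u: float, theta: float) -> float:
    """Hypothetical pendulum-like model: x' = x + DT * (-theta * sin(x) + u)."""
    return x + DT * (-theta * math.sin(x) + u)


def sampled_info_term(x: float, u: float, theta_samples: list[float]) -> float:
    """Moment-matched Gaussian estimate of one per-step information term.

    The variance of predictions across parameter samples approximates the
    parameter-induced part of the predictive spread; NOISE_VAR is the part
    that remains once the parameter is fixed.
    """
    preds = [step(x, u, th) for th in theta_samples]
    spread = statistics.pvariance(preds)
    return 0.5 * math.log((spread + NOISE_VAR) / NOISE_VAR)


rng = random.Random(0)
samples = [rng.gauss(1.0, 0.3) for _ in range(256)]  # e.g. an ensemble or particle set
print(sampled_info_term(0.1, 0.0, samples), sampled_info_term(math.pi / 2, 0.0, samples))
```

At $x = 0$ the parameter has no effect on the prediction ($\sin 0 = 0$), so the estimated term is zero. Near $x = \pi/2$ the samples disagree most, and the term is largest.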

Conclusion

This work provides a principled, general-purpose framework for active information gathering under model uncertainty in sequential decision-making systems (2601.21988). By making the causal structure of beliefs, parameters, controls, and observations explicit and by deriving a unified directed information-based exploration criterion, the framework eliminates the need for model-specific information costs, supports diverse modeling and estimation choices, and is empirically validated to yield rapid parameter learning and robust model generalization across domains. The theoretical bridge established with Bayesian estimation procedures positions the approach as a foundation for future information-aware modeling and active estimation research.
