Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making

Published 15 May 2026 in cs.LG and cs.AI | (2605.16054v1)

Abstract: Recent work has framed decision-making as a sequence modeling problem using generative models such as diffusion models. Although promising, these approaches often overlook latent factors that exhibit evolving dynamics, elements that are fundamental to environment transitions, reward structures, and high-level agent behavior. Explicitly modeling these hidden processes is essential for both precise dynamics modeling and effective decision-making. In this paper, we propose a unified framework that explicitly incorporates latent dynamic inference into generative decision-making from minimal yet sufficient observations. We theoretically show that under mild conditions, the latent process can be identified from small temporal blocks of observations. Building on this insight, we introduce Ada-Diffuser, a causal diffusion model that learns the temporal structure of observed interactions and the underlying latent dynamics simultaneously, and furthermore, leverages them for planning and control. With a modular design, Ada-Diffuser supports both planning and policy learning tasks, enabling adaptation to latent variations in dynamics, rewards, and latent actions. Experiments on simulated control and robotic benchmarks demonstrate its effectiveness in accurate latent inference and adaptive policy learning.

Authors (9)

Summary

The paper introduces a latent-aware adaptive diffusion framework that infers time-varying latent contexts from minimal observations for improved decision-making.
It deploys a two-stage method combining a VAE-based latent inference module with an autoregressive, noise-refined diffusion process to boost planning and control.
Empirical results across robotic and control tasks demonstrate robust performance, consistently outpacing baselines in both explicit and implicit latent settings.

Introduction and Motivation

Generative trajectory models, particularly those leveraging diffusion-based architectures, have shown strong performance for planning and control in sequential decision-making domains. However, these approaches are typically predicated on full observability, or otherwise rely on high-capacity encoders to implicitly integrate partial observability. This paradigm frequently neglects latent factors with temporal dynamics—variables such as evolving environmental forces or shifting reward objectives—which are fundamental to many real-world problems, including robotics, autonomous systems, and healthcare. Inadequate modeling of temporally evolving latent processes leads to suboptimal outcomes, particularly in settings with partial observability or task non-stationarity.

The paper "Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making" (2605.16054) addresses this limitation directly. It formulates a theoretical foundation for latent context identifiability using only minimal local temporal observations, introduces a principled latent-inference-augmented diffusion framework, and demonstrates empirical efficacy on diverse robotic and control tasks with both explicit and implicit latent factors.

Theoretical Foundations: Latent Identifiability from Minimal Observations

The study builds on a formalization of contextual Markov Decision Processes (MDPs) with a time-evolving latent variable, generalizing self-contained MDPs and POMDPs. The authors prove that, under weak and empirically verified conditions, the latent context at any time step is block-identifiable—it can be inferred from a small block of consecutive observations (as few as four time steps, given certain spectral separability and variability conditions defined at the operator level).

Key technical assumptions include:

Distributional Variability: Transition dynamics must be distinguishable under different latent contexts, formalized via the injectivity of associated Markov operators.
Spectral Uniqueness: Second-order variation in observed state transitions (quantified via the ratio $k$ of transition probabilities) must be non-degenerate across different contexts.

The resulting identifiability theorem asserts that the latent context $C_t$ can be recovered from the four-step block $X_{t-2:t+1}$, through the posterior $p(C_t \mid X_{t-2:t+1})$, up to an invertible transformation. Recovery up to such a transformation suffices for downstream policy or planning applications.
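To make the block structure concrete, the following minimal sketch slices a trajectory into the four-step windows $X_{t-2:t+1}$ referenced by the theorem. The function name `temporal_blocks` and its `past`/`future` parameters are hypothetical and chosen for illustration; they are not taken from the paper's code.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def temporal_blocks(
    xs: Sequence[T], past: int = 2, future: int = 1
) -> List[Tuple[int, Tuple[T, ...]]]:
    """Return (t, (x_{t-past}, ..., x_{t+future})) for every t with a full window.

    The defaults past=2, future=1 give the four-step block X_{t-2:t+1}.
    """
    blocks = []
    for t in range(past, len(xs) - future):
        blocks.append((t, tuple(xs[t - past : t + future + 1])))
    return blocks


# Hypothetical six-step trajectory of observation placeholders.
obs = ["x0", "x1", "x2", "x3", "x4", "x5"]
blocks = temporal_blocks(obs)
for t, block in blocks:
    print(t, block)
```

Only time steps with two past observations and one future observation receive a full block. At the most recent step of an online rollout the entry $x_{t+1}$ is not yet available, so no complete block exists there.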

Ada-Diffuser Framework

Building on these theoretical insights, Ada-Diffuser introduces a two-stage, modular architecture:

Latent Factor Identification (Stage 1): A lightweight VAE-based module, trained on short temporal blocks, infers the latent context from partial trajectories using variational inference. The prior and posterior latent distributions are sequentially modeled, with the latter incorporating future observations for better identifiability.
Causal Diffusion Model (Stage 2): An autoregressive, latent-augmented diffusion process over trajectories, in which the noise level increases along the time axis to mirror intrinsic generative uncertainty. A novel denoise-then-refine scheme alternates between denoising observations and refining latent estimates, which keeps the latent estimate aligned with the posterior even when future context is inaccessible during online inference. Zig-zag sampling alternates between latent-conditional denoising and context updates to maintain fidelity to both trajectory and latent dynamics.
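The control flow of Stage 2 can be sketched roughly as follows. This is a toy illustration of the alternation only, under strong assumptions: the linear noise schedule, the function names, and the `toy_denoise`/`toy_refine` stand-ins are all hypothetical and replace the learned networks, whose actual form is not specified in this summary.

```python
from typing import Callable, List, Tuple

Denoiser = Callable[[List[float], float, List[float]], List[float]]
Refiner = Callable[[List[float], float], float]


def temporal_noise_levels(
    horizon: int, sigma_min: float = 0.1, sigma_max: float = 1.0
) -> List[float]:
    """Noise level per trajectory position, growing linearly with the time index.

    The linear shape is an assumption made for this sketch.
    """
    if horizon < 1:
        raise ValueError("horizon must be positive")
    if horizon == 1:
        return [sigma_min]
    step = (sigma_max - sigma_min) / (horizon - 1)
    return [sigma_min + i * step for i in range(horizon)]


def denoise_then_refine(
    traj: List[float],
    latent: float,
    denoise: Denoiser,
    refine: Refiner,
    n_rounds: int,
) -> Tuple[List[float], float]:
    """Alternate latent-conditioned denoising with a latent update."""
    sigmas = temporal_noise_levels(len(traj))
    for _ in range(n_rounds):
        traj = denoise(traj, latent, sigmas)  # denoise given the current latent
        latent = refine(traj, latent)         # refine the latent from the cleaner trajectory
    return traj, latent


# Toy stand-ins for the learned models (hypothetical, illustration only).
def toy_denoise(traj: List[float], latent: float, sigmas: List[float]) -> List[float]:
    # Noisier (later) positions are pulled harder toward the latent value.
    return [x + s * (latent - x) for x, s in zip(traj, sigmas)]


def toy_refine(traj: List[float], latent: float) -> float:
    # Blend the old latent with the mean of the current trajectory.
    return 0.5 * latent + 0.5 * sum(traj) / len(traj)


levels = temporal_noise_levels(4)
final_traj, final_latent = denoise_then_refine(
    [0.0, 1.0, 2.0, 3.0], 1.0, toy_denoise, toy_refine, n_rounds=3
)
print(levels, final_traj, final_latent)
```

The point of the sketch is the ordering: each round conditions the denoiser on the current latent estimate and then updates that estimate from the partially denoised trajectory, so neither quantity is fixed in advance.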

This design supports both planning (generating full state-action sequences) and policy learning (single or multi-step action generation), handling diverse settings such as:

Latent contexts modulating environment dynamics or rewards,
Action-free demonstration imitation with latent actions,
Environments lacking explicit latent factors, where latent inference captures process stochasticity.

Empirical Evaluation: Quantitative Results and Ablations

Latent Identification

Linear probing and $R^2$ analyses confirm that Ada-Diffuser reliably infers temporally varying latent factors with high accuracy, provided the observation block length aligns with the identifiability theory. Ablations show that latent inference fails when too little future context is included, and that performance degrades when blocks are excessively large, which is attributed to optimization difficulties.
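As an illustration of this kind of evaluation, the sketch below fits a linear probe from inferred latents to ground-truth context values and reports $R^2$. It assumes NumPy is available; the synthetic data, the variable names, and the in-sample evaluation are choices made for this example and do not reproduce the paper's protocol.

```python
import numpy as np


def linear_probe_r2(z: np.ndarray, c: np.ndarray) -> float:
    """Fit c ~ [z, 1] @ w by least squares and return the in-sample R^2."""
    design = np.hstack([z, np.ones((z.shape[0], 1))])  # append an intercept column
    w, *_ = np.linalg.lstsq(design, c, rcond=None)
    residual = c - design @ w
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((c - c.mean(axis=0)) ** 2))
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
c_true = rng.normal(size=(500, 1))  # hypothetical ground-truth context

# "Good" latents: a linear mixing of the true context plus small noise.
mixing = np.array([[2.0, -1.0, 0.5]])
z_good = c_true @ mixing + 0.01 * rng.normal(size=(500, 3))

# "Bad" latents: independent of the true context.
z_bad = rng.normal(size=(500, 3))

r2_good = linear_probe_r2(z_good, c_true)
r2_bad = linear_probe_r2(z_bad, c_true)
print(f"R^2 (informative latents):   {r2_good:.3f}")
print(f"R^2 (uninformative latents): {r2_bad:.3f}")
```

A linear probe only detects latents that are related to the true context by an approximately linear map. In practice the score would be computed on held-out data rather than on the fitting set used here.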

Planning and Policy Learning

Ada-Diffuser is benchmarked on MuJoCo locomotion (Cheetah, Ant, Walker), robot navigation (Maze2D), and complex manipulation (Franka-Kitchen), as well as in both action-free and standard settings on datasets such as RoboMimic and LIBERO.

Strong empirical claims include:

Consistent outperformance across 8 environments and 23 test settings, including both explicit and implicit latent factors, relative to baselines such as Diffuser, Diffusion Policy (DP), LDCQ, and latent-context-augmented variants (MetaDiffuser, LILAC, DynaMITE).
Robust performance even in settings where latent factors are not explicitly defined, indicating Ada-Diffuser's capacity for implicit Bayesian filtering over process stochasticity.
High-fidelity latent action inference for planning from action-free demonstrations, with notable improvements in complex robotic manipulation over state-of-the-art LDP.
Long-horizon stability: Ablations of the latent modeling and autoregressive denoising modules show that both the refinement and zig-zag sampling components are critical for accurate latent recovery and, consequently, for decision performance. Disabling either leads to substantial degradation, with backward refinement being particularly vital.

Scalability and Overhead

The model introduces only moderate computational overhead (≈20–30%) relative to vanilla diffusion counterparts, with possible further acceleration via parallelization techniques (e.g., Picard iteration).
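For readers unfamiliar with Picard iteration, the following generic sketch shows the idea on a simple scalar recurrence $x_{i+1} = f(x_i)$: instead of stepping through the chain one element at a time, a guess for the whole sequence is refined in sweeps whose per-step updates are mutually independent. The function names and the toy recurrence are hypothetical; how such a scheme would be applied to Ada-Diffuser's sampler is not described in this summary.

```python
from typing import Callable, List, Tuple


def sequential_rollout(f: Callable[[float], float], x0: float, n: int) -> List[float]:
    """Reference: compute x_{i+1} = f(x_i) one step at a time."""
    xs = [x0]
    for _ in range(n):
        xs.append(f(xs[-1]))
    return xs


def picard_rollout(
    f: Callable[[float], float], x0: float, n: int
) -> Tuple[List[float], int]:
    """Refine a guess for the whole sequence; each sweep updates all steps at once."""
    xs = [x0] * (n + 1)  # initial guess: a constant sequence
    sweeps = 0
    for _ in range(n):  # n sweeps always suffice for an n-step chain
        # Each entry depends only on the previous sweep, so these
        # evaluations of f could run in parallel.
        new_xs = [x0] + [f(x) for x in xs[:-1]]
        sweeps += 1
        if new_xs == xs:  # fixed point reached early
            break
        xs = new_xs
    return xs, sweeps


def step(x: float) -> float:
    # Toy contraction used only for this example.
    return 0.5 * x + 1.0


seq = sequential_rollout(step, 0.0, 8)
par, n_sweeps = picard_rollout(step, 0.0, 8)
print(par == seq, n_sweeps)
```

After sweep $k$ the first $k+1$ entries already match the sequential result, so the worst case needs as many sweeps as there are steps. The practical benefit comes from running each sweep in parallel and, with a tolerance-based stopping rule, from stopping before the worst case.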

Practical and Theoretical Implications

Ada-Diffuser establishes a theoretically grounded and practically effective bridge between generative trajectory modeling and explicit causal inference over latent dynamics. Its modularity and robustness make it applicable across a broad spectrum of sequential decision-making tasks. For practitioners in robotics, autonomous systems, and imitation learning, Ada-Diffuser offers a scalable path to more adaptive, context-aware, and robust models, particularly in environments characterized by partial observability, dynamic non-stationarity, or unmodeled latent variation.

Theoretically, the identifiability result provides a clear recipe for minimal observation requirements and informs blockwise training strategies for latent-aware generative planners broadly.

Future Prospects

The explicit incorporation of latent process inference into decision-making is expected to become increasingly critical as embodied and autonomous agents operate in richer, less structured environments. Potential directions include extension to multi-agent settings with unobserved coordination, hierarchical policy abstractions, and further integration with large-scale video/action models via autoregressive diffusion.

Conclusion

Ada-Diffuser provides a unified, theoretically justified approach to integrating latent dynamic inference with diffusion-based sequential decision-making. Empirical evaluation confirms substantial and consistent performance improvements across diverse environments, tasks, and benchmark datasets. The modularity, practical efficiency, and general applicability suggest that Ada-Diffuser is a strong architectural foundation for future research into context-aware, generative decision-making models (2605.16054).
