Papers
Topics
Authors
Recent
Search
2000 character limit reached

Adversarial Latent-State Training for Robust Policies in Partially Observable Domains

Published 7 Mar 2026 in cs.LG, cs.AI, and stat.ML | (2603.07313v1)

Abstract: Robustness under latent distribution shift remains challenging in partially observable reinforcement learning. We formalize a focused setting where an adversary selects a hidden initial latent distribution before the episode, termed an adversarial latent-initial-state POMDP. Theoretically, we prove a latent minimax principle, characterize worst-case defender distributions, and derive approximate best-response certificates with finite-sample guarantees, providing formal meaning to empirical training diagnostics. Empirically, using a Battleship benchmark, we demonstrate that targeted exposure to shifted latent distributions reduces average robustness gaps between Spread and Uniform distributions from 10.3 to 3.1 shots at equal budget. Furthermore, iterative best-response training exhibits budget-sensitive behavior entirely consistent with our approximate certificate theory. Ultimately, we show that for latent-initial-state problems, our framework yields precise diagnostic principles and confirms that structured adversarial exposure effectively mitigates worst-case vulnerabilities.

Authors (1)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 0 likes about this paper.