
Polynomial sample complexity for learning γ-observable POMDPs without privileged information

Determine whether a learning algorithm for γ-observable POMDPs in the standard reinforcement learning access model (without access to latent state information) can achieve polynomial, rather than quasi-polynomial, sample complexity, matching the improvement obtainable when latent state information is available.


Background

The paper reviews results showing that γ-observable POMDPs admit learning algorithms with quasi-polynomial sample complexity (Golowich et al., 2022).
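For reference, γ-observability is a quantitative requirement that observations distinguish latent beliefs. A common formulation (stated here as a sketch following the definition used by Golowich et al., 2022) is that the observation matrix $\mathbb{O} \in \mathbb{R}^{O \times S}$, with entries $\mathbb{O}_{o,s} = \Pr(o \mid s)$, satisfies

$$\|\mathbb{O}\, b - \mathbb{O}\, b'\|_1 \;\ge\; \gamma\, \|b - b'\|_1 \qquad \text{for all beliefs } b, b' \in \Delta(\mathcal{S}),$$

so that any two beliefs that induce similar observation distributions must themselves be close.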

Cai et al. (2024) showed that when privileged latent state information is available at training time, the sample complexity can be improved from quasi-polynomial to polynomial.
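Schematically (as an illustrative contrast between the two regimes, not the papers' exact bounds), a quasi-polynomial guarantee on the number of samples $N$ has the form

$$N \;\le\; \mathrm{poly}(S, A, O, H, 1/\varepsilon)^{\,\mathrm{polylog}(S, A, O, H, 1/\varepsilon)},$$

whereas a polynomial guarantee has the form

$$N \;\le\; \mathrm{poly}(S, A, O, H, 1/\varepsilon, 1/\gamma),$$

where $S$, $A$, $O$ are the numbers of states, actions, and observations, $H$ is the horizon, and $\varepsilon$ is the target accuracy. The open question asks whether the latter form is achievable from observations alone.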

The authors explicitly note that whether such a polynomial sample complexity is attainable without latent state information remains an open question, highlighting a key theoretical gap between learning with and without privileged information.

References

Cai et al. (2024) show that with latent state information the sample complexity of the algorithm for learning γ-observable POMDPs (Golowich et al., 2022) can be improved from quasi-polynomial to polynomial, though it is an open question whether this is possible without latent state information.

To Distill or Decide? Understanding the Algorithmic Trade-off in Partially Observable Reinforcement Learning (arXiv:2510.03207, Song et al., 3 Oct 2025), Appendix, Section "Additional Related Work", Theoretical literature