Polynomial sample complexity for learning γ-observable POMDPs without privileged information
Determine whether a learning algorithm for γ-observable POMDPs in the standard reinforcement learning access model (i.e., without access to latent state information) can achieve polynomial, rather than quasi-polynomial, sample complexity, matching the improvement that is obtainable when latent state information is available.
References
[cai2024provable] show that, with latent state information, the sample complexity of the algorithm of [golowich2022learning] for learning γ-observable POMDPs can be improved from quasi-polynomial to polynomial; whether the same improvement is achievable without latent state information remains an open question.
— To Distill or Decide? Understanding the Algorithmic Trade-off in Partially Observable Reinforcement Learning
(2510.03207 - Song et al., 3 Oct 2025) in Appendix, Section "Additional Related Work" (sec:related), Theoretical literature