Modify PBVI or sampling-based methods to operate with particle interactive beliefs in I-POMDPs

Determine how to modify the point-based value iteration (PBVI) algorithm, or other sampling-based POMDP solvers, so that they can operate directly on a particle-based representation of interactive beliefs in interactive partially observable Markov decision processes (I-POMDPs), making planning with interactive belief particles tractable.
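To make the object of study concrete, here is a minimal Python sketch of what a particle-based interactive belief might look like. All names (InteractiveParticle, OtherAgentModel, ParticleBelief) are illustrative assumptions, not structures defined in the source: each particle pairs a physical state with an intentional model of the other agent, which itself carries a nested particle belief.

```python
from dataclasses import dataclass

@dataclass
class OtherAgentModel:
    """Intentional model of agent j: j's nested particle belief plus its frame."""
    belief: list            # particles over j's (level l-1) interactive states
    frame: str = "theta_j"  # identifier for j's frame (actions, observations, T, O, R)

@dataclass
class InteractiveParticle:
    """One sample of an interactive state is_i = (s, m_j)."""
    physical_state: int
    other_model: OtherAgentModel
    weight: float = 1.0

# A belief point, as PBVI would use it, becomes a weighted particle set.
ParticleBelief = list  # list[InteractiveParticle]
```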

Background

Interactive POMDPs maintain beliefs over both physical states and other agents' beliefs, which leads to severe computational challenges. Interactive Particle Filtering (I-PF) mitigates the curse of dimensionality by representing interactive beliefs with particles, while PBVI mitigates the curse of history in single-agent POMDPs by backing up values only at sampled belief points. However, PBVI's point-based backups assume alpha-vectors defined over a fixed, finite state space, so aligning them with a particle-based interactive belief representation is non-trivial and has not been established.
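For orientation, the following is a hedged sketch of one I-PF-style update step, building on the structures above. The four model callables (transition, obs_likelihood, other_policy, nested_update) are assumptions standing in for the I-POMDP model components, not an established API:

```python
import random

def ipf_step(belief, action_i, obs_i, n_particles,
             transition, obs_likelihood, other_policy, nested_update):
    """One I-PF-style update over interactive particles (illustrative sketch)."""
    propagated = []
    for p in belief:
        a_j = other_policy(p.other_model)                     # sample j's action from its model
        s_next = transition(p.physical_state, action_i, a_j)  # sample next physical state
        m_next = nested_update(p.other_model, a_j, s_next)    # update j's nested belief
        w = obs_likelihood(obs_i, s_next, action_i, a_j)      # weight by i's observation
        propagated.append(InteractiveParticle(s_next, m_next, w))

    total = sum(p.weight for p in propagated)
    if total == 0.0:
        return propagated  # degenerate case; a fuller filter would recover differently
    weights = [p.weight / total for p in propagated]
    # Resample with replacement and reset weights to obtain an unweighted set.
    chosen = random.choices(propagated, weights=weights, k=n_particles)
    return [InteractiveParticle(c.physical_state, c.other_model, 1.0) for c in chosen]
```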

The authors highlight the gap: although particle methods exist for interactive beliefs (I-PF), it is not clear how PBVI or related sampling-based algorithms could be adapted to work directly with such particle representations in the interactive setting. Bridging this gap could yield algorithms that simultaneously alleviate the belief-representation burden (curse of dimensionality) and the planning complexity (curse of history).
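Because the source states the adaptation is open, the following is only one conceivable shape for a PBVI-style backup over particle belief points, under two strong assumptions: alpha-functions can be evaluated at individual interactive particles (for instance, by simulating a policy tree from the particle's interactive state), and the belief update can also report an estimate of the observation likelihood Pr(o | b, a). Neither assumption comes from the source.

```python
def particle_backup(belief_point, actions, observations, alphas,
                    reward, ipf_update, gamma=0.95):
    """Candidate point-based backup over one particle interactive belief.

    alphas: callables alpha(particle) -> float (e.g. policy trees evaluated by
    simulation); ipf_update(b, a, o) is assumed to return the updated particle
    set together with an estimate of Pr(o | b, a). Both are hypothetical.
    """
    def belief_value(particles, alpha):
        total = sum(p.weight for p in particles)
        return sum(p.weight * alpha(p) for p in particles) / total if total else 0.0

    total_w = sum(p.weight for p in belief_point)
    best_action, best_q = None, float("-inf")
    for a in actions:
        # Immediate reward, averaged over the particle set.
        q = sum(p.weight * reward(p, a) for p in belief_point) / total_w
        # One-step lookahead: per observation, update the particle belief and
        # score the successor against the current alpha set.
        for o in observations:
            successor, prob_o = ipf_update(belief_point, a, o)
            if prob_o > 0.0 and successor:
                q += gamma * prob_o * max(belief_value(successor, al) for al in alphas)
        if q > best_q:
            best_action, best_q = a, q
    return best_action, best_q
```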

References

Unfortunately, I-PF fails to address the curse of history, and it is not clear how PBVI or other sampling-based algorithms can be modified to work with a particle representation of interactive beliefs; I-PBVI, in turn, suffers from the curse of dimensionality because the dimension of its interactive beliefs grows exponentially with the planning horizon of the other agent (Section~\ref{sect:iap}).