Convergence of nested MDP policy to I-POMDP policy as sensing and actuation accuracy improves
Prove that, as the observation and transition models become more accurate (i.e., as the noise parameter ε decreases), the level-0 nested MDP policy of the other agent converges to, or closely approximates, the exact level-0 I-POMDP policy; this would formally establish the nested MDP as an effective surrogate for the I-POMDP when sensing and actuation are sufficiently accurate.
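The conjectured behavior can be probed numerically before attempting a proof. Below is a minimal sketch, not from the paper, on an assumed symmetric two-state toy model: transitions and observations are ε-noisy, the fully observable MDP policy is computed by value iteration, and a belief-based surrogate (a QMDP-style greedy policy, used here as a stand-in for the exact partially observable solution) acts on a single noisy observation. The model (identity reward, two states, uniform prior) and the QMDP approximation are illustrative assumptions; the point is only that the value gap between the two policies shrinks as ε decreases.

```python
import numpy as np

def make_models(eps):
    """eps-noisy toy model: action a steers toward state a with prob 1-eps;
    the observation equals the true state with prob 1-eps."""
    nS = nA = 2
    P = np.zeros((nA, nS, nS))          # P[a, s, s'] = transition prob
    for a in range(nA):
        for s in range(nS):
            P[a, s, a] = 1 - eps
            P[a, s, 1 - a] = eps
    R = np.eye(nS)                      # R[s, a]: reward 1 iff a matches s
    O = np.full((nS, nS), eps)          # O[s, o] = P(observe o | state s)
    np.fill_diagonal(O, 1 - eps)
    return P, R, O

def q_values(P, R, gamma=0.95, iters=500):
    """Plain value iteration; returns Q[s, a] for the fully observable MDP."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        V = Q.max(axis=1)
    return Q

def value_gap(eps, gamma=0.95):
    """Expected loss of a belief-based (QMDP-style) policy that sees only a
    noisy observation, relative to the fully observable MDP policy."""
    P, R, O = make_models(eps)
    Q = q_values(P, R, gamma)
    V = Q.max(axis=1)                   # optimal fully observable values
    gap = 0.0
    for s in range(2):                  # true state, uniform prior
        for o in range(2):              # noisy observation
            b = O[:, o] * 0.5
            b /= b.sum()                # posterior belief over states
            a = int(np.argmax(b @ Q))   # act greedily on the belief
            q_sa = R[s, a] + gamma * P[a, s] @ V
            gap += 0.5 * O[s, o] * (V[s] - q_sa)
    return gap

gaps = [value_gap(e) for e in (0.3, 0.1, 0.01)]
print(gaps)  # the gap shrinks monotonically as eps -> 0
```

In this particular symmetric toy, the one-observation value gap works out to exactly ε (the two policies disagree only when the observation is wrong, which happens with probability ε), so the decay is linear; the conjecture concerns the analogous limit for the full nested MDP versus I-POMDP policies.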
References
Following from Theorem~\ref{thm:0}, we conjecture that, as $\epsilon$ decreases (i.e., observation and transition models become more accurate), the nested MDP policy $\pi^{0}_{\mbox{-}t}$ of the other agent is more likely to approximate the exact I-POMDP policy closely.
— Interactive POMDP Lite: Towards Practical Planning to Predict and Exploit Intentions for Interacting with Self-Interested Agents
(arXiv:1304.5159, Hoang et al., 2013), Section 3, Intention-Aware POMDP