Validity of the sensibly off-policy condition for mixture universes
Prove that the sensibly off-policy condition (which bounds the expected optimality gap incurred by taking an arbitrary off-policy action instead of sampling an action from the self-model) holds for specified model classes and their associated mixture universes, as used by embedded Bayesian agents with finite planning horizons.
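One plausible formalization of the condition, sketched here for orientation only (the notation below is assumed, not taken from the paper): with mixture universe ξ, history h, finite horizon m, self-model policy π^sm, and tolerance ε, the value lost by substituting any single off-policy action a for the self-model's sampled action is bounded.

```latex
% Illustrative sketch only; symbols (\xi, \pi^{\mathrm{sm}}, Q, m, \varepsilon)
% are assumptions for exposition, not the paper's notation.
\[
  \mathbb{E}_{a' \sim \pi^{\mathrm{sm}}(\cdot \mid h)}
    \Bigl[ Q^{\pi^{\mathrm{sm}}}_{\xi,m}(h, a') \Bigr]
  \;-\;
  Q^{\pi^{\mathrm{sm}}}_{\xi,m}(h, a)
  \;\le\; \varepsilon
  \qquad \text{for every action } a .
\]
```

The open problem is to verify that a bound of this kind actually holds for the specified model classes and the mixture universes they induce, rather than assuming it.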
References
It remains an open problem in the literature to prove that this condition is satisfied for certain model classes and corresponding mixture models.
— Embedded Universal Predictive Intelligence: a coherent framework for multi-agent learning (Meulemans et al., arXiv:2511.22226, 27 Nov 2025), Section 4.5 (Embedded Bayesian agents with finite planning horizons)