Validity of the sensibly off-policy condition for mixture universes

Prove that the sensibly off-policy condition, which bounds how much the expected optimality gap can worsen when an arbitrary off-policy action is taken instead of an action sampled from the self-model, holds for specified model classes and their associated mixture universes used by embedded Bayesian agents with finite planning horizons.

Background

In analyzing the convergence of embedded Bayesian agents that act as k-step planners, the authors employ a "sensibly off-policy" assumption: taking an arbitrary off-policy action should not significantly worsen the expected optimality gap relative to sampling an action from the self-model. This assumption is used to derive convergence to subjective correlated equilibria.
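
For concreteness, one plausible way such a condition could be formalized is sketched below. The notation is illustrative and not taken from the paper: ξ denotes the agent's Bayesian mixture over universes, π the policy induced by the self-model, V*_ξ and Q^π_ξ the corresponding value functions over histories, and ε a slack term.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Illustrative formalization under assumed notation (not the paper's exact statement):
% forcing an arbitrary off-policy action a may worsen the expected optimality gap
% by at most \varepsilon compared to sampling an action from the self-model \pi.
\[
  V^{*}_{\xi}(h) - Q^{\pi}_{\xi}(h, a)
  \;\le\;
  \mathbb{E}_{a' \sim \pi(\cdot \mid h)}
  \bigl[\, V^{*}_{\xi}(h) - Q^{\pi}_{\xi}(h, a') \,\bigr]
  + \varepsilon
  \qquad \text{for all histories } h \text{ and actions } a .
\]
\end{document}
```

Under a statement of this kind, the open problem asks for which model classes (and their induced mixture universes) the bound above, or the paper's precise analogue of it, can be verified.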

However, the authors note that the assumption remains unproven for general model classes and mixtures, and they further show that the condition is not satisfied for Solomonoff mixture models. This underscores the need for a rigorous characterization of when the condition holds.

References

It remains an open problem in the literature to prove that this condition is satisfied for certain model classes and corresponding mixture models.

Embedded Universal Predictive Intelligence: a coherent framework for multi-agent learning (2511.22226 - Meulemans et al., 27 Nov 2025) in Section 4.5 (Embedded Bayesian agents with finite planning horizons)