Grain-of-truth for Self-AIXI: constructing a dominating mixture policy

Construct a countable model class of policies \calMpol and a corresponding mixture policy \zeta such that \zeta multiplicatively dominates the Self-AIXI policy \pi_S on all finite histories, thereby satisfying the grain-of-truth property for Self-AIXI.
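One common way to write multiplicative domination is sketched below. The notation is an assumption rather than the paper's: a_{1:t} and e_{<t} denote action and percept sequences, w_\pi denotes prior weights, \mathcal{M}_{\mathrm{pol}} stands in for \calMpol, and \pi(a_{1:t} \| e_{<t}) is the product of the per-step action probabilities.

```latex
% Assumed notation (not taken from the source):
%   \pi(a_{1:t} \,\|\, e_{<t}) := \prod_{k=1}^{t} \pi(a_k \mid a_{<k}, e_{<k})
% Multiplicative domination of \pi_S by \zeta:
\exists\, c > 0 \;\; \forall t \;\; \forall a_{1:t}, e_{<t} : \quad
  \zeta(a_{1:t} \,\|\, e_{<t}) \;\ge\; c \, \pi_S(a_{1:t} \,\|\, e_{<t})

% Standard sufficient condition: if \zeta is a Bayesian mixture
%   \zeta = \sum_{\pi \in \mathcal{M}_{\mathrm{pol}}} w_\pi \, \pi , \qquad w_\pi > 0 ,
% and \pi_S \in \mathcal{M}_{\mathrm{pol}}, then every term is nonnegative, so
\zeta(a_{1:t} \,\|\, e_{<t}) \;\ge\; w_{\pi_S} \, \pi_S(a_{1:t} \,\|\, e_{<t}) ,
% i.e. domination holds with c = w_{\pi_S}.
```

Under this reading, the open part is exhibiting a countable class that actually contains \pi_S (or otherwise yields such a constant c), not the inequality itself.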

Background

Self-AIXI is a self-predictive variant of AIXI that performs one-step policy improvement using separate mixture models over environments and policies. The claimed convergence of Self-AIXI to AIXI, which relies on additional assumptions, requires a grain-of-truth condition: the mixture policy \zeta must dominate the true Self-AIXI policy \pi_S, so that the agent's predictions of its own behavior are consistent.
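To make the domination condition concrete, here is a minimal toy sketch in Python. It is not the Self-AIXI construction: the three policies, the prior weights, and the helper names (`sequence_prob`, `mixture_prob`, `dominates_all`) are hypothetical, and histories are simplified to action sequences with percepts ignored. It only illustrates that a Bayesian mixture over a class dominates every policy that is a member of that class.

```python
from itertools import product

ACTIONS = (0, 1)

# Hypothetical toy policies: each maps an action history (tuple) to a
# distribution over the next action. Percepts are ignored for simplicity.
def always_zero(history):
    return {0: 1.0, 1: 0.0}

def uniform(history):
    return {0: 0.5, 1: 0.5}

def repeat_last(history):
    if not history:
        return {0: 0.5, 1: 0.5}
    last = history[-1]
    return {last: 0.9, 1 - last: 0.1}

POLICIES = [always_zero, uniform, repeat_last]
WEIGHTS = [0.5, 0.25, 0.25]  # assumed prior weights, summing to 1

def sequence_prob(policy, actions):
    """Probability that `policy` emits the whole action sequence."""
    prob = 1.0
    for k, action in enumerate(actions):
        prob *= policy(tuple(actions[:k]))[action]
    return prob

def mixture_prob(actions):
    """Sequence probability under the Bayesian mixture of POLICIES."""
    return sum(w * sequence_prob(p, actions) for w, p in zip(WEIGHTS, POLICIES))

def dominates_all(max_len, tol=1e-12):
    """Check mixture >= w_i * policy_i on every sequence up to max_len."""
    for t in range(1, max_len + 1):
        for actions in product(ACTIONS, repeat=t):
            zeta = mixture_prob(actions)
            for w, p in zip(WEIGHTS, POLICIES):
                if zeta < w * sequence_prob(p, actions) - tol:
                    return False
    return True

if __name__ == "__main__":
    print(dominates_all(max_len=6))
```

The check succeeds by construction, because each policy contributes a nonnegative term to the mixture. The difficulty for Self-AIXI is different: \pi_S is itself defined using the mixture models, so it is not obvious that \pi_S belongs to the class the mixture is built from.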

Meulemans et al. note that the prior work introducing Self-AIXI left a key gap: it did not prove that the proposed mixture over policies actually dominates the Self-AIXI policy. As a result, finding a concrete policy class and mixture that satisfy grain-of-truth for Self-AIXI remains an open problem.

References

Importantly, [the prior work introducing Self-AIXI] did not prove whether their proposed mixture policy \zeta does in fact dominate \pi_S, making it still an open problem in the field to construct a model class \calMpol and corresponding mixture policy \zeta satisfying the grain-of-truth property.

Embedded Universal Predictive Intelligence: a coherent framework for multi-agent learning (2511.22226, Meulemans et al., 27 Nov 2025), Section 2.4 (Algorithmic probability, Solomonoff induction and AIXI), Self-AIXI paragraph