Training neural sequence models to approximate Bayesian prediction from Occam’s prior

Develop training methodologies for neural sequence models that robustly approximate Bayesian sequence prediction starting from an Occam’s razor prior, so they can produce accurate prospective forecasts in dynamically changing, non-stationary environments.

Background

A central practical gap highlighted by the authors lies between the idealized Bayesian predictors in MUPI and real neural sequence models trained via stochastic gradient descent (SGD) on episodic datasets.

Although some theory suggests that SGD can approximate sampling-based Bayesian inference, the authors note that obtaining robust Bayesian predictors from an Occam’s prior remains unresolved. Such predictors would need to be robust enough to support prospective prediction in non-stationary worlds, and the authors regard obtaining them as a key step toward practical deployment.
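
To make the target concrete, here is a minimal sketch of exact Bayesian sequence prediction over a tiny, finite hypothesis class. It is not the MUPI predictor and not a training method. It only illustrates the kind of ideal predictor a neural sequence model would need to approximate. Everything in it is an assumption made for illustration: the prior form (weight proportional to 2^(-description length)), the three Bernoulli hypotheses, their description lengths, and the function names `occam_prior` and `bayes_mixture_predict`.

```python
import math


def occam_prior(description_lengths):
    """Prior weight proportional to 2^(-description length in bits), normalized.

    Assumed form of an "Occam's prior": shorter hypotheses get more weight.
    """
    raw = [2.0 ** (-k) for k in description_lengths]
    total = sum(raw)
    return [r / total for r in raw]


def bayes_mixture_predict(prior, thetas, history):
    """Return (P(next bit = 1 | history), posterior weights).

    Each hypothesis is a Bernoulli source with P(bit = 1) = theta,
    where theta lies strictly between 0 and 1.
    """
    log_post = []
    for w, theta in zip(prior, thetas):
        # Log-likelihood of the observed bits under this hypothesis.
        ll = sum(math.log(theta if bit == 1 else 1.0 - theta) for bit in history)
        log_post.append(math.log(w) + ll)
    # Normalize in log space for numerical stability.
    m = max(log_post)
    unnorm = [math.exp(lp - m) for lp in log_post]
    z = sum(unnorm)
    posterior = [u / z for u in unnorm]
    # Posterior-weighted average of each hypothesis's prediction.
    p_one = sum(p * theta for p, theta in zip(posterior, thetas))
    return p_one, posterior


# Hypothetical toy setup: a "simple" fair coin and two "more complex" biased coins.
thetas = [0.5, 0.9, 0.1]
description_lengths = [1, 3, 3]  # illustrative bit counts, not derived from anything
prior = occam_prior(description_lengths)

p_before, _ = bayes_mixture_predict(prior, thetas, [])
p_after, posterior = bayes_mixture_predict(prior, thetas, [1] * 10)
print(p_before, p_after, posterior)
```

With no data, the mixture leans on the simple hypothesis. After ten consecutive ones, posterior mass shifts to the hypothesis with theta = 0.9 despite its lower prior weight. The open problem concerns obtaining this behavior from SGD-trained networks over far richer hypothesis classes and in non-stationary settings, which this toy does not address.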

References

However, training neural sequence models to robustly approximate Bayesian prediction starting from an Occam's razor prior—thereby ensuring they can make accurate, prospective forecasts in dynamically changing worlds—remains an open problem \citep{bornschein2024transformers,de2024prospective}.

Meulemans et al., "Embedded Universal Predictive Intelligence: a coherent framework for multi-agent learning" (2511.22226, 27 Nov 2025), Section 6 (Discussion), "The connection between MUPI and current foundation models".