- The paper presents a hierarchical model using renormalisation group principles to generalize POMDPs for efficient learning and inference.
- It demonstrates compositionality and scale-invariance that enable robust performance in image classification, video generation, and game planning.
- Numerical results, such as 95.1% accuracy on MNIST, validate the model’s efficiency in lossless compression and sequential decision-making.
Renormalising Generative Models: An Overview
The paper, "Renormalising Generative Models," presents a detailed exploration of discrete state-space models using principles from the renormalisation group to develop generative models suitable for classification, compression, generation, prediction, and planning tasks. These models generalize partially observed Markov decision processes (POMDPs) to a hierarchical architecture where paths serve as latent variables. The renormalising generative models (RGMs) introduced can be seen as discrete analogues of deep convolutional neural networks (CNNs) and continuous state-space models within generalized coordinates of motion.
Key Contributions and Findings
The authors contribute several notable innovations and insights, including:
- Hierarchical Model Structuring: Leveraging renormalisation group principles, the paper introduces a method for modeling dynamics in discrete state-spaces that generalizes POMDPs. States and paths at one level generate the initial conditions and transitions at the level below, forming a recursive structure conducive to efficient learning and inference (see the coarse-graining sketch after this list).
- Compositionality and Scale-Invariance: RGMs learn compositional structure across space and time, autonomously discovering and deploying hierarchical structure. This scale-free property is illustrated through applications spanning image classification, video and music generation, and game planning.
- Practical Applications: The models were validated across several domains. In image classification on the MNIST dataset, the model achieved high accuracy from minimal training data by exploiting hierarchical structure for lossless compression. In video compression, it generated and recognized complex sequential data, such as birds in flight, by encoding temporal dependencies. In planning tasks exemplified by Atari-like games, it performed planning as inference by learning optimal paths through state-action spaces.
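The recursive, scale-free structure referenced in the list above can be illustrated with a simple block-grouping (coarse-graining) step in the spirit of the renormalisation group. The sketch below is an illustration, not the paper's code: each unique tile of discrete states at one scale is mapped to a single discrete state at the scale above, and a codebook is kept so the step is lossless, which is the kind of compression exploited in the MNIST example.

```python
import numpy as np

def coarse_grain(grid, block=2):
    """One renormalisation-style grouping step (illustrative sketch):
    partition a 2-D array of discrete states into block x block tiles and
    assign each unique tile pattern a single discrete state at the level
    above. The tile codebook is returned so the step is lossless."""
    h, w = grid.shape
    assert h % block == 0 and w % block == 0
    tiles = (grid.reshape(h // block, block, w // block, block)
                 .transpose(0, 2, 1, 3)
                 .reshape(-1, block * block))
    codebook, codes = np.unique(tiles, axis=0, return_inverse=True)
    coarse = codes.reshape(h // block, w // block)
    return coarse, codebook

def expand(coarse, codebook, block=2):
    """Invert coarse_grain: replace each high-level state by its tile."""
    h, w = coarse.shape
    tiles = codebook[coarse.ravel()].reshape(h, w, block, block)
    return tiles.transpose(0, 2, 1, 3).reshape(h * block, w * block)

rng = np.random.default_rng(1)
image = rng.integers(0, 2, size=(8, 8))                   # toy binary "image"
level1, codebook = coarse_grain(image)                    # fewer, richer states
assert np.array_equal(expand(level1, codebook), image)    # round trip is lossless
```

Applying such a step recursively yields fewer but richer states at each level, which is the intuition behind the hierarchical structure described above.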
Numerical Results and Bold Claims
Several numerical results substantiate the efficacy of RGMs:
- Image Classification: The model achieved 95.1% classification accuracy on the MNIST dataset while learning efficiently from little data, with the ELBO (evidence lower bound) improving steadily over training.
- Video Generation: Demonstrations showed the model's ability to generate extended sequences after compressing video input to a reduced set of states and paths.
- Game Planning: In tasks such as Atari's Pong and Breakout, RGMs learned compressed representations of rewarded paths, enabling expert play and adaptation to variations in game scenarios (see the planning sketch below).
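As a rough illustration of planning as inference, the sketch below scores a small set of candidate paths by the expected log-preference of the observations they are predicted to yield, then selects among them with a softmax. This is a simplification under assumed tensor names and toy dimensions; a full expected-free-energy treatment would also include an epistemic (information-gain) term.

```python
import numpy as np

rng = np.random.default_rng(2)
n_states, n_obs, n_paths, horizon = 4, 4, 3, 5

A = np.eye(n_obs, n_states)                    # identity likelihood, for clarity
B = rng.dirichlet(np.ones(n_states), size=(n_paths, n_states)).transpose(0, 2, 1)
C = np.log(np.array([0.05, 0.05, 0.05, 0.85])) # prior preference: favour outcome 3
q_s = np.full(n_states, 1.0 / n_states)        # current belief over states

def pragmatic_value(path):
    """Expected log-preference of predicted observations along one path."""
    s, value = q_s.copy(), 0.0
    for _ in range(horizon):
        s = B[path] @ s        # predicted state distribution at the next step
        value += (A @ s) @ C   # expected log-preference of predicted observations
    return value

G = np.array([pragmatic_value(u) for u in range(n_paths)])
policy = np.exp(G - G.max())                   # softmax over candidate paths
policy /= policy.sum()
print("path scores:", np.round(G, 3), "-> chosen path:", int(policy.argmax()))
```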
Theoretical and Practical Implications
The theoretical importance of this work lies in its novel application of the renormalisation group to discrete state-space models, pushing the boundaries of active inference frameworks. Practically, this development has significant implications for AI, where the ability to autonomously assemble and learn hierarchical models can lead to more efficient and adaptable systems. These models can potentially transform areas such as autonomous robotics, where real-time decision-making and adaptability are crucial.
Future Directions
Future developments stemming from this research may focus on scaling RGMs to more complex environments. Further investigation of hybrid continuous-discrete models could provide a more principled bridge between theoretical robustness and practical flexibility. Extending the framework to richer sensory inputs and more complex interaction patterns would broaden applications in automated reasoning, interactive systems, and advanced robotics.
Conclusion
The paper introduces a carefully constructed approach to generative modeling based on discrete state-space constructs and renormalisation group methods. The compelling numerical results and practical applications across several domains highlight the robustness and adaptability of RGMs. The approach marks a significant advance in the efficient and scalable design of generative models, laying a strong foundation for future research and practical innovation in AI and beyond.