Minimal Random Code Learning with Mean-KL Parameterization
Abstract: This paper studies the qualitative behavior and robustness of two variants of Minimal Random Code Learning (MIRACLE) used to compress variational Bayesian neural networks. MIRACLE implements a powerful, conditionally Gaussian variational approximation for the weight posterior $Q_{\mathbf{w}}$ and uses relative entropy coding to compress a weight sample from the posterior using a Gaussian coding distribution $P_{\mathbf{w}}$. To achieve the desired compression rate, $D_{\mathrm{KL}}[Q_{\mathbf{w}} \Vert P_{\mathbf{w}}]$ must be constrained, which requires a computationally expensive annealing procedure under the conventional mean-variance (Mean-Var) parameterization for $Q_{\mathbf{w}}$. Instead, we parameterize $Q_{\mathbf{w}}$ by its mean and its KL divergence from $P_{\mathbf{w}}$, which constrains the compression cost to the desired value by construction. We demonstrate that variational training with the Mean-KL parameterization converges twice as fast as with Mean-Var and maintains predictive performance after compression. Furthermore, we show that Mean-KL leads to more meaningful variational distributions with heavier tails and to compressed weight samples that are more robust to pruning.
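For concreteness, the Mean-KL parameterization admits a closed-form realization when both distributions are Gaussian. With $Q_{\mathbf{w}} = \mathcal{N}(\mu, \sigma^2)$ and $P_{\mathbf{w}} = \mathcal{N}(\mu_p, \sigma_p^2)$, the Gaussian KL divergence $D_{\mathrm{KL}} = \log(\sigma_p/\sigma) + \left(\sigma^2 + (\mu - \mu_p)^2\right)/(2\sigma_p^2) - 1/2$ can be inverted for $\sigma$ with the Lambert W function (Corless et al., 1996). The sketch below illustrates this inversion; it is not the paper's implementation, and the helper name `sigma_from_kl`, the SciPy-based solver, and the choice of the principal branch $W_0$ (which yields $\sigma \le \sigma_p$) are assumptions made here for illustration.

```python
import numpy as np
from scipy.special import lambertw


def sigma_from_kl(mu, kl, mu_p=0.0, sigma_p=1.0):
    """Solve KL[N(mu, s^2) || N(mu_p, sigma_p^2)] = kl for s.

    Illustrative helper, not the paper's code. Uses the closed form
        KL = log(sigma_p / s) + (s^2 + (mu - mu_p)^2) / (2 sigma_p^2) - 1/2
    and inverts it with the Lambert W function.
    """
    # Substituting t = s^2 / sigma_p^2 turns the constraint into t - log(t) = c:
    c = 2.0 * kl - (mu - mu_p) ** 2 / sigma_p ** 2 + 1.0
    # Feasibility: over s, the KL is minimized at t = 1, where it equals
    # (mu - mu_p)^2 / (2 sigma_p^2), so c must be at least 1.
    if np.any(c < 1.0):
        raise ValueError("target KL is below the minimum achievable for this mean")
    # t - log(t) = c  <=>  t = -W(-exp(-c)); the principal branch W_0
    # gives t <= 1, i.e. Q is narrower than the coding distribution P.
    t = -lambertw(-np.exp(-c), k=0).real
    return sigma_p * np.sqrt(t)


# Usage: pick a per-weight KL budget and recover the matching standard deviation.
sigma = sigma_from_kl(mu=0.3, kl=0.05)
kl_check = np.log(1.0 / sigma) + (sigma**2 + 0.3**2) / 2.0 - 0.5
print(sigma, kl_check)  # kl_check recovers the 0.05 target
```

Parameterizing each weight by $(\mu, \kappa)$ in this way fixes its contribution to $D_{\mathrm{KL}}[Q_{\mathbf{w}} \Vert P_{\mathbf{w}}]$ by construction, which is what removes the need for the annealing procedure required under Mean-Var.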
- Blalock, D., Ortiz, J. J. G., Frankle, J., and Guttag, J. What is the State of Neural Network Pruning? In Proceedings of Machine Learning and Systems, 2020.
- Chen, W., Wilson, J., Tyree, S., Weinberger, K., and Chen, Y. Compressing Neural Networks with the Hashing Trick. In International Conference on Machine Learning, 2015.
- Corless, R. M., Gonnet, G. H., Hare, D. E. G., Jeffrey, D. J., and Knuth, D. E. On the Lambert W Function. Advances in Computational Mathematics, 1996.
- Dillon, J. V., et al. TensorFlow Distributions. In arXiv:1711.10604, 2017.
- Flamich, G. Greedy Poisson Rejection Sampling. In arXiv:2305.15313, 2023.
- Flamich, G., Havasi, M., and Hernández-Lobato, J. M. Compressing Images by Encoding their Latent Representations with Relative Entropy Coding. In Advances in Neural Information Processing Systems, 2020.
- Flamich, G., Markou, S., and Hernández-Lobato, J. M. Fast Relative Entropy Coding with A* Coding. In International Conference on Machine Learning, 2022.
- Havasi, M., Peharz, R., and Hernández-Lobato, J. M. Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters. In International Conference on Learning Representations, 2019.
- Higgins, I., et al. β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. In International Conference on Learning Representations, 2017.
- Hinton, G. E. and van Camp, D. Keeping Neural Networks Simple by Minimizing the Description Length of the Weights. In Conference on Computational Learning Theory, 1993.
- Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies, 2019.
- Paszke, A., et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems, 2019.
- Winitzki, S. Uniform Approximations for Transcendental Functions. In Computational Science and Its Applications, 2003.