Minimal Random Code Learning with Mean-KL Parameterization

Published 15 Jul 2023 in cs.LG and stat.ML (arXiv:2307.07816v2)

Abstract: This paper studies the qualitative behavior and robustness of two variants of Minimal Random Code Learning (MIRACLE) used to compress variational Bayesian neural networks. MIRACLE implements a powerful, conditionally Gaussian variational approximation for the weight posterior $Q_{\mathbf{w}}$ and uses relative entropy coding to compress a weight sample from the posterior using a Gaussian coding distribution $P_{\mathbf{w}}$. To achieve the desired compression rate, $D_{\mathrm{KL}}[Q_{\mathbf{w}} \Vert P_{\mathbf{w}}]$ must be constrained, which requires a computationally expensive annealing procedure under the conventional mean-variance (Mean-Var) parameterization for $Q_{\mathbf{w}}$. Instead, we parameterize $Q_{\mathbf{w}}$ by its mean and KL divergence from $P_{\mathbf{w}}$ to constrain the compression cost to the desired value by construction. We demonstrate that variational training with Mean-KL parameterization converges twice as fast and maintains predictive performance after compression. Furthermore, we show that Mean-KL leads to more meaningful variational distributions with heavier tails and compressed weight samples which are more robust to pruning.
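
To make the Mean-KL parameterization concrete, the sketch below shows how a per-weight standard deviation could be recovered from a mean and a fixed KL budget when both $Q_{\mathbf{w}}$ and $P_{\mathbf{w}}$ are factorized Gaussians and $P_{\mathbf{w}}$ is zero-mean. This is an illustrative sketch under those assumptions, not the authors' implementation; the per-weight budget kappa, the zero-mean coding distribution, and the helper names sigma_q_from_mean_kl and gauss_kl are hypothetical.

```python
# Sketch (not the authors' code): recover sigma_q from a mean mu and a target
# KL budget kappa, assuming a zero-mean Gaussian coding distribution
# P_w = N(0, sigma_p^2) and a factorized Gaussian posterior Q_w = N(mu, sigma_q^2).
#
# Per-weight KL:
#   KL(Q || P) = log(sigma_p / sigma_q) + (sigma_q^2 + mu^2) / (2 sigma_p^2) - 1/2.
# Writing t = sigma_q^2 / sigma_p^2, this rearranges to  t - log t = c  with
#   c = 2*kappa - mu^2 / sigma_p^2 + 1,
# which the Lambert W function solves:  t = -W(-exp(-c)).

import numpy as np
from scipy.special import lambertw


def sigma_q_from_mean_kl(mu, kappa, sigma_p=1.0, branch=0):
    """Return sigma_q with KL(N(mu, sigma_q^2) || N(0, sigma_p^2)) = kappa.

    branch=0  -> principal Lambert-W branch, gives sigma_q <= sigma_p;
    branch=-1 -> the other real branch, gives sigma_q >= sigma_p.
    kappa must be at least mu^2 / (2 sigma_p^2) for a real solution to exist.
    """
    c = 2.0 * kappa - (mu / sigma_p) ** 2 + 1.0
    if np.any(c < 1.0):
        raise ValueError("kappa is below the minimum achievable KL for this mean")
    t = -lambertw(-np.exp(-c), k=branch).real  # t = sigma_q^2 / sigma_p^2
    return sigma_p * np.sqrt(t)


def gauss_kl(mu, sigma_q, sigma_p=1.0):
    """Closed-form KL between the two Gaussians, used as a sanity check."""
    return (np.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + mu ** 2) / (2.0 * sigma_p ** 2) - 0.5)


mu, kappa = 0.3, 0.1
sq = sigma_q_from_mean_kl(mu, kappa)
print(sq, gauss_kl(mu, sq))  # second value should be ~0.1 by construction
```

Both Lambert-W branches satisfy the KL constraint exactly; which branch (if either) the paper uses is not stated here, so that choice is a further assumption of this sketch.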
