Efficient and Provably Convergent Computation of Information Bottleneck: A Semi-Relaxed Approach (2404.04862v1)
Abstract: The Information Bottleneck (IB) is a technique for extracting the information that one random variable carries about another, relevant target variable. It has garnered significant interest due to its broad applications in information theory and deep learning, so there is strong motivation to develop efficient numerical methods with high precision and theoretical convergence guarantees. In this paper, we propose a semi-relaxed IB model, in which the Markov chain and transition probability conditions are relaxed from the relevance-compression function. Based on the proposed model, we develop an algorithm that recovers the relaxed constraints and involves only closed-form iterations; specifically, the algorithm is obtained by alternating minimization of the Lagrangian of the relaxed model along each direction. The convergence of the proposed algorithm is theoretically guaranteed through descent estimation and Pinsker's inequality. Numerical experiments on classical and discrete distributions corroborate the analysis. Moreover, the proposed algorithm demonstrates notable computational efficiency, evidenced by significantly reduced run times compared to existing methods at comparable accuracy.
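For context, the classical IB problem (Tishby et al., 1999) that the abstract builds on seeks a stochastic encoder $p(t\mid x)$ that minimizes the Lagrangian

$$\min_{p(t\mid x)} \; I(X;T) - \beta\, I(T;Y), \qquad \text{s.t. } T \text{--} X \text{--} Y \text{ forms a Markov chain},$$

where $\beta > 0$ trades compression of $X$ against relevance to $Y$; the paper's semi-relaxed model drops the Markov chain and transition probability conditions from this formulation and recovers them through the algorithm. The paper's own iterations are not reproduced here, but the following is a minimal sketch of the classical self-consistent IB iterations that serve as the standard baseline, assuming discrete distributions given as a joint matrix; the function name and parameters are illustrative, not from the paper.

```python
import numpy as np

def classical_ib(p_xy, beta, n_clusters, n_iter=200, seed=0, eps=1e-12):
    """Classical self-consistent IB iterations (Tishby et al., 1999).

    p_xy: joint distribution over (X, Y), shape (nx, ny).
    Returns the encoder p(t|x), shape (nx, n_clusters).
    """
    rng = np.random.default_rng(seed)
    nx, ny = p_xy.shape
    p_x = p_xy.sum(axis=1)                      # marginal p(x)
    p_y_given_x = p_xy / (p_x[:, None] + eps)   # conditional p(y|x)

    # Random initialization of the encoder p(t|x), normalized over t.
    p_t_given_x = rng.random((nx, n_clusters))
    p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # p(t) = sum_x p(x) p(t|x)
        p_t = p_x @ p_t_given_x
        # p(y|t) = sum_x p(y|x) p(x|t), using Bayes: p(x|t) = p(t|x) p(x) / p(t)
        p_xt = p_t_given_x * p_x[:, None]
        p_y_given_t = (p_xt / (p_t[None, :] + eps)).T @ p_y_given_x
        # D_KL(p(y|x) || p(y|t)) for every (x, t) pair, shape (nx, nt).
        log_ratio = np.log((p_y_given_x[:, None, :] + eps)
                           / (p_y_given_t[None, :, :] + eps))
        kl = (p_y_given_x[:, None, :] * log_ratio).sum(axis=2)
        # Closed-form encoder update: p(t|x) proportional to p(t) exp(-beta * KL).
        p_t_given_x = p_t[None, :] * np.exp(-beta * kl)
        p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True) + eps

    return p_t_given_x
```

On a small joint distribution, e.g. `classical_ib(np.array([[0.4, 0.1], [0.1, 0.4]]), beta=5.0, n_clusters=2)`, the iterations converge to a soft clustering of X; larger $\beta$ favors relevance over compression. The paper's contribution, per the abstract, is a relaxed formulation whose alternating minimization retains such closed-form updates while admitting a convergence guarantee via descent estimation and Pinsker's inequality.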
Authors: Lingyi Chen, Shitong Wu, Jiachuan Ye, Huihui Wu, Wenyi Zhang, Hao Wu