Efficient Unbiased Sparsification (2402.14925v2)
Abstract: An unbiased $m$-sparsification of a vector $p\in \mathbb{R}^n$ is a random vector $Q\in \mathbb{R}^n$ with mean $p$ that has at most $m<n$ nonzero coordinates. Unbiased sparsification compresses the original vector without introducing bias; it arises in various contexts, such as federated learning and sampling sparse probability distributions. Ideally, unbiased sparsification should also minimize the expected value of a divergence function $\mathsf{Div}(Q,p)$ that measures how far $Q$ is from the original $p$. If $Q$ is optimal in this sense, we call it efficient. Our main results describe efficient unbiased sparsifications for divergences that are either permutation-invariant or additively separable. Surprisingly, the characterization for permutation-invariant divergences is robust to the choice of divergence function, in the sense that our class of optimal $Q$ for squared Euclidean distance coincides with our class of optimal $Q$ for Kullback-Leibler divergence, or indeed any of a wide variety of divergences.
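To make the definition concrete, the sketch below implements a simple unbiased $m$-sparsifier: sample $m$ of the $n$ coordinates uniformly without replacement and rescale the survivors by the inverse inclusion probability $n/m$, so that $\mathbb{E}[Q]=p$. This is only an illustrative construction (the names `rand_m_sparsify` and the uniform sampling scheme are our own assumptions); it is unbiased but is not claimed to be the efficient, divergence-minimizing sparsifier characterized in the paper.

```python
import numpy as np

def rand_m_sparsify(p, m, rng=None):
    """Unbiased m-sparsification of p via uniform coordinate sampling.

    Keeps m of the n coordinates, chosen uniformly without replacement,
    and rescales them by n/m (the inverse inclusion probability), so that
    E[Q] = p while Q has at most m nonzero entries.
    """
    rng = np.random.default_rng(rng)
    p = np.asarray(p, dtype=float)
    n = p.size
    Q = np.zeros(n)
    keep = rng.choice(n, size=m, replace=False)
    Q[keep] = p[keep] * (n / m)
    return Q

# Quick unbiasedness check: the empirical mean of Q approaches p.
p = np.array([0.4, 0.3, 0.2, 0.1])
samples = np.stack([rand_m_sparsify(p, m=2, rng=seed) for seed in range(20000)])
print(samples.mean(axis=0))  # ≈ [0.4, 0.3, 0.2, 0.1]
```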