Efficient Unbiased Sparsification (2402.14925v2)

Published 22 Feb 2024 in cs.IT, cs.LG, math.IT, math.ST, and stat.TH

Abstract: An unbiased $m$-sparsification of a vector $p\in \mathbb{R}^n$ is a random vector $Q\in \mathbb{R}^n$ with mean $p$ that has at most $m<n$ nonzero coordinates. Unbiased sparsification compresses the original vector without introducing bias; it arises in various contexts, such as in federated learning and sampling sparse probability distributions. Ideally, unbiased sparsification should also minimize the expected value of a divergence function $\mathsf{Div}(Q,p)$ that measures how far away $Q$ is from the original $p$. If $Q$ is optimal in this sense, then we call it efficient. Our main results describe efficient unbiased sparsifications for divergences that are either permutation-invariant or additively separable. Surprisingly, the characterization for permutation-invariant divergences is robust to the choice of divergence function, in the sense that our class of optimal $Q$ for squared Euclidean distance coincides with our class of optimal $Q$ for Kullback-Leibler divergence, or indeed any of a wide variety of divergences.
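
For a concrete sense of the definition, the sketch below (Python/NumPy; the function name and details are illustrative, not taken from the paper) builds one simple unbiased $m$-sparsification: keep a uniformly random subset of $m$ coordinates and rescale the survivors by $n/m$. Each coordinate is kept with probability $m/n$, so $\mathbb{E}[Q]=p$, and $Q$ has at most $m$ nonzero entries. This is only a baseline illustrating unbiasedness, not the efficient (divergence-minimizing) construction characterized in the paper.

```python
import numpy as np

def uniform_unbiased_sparsify(p, m, rng=None):
    """Return a random vector Q with E[Q] = p and at most m nonzero entries.

    Minimal illustration of unbiased m-sparsification: keep a uniformly
    random size-m subset of coordinates and rescale them by n/m.
    (Hypothetical helper; not the paper's efficient construction.)
    """
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p, dtype=float)
    n = p.size
    if m >= n:
        return p.copy()
    q = np.zeros(n)
    kept = rng.choice(n, size=m, replace=False)  # each index is kept with probability m/n
    q[kept] = p[kept] * (n / m)                  # rescale so E[Q_i] = (m/n)*(n/m)*p_i = p_i
    return q

# Quick empirical check of unbiasedness:
# p = np.array([0.5, 0.3, 0.1, 0.1])
# Qs = [uniform_unbiased_sparsify(p, 2) for _ in range(20000)]
# np.mean(Qs, axis=0) is close to p, while each sample has exactly 2 nonzeros.
```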
