Optimal Compression of Unit Norm Vectors in the High Distortion Regime (2307.07941v2)
Abstract: Motivated by the need for communication-efficient distributed learning, we investigate how to compress a unit norm vector into the minimum number of bits while still allowing some acceptable level of distortion in recovery. This problem has been explored in the rate-distortion/covering-code literature, but our focus is exclusively on the "high-distortion" regime. We approach the problem in a worst-case setting, with no prior information on the vector, but allowing the use of randomized compression maps. We consider both biased and unbiased compression methods and determine the optimal compression rates. It turns out that simple compression schemes are nearly optimal in this regime. While the results are a mix of new and known, they are compiled in this paper for completeness.
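To make the setting concrete, below is a minimal Python/NumPy sketch of two standard compressors of a unit norm vector from the distributed learning literature: unbiased rand-k sparsification and a biased scaled-sign quantizer. The function names (`rand_k_compress`, `sign_compress`) are illustrative choices of ours; these are examples of the kind of "simple compression schemes" referred to in the abstract, not necessarily the exact constructions analyzed in the paper.

```python
import numpy as np

def rand_k_compress(x, k, rng=None):
    """Unbiased rand-k sparsification: keep k uniformly random coordinates
    of x, rescaled by d/k so that the compressed vector equals x in expectation."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = (d / k) * x[idx]
    return out

def sign_compress(x):
    """Biased scaled-sign compressor: encodable with one bit per coordinate
    plus one scalar, reconstructed as (||x||_1 / d) * sign(x)."""
    d = x.shape[0]
    return (np.linalg.norm(x, 1) / d) * np.sign(x)

# Example: compress a random unit norm vector and measure the squared distortion.
rng = np.random.default_rng(0)
d = 1024
x = rng.standard_normal(d)
x /= np.linalg.norm(x)

xk = rand_k_compress(x, k=32, rng=rng)
xs = sign_compress(x)
print("rand-k distortion:", np.linalg.norm(x - xk) ** 2)
print("scaled-sign distortion:", np.linalg.norm(x - xs) ** 2)
```

With k much smaller than d, rand-k uses roughly k coordinates' worth of bits at the price of high (but unbiased) distortion, while the scaled-sign scheme spends d bits and incurs a bias; the trade-off between bit budget and distortion in this high-distortion regime is exactly what the paper quantifies.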