RS-DGC: Exploring Neighborhood Statistics for Dynamic Gradient Compression on Remote Sensing Image Interpretation (2312.17530v1)
Abstract: Distributed deep learning has recently attracted increasing attention in remote sensing (RS) applications due to the challenges posed by the ever-growing volume of open data produced daily by Earth observation programs. However, the high communication cost of sending model updates among multiple nodes is a significant bottleneck for scalable distributed learning. Gradient sparsification has been validated as an effective gradient compression (GC) technique for reducing communication costs and thus accelerating training. Most existing state-of-the-art gradient sparsification methods rely on the "larger-absolute-value-is-more-important" criterion and ignore small gradients, which is generally observed to degrade performance. Inspired by the informative representation of manifold structures obtained from neighborhood information, we propose a simple yet effective dynamic gradient compression scheme for RS image interpretation that leverages a neighborhood statistics indicator, termed RS-DGC. We first enhance the interdependence among gradients by introducing a gradient neighborhood, which reduces the effect of random noise. The key component of RS-DGC is the Neighborhood Statistical Indicator (NSI), which quantifies the importance of gradients within a specified neighborhood on each node and is used to sparsify the local gradients before transmission in each iteration. Furthermore, a layer-wise dynamic compression scheme is proposed to track the changing importance of each layer in real time. Extensive experiments on downstream tasks validate the superiority of our method for intelligent interpretation of RS images. For example, we achieve an accuracy improvement of 0.51% with more than 50x communication compression on the NWPU-RESISC45 dataset using the VGG-19 network.
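To make the sparsification step concrete, below is a minimal PyTorch sketch of neighborhood-statistics-based gradient sparsification as described in the abstract. The function name `nsi_sparsify`, the choice of a sliding-window mean of absolute gradient values as the neighborhood statistic, and the `neighborhood`/`keep_ratio` parameters are illustrative assumptions; the paper's exact NSI definition and layer-wise dynamic schedule may differ.

```python
# Sketch: keep only gradient entries whose local neighborhood statistic is
# largest, zeroing the rest before communication. This illustrates the
# general idea of NSI-style sparsification; it is not the authors' code.
import torch
import torch.nn.functional as F


def nsi_sparsify(grad: torch.Tensor, neighborhood: int = 3, keep_ratio: float = 0.02):
    """Retain the top `keep_ratio` fraction of entries ranked by a local
    neighborhood score (here: mean |g| over an odd-sized 1-D window)."""
    flat = grad.reshape(1, 1, -1)
    # Sliding-window mean of |g| as an assumed neighborhood importance score.
    score = F.avg_pool1d(flat.abs(), kernel_size=neighborhood,
                         stride=1, padding=neighborhood // 2).reshape(-1)
    k = max(1, int(keep_ratio * score.numel()))
    idx = torch.topk(score, k).indices
    mask = torch.zeros_like(grad.reshape(-1))
    mask[idx] = 1.0
    return grad * mask.reshape_as(grad)


# Usage sketch: sparsify each layer's gradient before sending it to peers.
# for p in model.parameters():
#     if p.grad is not None:
#         p.grad = nsi_sparsify(p.grad, neighborhood=3, keep_ratio=0.02)
```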
Authors: Weiying Xie, Zixuan Wang, Jitao Ma, Daixun Li, Yunsong Li