
Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation (2402.17316v3)

Published 27 Feb 2024 in cs.CV

Abstract: The conventional deep learning paradigm often involves training a deep model on a server and then deploying the model or its distilled ones to resource-limited edge devices. Usually, the models shall remain fixed once deployed (at least for some period) due to the potential high cost of model adaptation for both the server and edge sides. However, in many real-world scenarios, the test environments may change dynamically (known as distribution shifts), which often results in degraded performance. Thus, one has to adapt the edge models promptly to attain promising performance. Moreover, with the increasing data collected at the edge, this paradigm also fails to further adapt the cloud model for better performance. To address these, we encounter two primary challenges: 1) the edge model has limited computation power and may only support forward propagation; 2) the data transmission budget between cloud and edge devices is limited in latency-sensitive scenarios. In this paper, we establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation and the edge models can be adapted online. In our CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from uploading to the cloud, i.e., dynamic unreliable and low-informative sample exclusion. Based on the uploaded samples, we update and distribute the affine parameters of normalization layers by distilling from the stronger foundation model to the edge model with a sample replay strategy. Extensive experimental results on ImageNet-C and ImageNet-R verify the effectiveness of our CEMA.

Selective Entropy Distillation for Efficient Cloud-Edge Model Adaptation

Introduction

The paper proposes a Cloud-Edge Elastic Model Adaptation (CEMA) framework designed to address two primary challenges in deploying deep neural networks (DNNs) on real-world edge devices: limited computational power at the edge and a constrained data-transmission budget between cloud and edge. Because real-world environments change dynamically, edge models often suffer degraded performance under distribution shifts in the test data. Traditional approaches to updating these models are either impractical given edge resource constraints or incur heavy communication overhead when adaptation is performed in the cloud. The paper introduces a paradigm for online adaptation of edge models that filters samples using dynamic and static entropy thresholds to minimize unnecessary data transmission, and adapts the edge model via replay-based knowledge distillation from a more powerful foundation model in the cloud.

Cloud-Edge Communication-Efficient Model Adaptation

The CEMA framework targets scenarios in which edge devices confront distribution-shifted test samples. It partitions the adaptation task between edge devices and the cloud, exploiting the computational and data resources of each side. A selective sample-uploading mechanism reduces communication overhead by filtering out high-entropy (unreliable) and low-entropy (low-informative) samples on the edge. The edge model is then adapted in the cloud through knowledge distillation guided by the stronger foundation model, and a replay buffer improves data utilization by letting the system learn from both newly uploaded and previously encountered samples.
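
To make the two exclusion criteria concrete, the following minimal sketch shows how an edge device could compute an upload mask from predictive entropy. The function name, the threshold parameters alpha and beta, and their default values are illustrative assumptions rather than the paper's exact formulation; in particular, CEMA adjusts the upper threshold dynamically as adaptation proceeds, which is not modeled here.

```python
import math

import torch
import torch.nn.functional as F


def select_samples_to_upload(logits, num_classes, alpha=0.4, beta=0.1):
    """Decide which test samples the edge device should upload to the cloud.

    Samples whose predictive entropy exceeds an upper threshold are treated as
    unreliable, and samples below a lower threshold as low-informative; only
    the remaining samples are uploaded. ``alpha`` and ``beta`` are placeholder
    fractions of the maximum entropy ln(num_classes), not the paper's settings.
    """
    e_max = math.log(num_classes)
    probs = F.softmax(logits, dim=1)
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=1)  # shape (B,)
    upper = alpha * e_max   # above this: unreliable, excluded
    lower = beta * e_max    # below this: low-informative, excluded
    return (entropy < upper) & (entropy > lower)  # boolean upload mask
```

In use, the mask would simply index the current test batch before transmission, e.g. `upload = batch[select_samples_to_upload(edge_model(batch), num_classes=1000)]`, so that only the retained samples consume communication budget.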

Main Contributions

The paper's main contributions are threefold:

  1. Introduction of the Cloud-Edge Elastic Model Adaptation (CEMA) paradigm, a novel and practical framework for efficient model adaptation in distributed environments.
  2. Proposal of a replay-based entropy distillation method that adapts edge models to new environments online by distilling from a foundation model in the cloud (a sketch follows this list).
  3. Implementation of entropy-based criteria for sample selection that exclude samples deemed unreliable or low-informative; experiments verify that this lowers communication costs by 60% compared with state-of-the-art methods on ImageNet-C.
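
As a rough illustration of the cloud-side update described in contribution 2, the sketch below distills from a stronger foundation model into the edge model on a mix of newly uploaded and replayed samples, updating only the affine parameters of the normalization layers. The helper names, the KL-based distillation loss with temperature, the replay-mixing scheme, and the hyperparameter values are assumptions made for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F


def collect_norm_affine_params(model):
    """Gather only the affine parameters of normalization layers.

    In CEMA only these parameters are updated and distributed back to the
    edge, which keeps both the optimization and the download lightweight.
    """
    params = []
    for m in model.modules():
        if isinstance(m, (torch.nn.BatchNorm2d, torch.nn.LayerNorm, torch.nn.GroupNorm)):
            params += [p for p in (m.weight, m.bias) if p is not None]
    return params


def cloud_adaptation_step(edge_model, foundation_model, uploaded, replay_buffer,
                          optimizer, temperature=4.0, replay_size=32):
    """One illustrative cloud-side update on uploaded plus replayed samples.

    ``replay_buffer`` is assumed to be a plain list of previously uploaded
    batches; the mixing ratio, temperature, and loss weighting are
    placeholders rather than the paper's settings.
    """
    replay = replay_buffer[-replay_size:]
    batch = torch.cat([uploaded] + replay, dim=0)

    with torch.no_grad():
        teacher_logits = foundation_model(batch)   # stronger foundation model
    student_logits = edge_model(batch)             # edge model being adapted

    # Temperature-scaled KL distillation from teacher to student.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                               # steps only the affine params

    replay_buffer.append(uploaded)
    return loss.item()
```

An optimizer for such a step would typically be built over `collect_norm_affine_params(edge_model)` (e.g. with `torch.optim.SGD`), and only those updated affine parameters would then be sent back to the edge device.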

Experimental Results

The efficacy of the CEMA paradigm is demonstrated through extensive experiments on ImageNet-C and ImageNet-R, showcasing its superiority in adapting edge models under distribution shifts. Notably, the framework achieves commendable performance with substantially lower data transmission, addressing the practical challenge of updating edge models in latency-sensitive applications. The results illustrate the potential of CEMA in real-world deployments, where maintaining high model performance with minimal communication overhead is paramount.

Future Directions

The paper outlines several avenues for future research, including refining the entropy-based sample-selection criteria to further reduce communication costs and extending CEMA to a broader range of edge-computing scenarios. Another promising direction is to study how different choices of foundation model affect the adaptation performance and efficiency of edge models.

Conclusion

The paper presents a robust and efficient approach to model adaptation in cloud-edge deployments, addressing critical challenges in real-world AI applications. By combining selective sample transmission with knowledge distillation from a foundation model, it offers a practical, communication-efficient adaptation strategy. The proposed CEMA paradigm holds promise for improving edge AI systems, enabling more adaptive, efficient, and scalable deployments across diverse applications.

Authors
  1. Yaofo Chen
  2. Shuaicheng Niu
  3. Shoukai Xu
  4. Hengjie Song
  5. Yaowei Wang
  6. Mingkui Tan