Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation
Abstract: The conventional deep learning paradigm often involves training a deep model on a server and then deploying the model, or a distilled variant of it, to resource-limited edge devices. Once deployed, the models usually remain fixed (at least for some period) due to the potentially high cost of model adaptation on both the server and edge sides. However, in many real-world scenarios, the test environments may change dynamically (known as distribution shifts), which often degrades performance. Thus, the edge models must be adapted promptly to maintain strong performance. Moreover, as more data are collected at the edge, this paradigm also fails to exploit them to further adapt the cloud model. Addressing these issues involves two primary challenges: 1) the edge model has limited computational power and may only support forward propagation; 2) the data transmission budget between cloud and edge devices is limited in latency-sensitive scenarios. In this paper, we establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation and yet can be adapted online. In our CEMA, to reduce the communication burden, we devise two criteria for excluding unnecessary samples from being uploaded to the cloud: dynamic unreliable sample exclusion and low-informative sample exclusion. Based on the uploaded samples, we update the affine parameters of the edge model's normalization layers by distilling from a stronger foundation model with a sample replay strategy, and then distribute the updated parameters back to the edge. Extensive experimental results on ImageNet-C and ImageNet-R verify the effectiveness of our CEMA.
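To make the pipeline described in the abstract concrete, below is a minimal PyTorch sketch of its two stages: entropy-based sample filtering on the edge (forward propagation only) and distillation-based updating of normalization-layer affine parameters on the cloud. This is illustrative only, not the paper's exact formulation: the threshold names `e_high` and `e_low`, the use of plain KL divergence as the distillation loss, the temperature `tau`, and all helper names are assumptions made for this sketch, and the dynamic adjustment of the unreliable-sample threshold and the sample replay buffer are omitted.

```python
import torch
import torch.nn.functional as F


def prediction_entropy(logits):
    # Shannon entropy of the softmax prediction, shape (B, C) -> (B,).
    log_p = F.log_softmax(logits, dim=1)
    return -(log_p.exp() * log_p).sum(dim=1)


@torch.no_grad()
def select_samples_to_upload(edge_model, x, e_high, e_low):
    # Edge side: forward propagation only. Exclude unreliable samples
    # (entropy above e_high; in CEMA this threshold is adjusted dynamically)
    # and low-informative samples (entropy below e_low); upload the rest.
    logits = edge_model(x)
    ent = prediction_entropy(logits)
    keep = (ent < e_high) & (ent > e_low)
    return x[keep]


def norm_affine_parameters(model):
    # Collect only the affine (weight/bias) parameters of normalization
    # layers; these are the parameters updated and sent back to the edge.
    for m in model.modules():
        if isinstance(m, (torch.nn.BatchNorm2d, torch.nn.GroupNorm, torch.nn.LayerNorm)):
            for p in (m.weight, m.bias):
                if p is not None:
                    yield p


def distill_step(edge_model, foundation_model, x, optimizer, tau=1.0):
    # Cloud side: one adaptation step on uploaded (and replayed) samples,
    # aligning the edge model with the stronger foundation model.
    with torch.no_grad():
        teacher = F.softmax(foundation_model(x) / tau, dim=1)
    student = F.log_softmax(edge_model(x) / tau, dim=1)
    loss = F.kl_div(student, teacher, reduction="batchmean") * (tau ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In use, the optimizer would be built over `norm_affine_parameters(edge_model)` only (e.g., `torch.optim.SGD(list(norm_affine_parameters(edge_model)), lr=...)`), so that just the normalization layers' affine parameters are updated on the cloud and distributed back to the edge device.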