Efficient Model Adaptation for Continual Learning at the Edge (2308.02084v2)
Abstract: Most ML systems assume stationary and matching data distributions during training and deployment. This assumption is often false: when ML models are deployed on real devices, data distributions often shift over time due to changes in environmental factors, sensor characteristics, and the task of interest. While it is possible to keep a human in the loop to monitor for distribution shifts and engineer new architectures in response, such a setup is not cost-effective. Instead, non-stationary automated ML (AutoML) models are needed. This paper presents the Encoder-Adaptor-Reconfigurator (EAR) framework for efficient continual learning under domain shifts. The EAR framework uses a fixed deep neural network (DNN) feature encoder and trains shallow networks on top of the encoder to handle novel data. The framework is capable of 1) detecting when new data are out-of-distribution (OOD) by combining DNNs with hyperdimensional computing (HDC), 2) identifying low-parameter neural adaptors to adapt the model to the OOD data using zero-shot neural architecture search (ZS-NAS), and 3) minimizing catastrophic forgetting on previous tasks by progressively growing the neural architecture as needed and dynamically routing data through the appropriate adaptors and reconfigurators to handle domain-incremental and class-incremental continual learning. We systematically evaluate our approach on several benchmark datasets for domain adaptation and demonstrate strong performance compared to state-of-the-art algorithms for OOD detection and few-/zero-shot NAS.
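The first capability above pairs a frozen DNN encoder with hyperdimensional computing for OOD detection. The snippet below is a minimal sketch of that general idea, not the paper's implementation: the random bipolar projection, the hypervector dimensionality `hd_dim`, and the cosine-similarity score are illustrative assumptions.

```python
# Sketch: HDC-style OOD scoring on top of a frozen encoder's features.
# hd_dim, the projection scheme, and the scoring rule are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

def make_projection(feat_dim: int, hd_dim: int = 10_000) -> np.ndarray:
    """Random projection that maps encoder features into a hyperdimensional space."""
    return rng.standard_normal((feat_dim, hd_dim)) / np.sqrt(feat_dim)

def encode(features: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Project features and binarize (bipolar) to obtain hypervectors."""
    return np.sign(features @ proj)

def build_prototypes(feats: np.ndarray, labels: np.ndarray, proj: np.ndarray) -> dict:
    """Bundle (sum then re-binarize) the hypervectors of each in-distribution class."""
    hvs = encode(feats, proj)
    return {c: np.sign(hvs[labels == c].sum(axis=0)) for c in np.unique(labels)}

def ood_score(feat: np.ndarray, proj: np.ndarray, prototypes: dict) -> float:
    """Low maximum cosine similarity to any class prototype suggests OOD input."""
    hv = encode(feat[None, :], proj)[0]
    sims = [hv @ p / (np.linalg.norm(hv) * np.linalg.norm(p)) for p in prototypes.values()]
    return 1.0 - max(sims)
```

In this sketch, an input whose `ood_score` exceeds a validation-tuned threshold would be flagged as novel and handed to the adaptation stage.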
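The second capability selects low-parameter adaptors with zero-shot NAS, i.e., ranking candidate architectures by training-free proxy scores instead of trained accuracy. Below is a hedged sketch that scores a few hypothetical shallow MLP adaptors with a NASWOT-style activation-overlap proxy under a parameter budget; the candidate space, proxy choice, and budget are assumptions, not the paper's search space.

```python
# Sketch: zero-shot scoring of candidate adaptors with a training-free proxy.
import numpy as np
import torch
import torch.nn as nn

def make_adaptor(in_dim: int, hidden: int, depth: int, out_dim: int) -> nn.Sequential:
    """Small MLP adaptor intended to sit on top of the frozen encoder."""
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

def naswot_score(model: nn.Module, x: torch.Tensor) -> float:
    """Log-determinant of the ReLU activation-pattern kernel over one minibatch."""
    codes = []
    hooks = [m.register_forward_hook(lambda _m, _i, o: codes.append((o > 0).float()))
             for m in model.modules() if isinstance(m, nn.ReLU)]
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()
    c = torch.cat(codes, dim=1)                   # binary activation codes per sample
    k = c @ c.T + (1 - c) @ (1 - c.T)             # activation-overlap kernel
    return torch.slogdet(k + 1e-3 * torch.eye(len(x)))[1].item()

# Rank candidates under a parameter budget and keep the highest-scoring adaptor.
feats = torch.randn(32, 1280)                     # stand-in minibatch of encoder features
candidates = [make_adaptor(1280, h, d, 10) for h in (64, 128) for d in (1, 2)]
budget = 300_000
best = max((m for m in candidates
            if sum(p.numel() for p in m.parameters()) <= budget),
           key=lambda m: naswot_score(m, feats))
```

The selected adaptor would then be attached for the new domain or class set while earlier adaptors remain frozen, which is the progressive-growth behavior described in the abstract.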
Authors: Zachary A. Daniels, Jun Hu, Michael Lomnitz, Phil Miller, Aswin Raghavan, Joe Zhang, Michael Piacentino, David Zhang