Transfer-Once-For-All: AI Model Optimization for Edge (2303.15485v2)

Published 27 Mar 2023 in cs.LG, cs.CV, and cs.NE

Abstract: Weight-sharing neural architecture search aims to optimize a configurable neural network model (supernet) for a variety of deployment scenarios across many devices with different resource constraints. Existing approaches use evolutionary search to extract models of different sizes from a supernet trained on a very large data set, and then fine-tune the extracted models on the typically small, real-world data set of interest. The computational cost of training thus grows linearly with the number of different model deployment scenarios. Hence, we propose Transfer-Once-For-All (TOFA) for supernet-style training on small data sets with constant computational training cost over any number of edge deployment scenarios. Given a task, TOFA obtains custom neural networks, both the topology and the weights, optimized for any number of edge deployment scenarios. To overcome the challenges arising from small data, TOFA utilizes a unified semi-supervised training loss to simultaneously train all subnets within the supernet, coupled with on-the-fly architecture selection at deployment time.
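
The abstract names three moving parts: a shared-weight supernet from which differently sized subnets are sliced, a single semi-supervised loss that trains all of those subnets simultaneously, and an architecture-selection step applied per deployment target. The sketch below is a minimal, assumption-laden illustration of the first two parts, not the authors' TOFA code: the elastic-width `ElasticMLP`, the FixMatch-style pseudo-labelling, the candidate widths, and every hyperparameter are placeholders chosen for brevity.

```python
# Toy sketch of weight-sharing supernet training with a semi-supervised loss.
# Illustrative only: the elastic-width MLP, the pseudo-label threshold, and the
# sampled widths are hypothetical stand-ins, not the TOFA implementation.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F


class ElasticMLP(nn.Module):
    """Supernet: a 2-layer MLP whose hidden width can be sliced at run time."""

    def __init__(self, in_dim=32, max_hidden=256, num_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, max_hidden)
        self.fc2 = nn.Linear(max_hidden, num_classes)

    def forward(self, x, hidden_width):
        # A subnet uses only the first `hidden_width` channels of the shared weights.
        h = F.relu(F.linear(x, self.fc1.weight[:hidden_width], self.fc1.bias[:hidden_width]))
        return F.linear(h, self.fc2.weight[:, :hidden_width], self.fc2.bias)


def semi_supervised_loss(model, width, x_lab, y_lab, x_unlab, threshold=0.95):
    """Supervised CE on labelled data plus pseudo-label CE on confident unlabelled data."""
    loss = F.cross_entropy(model(x_lab, width), y_lab)
    with torch.no_grad():
        probs = F.softmax(model(x_unlab, width), dim=1)
        conf, pseudo = probs.max(dim=1)
        mask = conf >= threshold
    if mask.any():
        loss = loss + F.cross_entropy(model(x_unlab[mask], width), pseudo[mask])
    return loss


def train_step(model, optimizer, batch, widths=(64, 128, 256), num_subnets=2):
    """One step: sample a few subnet widths and accumulate their losses on shared weights."""
    x_lab, y_lab, x_unlab = batch
    optimizer.zero_grad()
    total = 0.0
    for width in random.sample(widths, num_subnets):
        total = total + semi_supervised_loss(model, width, x_lab, y_lab, x_unlab)
    total.backward()
    optimizer.step()
    return float(total)


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ElasticMLP()
    opt = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
    # Synthetic labelled / unlabelled batches stand in for a small real-world data set.
    batch = (torch.randn(16, 32), torch.randint(0, 10, (16,)), torch.randn(64, 32))
    for step in range(5):
        print(f"step {step}: loss={train_step(model, opt, batch):.3f}")
```

Because the loss for every sampled subnet backpropagates into the same shared `fc1`/`fc2` tensors, the per-step training cost stays constant regardless of how many deployment sizes are later extracted, which is the property the abstract emphasises; deployment-time architecture selection would then pick, per device, the largest subnet whose resource cost fits the budget.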

