NetBooster: Empowering Tiny Deep Learning By Standing on the Shoulders of Deep Giants (2306.13586v1)

Published 23 Jun 2023 in cs.LG and cs.DC

Abstract: Tiny deep learning has attracted increasing attention driven by the substantial demand for deploying deep learning on numerous intelligent Internet-of-Things devices. However, it is still challenging to unleash tiny deep learning's full potential on both large-scale datasets and downstream tasks due to the under-fitting issues caused by the limited model capacity of tiny neural networks (TNNs). To this end, we propose a framework called NetBooster to empower tiny deep learning by augmenting the architectures of TNNs via an expansion-then-contraction strategy. Extensive experiments show that NetBooster consistently outperforms state-of-the-art tiny deep learning solutions.
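To make the abstract's "expansion-then-contraction" idea concrete, below is a minimal, hypothetical sketch of the general pattern: a tiny convolution is over-parameterized with an extra linearly-composable layer during training, then analytically folded back into a single layer for deployment. This is not NetBooster's actual algorithm (the paper's method is not reproduced here), and the names (ExpandedConv, contract) and the PyTorch framing are assumptions made for illustration only.

```python
# Illustrative sketch of an expand-then-contract layer -- NOT the paper's method.
# A 3x3 conv is expanded with a trailing 1x1 conv (no nonlinearity in between),
# so the pair remains mathematically equivalent to a single 3x3 conv and can be
# folded back ("contracted") after training.
import torch
import torch.nn as nn


class ExpandedConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, expand_ratio: int = 4):
        super().__init__()
        mid_ch = out_ch * expand_ratio
        self.conv3x3 = nn.Conv2d(in_ch, mid_ch, kernel_size=3, padding=1)
        self.conv1x1 = nn.Conv2d(mid_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Training-time "expanded" path: extra capacity, same linear map.
        return self.conv1x1(self.conv3x3(x))

    def contract(self) -> nn.Conv2d:
        """Fold the two convs into one 3x3 conv for inference."""
        w1, b1 = self.conv3x3.weight, self.conv3x3.bias       # (M, I, 3, 3), (M,)
        w2, b2 = self.conv1x1.weight, self.conv1x1.bias       # (O, M, 1, 1), (O,)
        w2_mat = w2.squeeze(-1).squeeze(-1)                   # (O, M)
        merged_w = torch.einsum("om,mikl->oikl", w2_mat, w1)  # (O, I, 3, 3)
        merged_b = w2_mat @ b1 + b2                           # (O,)
        fused = nn.Conv2d(w1.shape[1], w2.shape[0], kernel_size=3, padding=1)
        fused.weight.data.copy_(merged_w)
        fused.bias.data.copy_(merged_b)
        return fused


if __name__ == "__main__":
    x = torch.randn(1, 8, 32, 32)
    expanded = ExpandedConv(in_ch=8, out_ch=16)
    contracted = expanded.contract()
    # The contracted conv reproduces the expanded pair's output.
    assert torch.allclose(expanded(x), contracted(x), atol=1e-5)
```

The follow-up assertion shows the design intent of such schemes: the expanded form adds trainable capacity, while the contracted form keeps the deployed model as small as the original tiny layer.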
