NetBooster: Empowering Tiny Deep Learning By Standing on the Shoulders of Deep Giants (2306.13586v1)
Published 23 Jun 2023 in cs.LG and cs.DC
Abstract: Tiny deep learning has attracted increasing attention driven by the substantial demand for deploying deep learning on numerous intelligent Internet-of-Things devices. However, it is still challenging to unleash tiny deep learning's full potential on both large-scale datasets and downstream tasks due to the under-fitting issues caused by the limited model capacity of tiny neural networks (TNNs). To this end, we propose a framework called NetBooster to empower tiny deep learning by augmenting the architectures of TNNs via an expansion-then-contraction strategy. Extensive experiments show that NetBooster consistently outperforms state-of-the-art tiny deep learning solutions.
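The abstract only names the expansion-then-contraction idea, so here is a minimal, hedged sketch of what such a scheme can look like in PyTorch: a tiny 3x3 convolution is expanded with a parallel 1x1 branch during training and later contracted (re-parameterized) back into a single 3x3 convolution, in the spirit of RepVGG-style branch merging. This is an illustrative analog under those assumptions, not NetBooster's actual augmentation algorithm; the class and method names (`ExpandedConv`, `contract`) are hypothetical.

```python
# Minimal sketch of an expand-then-contract block (illustrative only; the
# class/method names are hypothetical and this is NOT NetBooster's algorithm).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpandedConv(nn.Module):
    """Training-time block: a 3x3 conv expanded with a parallel 1x1 branch."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        # The extra 1x1 branch adds capacity during training ("expansion").
        return self.conv3(x) + self.conv1(x)

    def contract(self) -> nn.Conv2d:
        # Merge both branches into one 3x3 conv for deployment ("contraction").
        fused = nn.Conv2d(self.conv3.in_channels, self.conv3.out_channels,
                          kernel_size=3, padding=1)
        with torch.no_grad():
            # Pad the 1x1 kernel to 3x3 and add it to the 3x3 kernel.
            fused.weight.copy_(self.conv3.weight +
                               F.pad(self.conv1.weight, [1, 1, 1, 1]))
            fused.bias.copy_(self.conv3.bias + self.conv1.bias)
        return fused

# Sanity check: the contracted conv reproduces the expanded block's output.
block = ExpandedConv(8).eval()
x = torch.randn(1, 8, 16, 16)
assert torch.allclose(block(x), block.contract()(x), atol=1e-5)
```

The sanity check at the end confirms that the contracted convolution reproduces the expanded block's outputs, which is what makes the extra training-time capacity free at inference time in re-parameterization schemes of this kind.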