Tiny Machine Learning: Progress and Futures (2403.19076v2)
Abstract: Tiny Machine Learning (TinyML) is a new frontier of machine learning. By squeezing deep learning models into billions of IoT devices and microcontrollers (MCUs), we expand the scope of AI applications and enable ubiquitous intelligence. However, TinyML is challenging due to hardware constraints: the tiny memory budget makes it difficult to hold deep learning models designed for cloud and mobile platforms, and compiler and inference-engine support for bare-metal devices is limited. We therefore need to co-design the algorithm and the system stack to enable TinyML. In this review, we first discuss the definition, challenges, and applications of TinyML. We then survey recent progress in TinyML and deep learning on MCUs. Next, we introduce MCUNet, showing how ImageNet-scale AI applications can be achieved on IoT devices through system-algorithm co-design. We further extend the solution from inference to training and introduce tiny on-device training techniques. Finally, we present future directions in this area. Today's large model might be tomorrow's tiny model; the scope of TinyML should evolve and adapt over time.
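To make the memory constraint mentioned above concrete, here is a minimal back-of-the-envelope sketch (not from the paper): it estimates the peak activation memory of a small CNN and checks it against an MCU SRAM budget. The layer shapes, the int8-activation assumption, and the 256 KB budget are all hypothetical values chosen only for illustration.

```python
# Illustrative sketch: does a CNN's peak activation memory fit an MCU's SRAM?
# All numbers below (layer shapes, int8 activations, 256 KB budget) are
# assumptions for illustration, not values taken from the paper.

# (out_channels, out_height, out_width) of each layer's output feature map
layer_output_shapes = [
    (16, 112, 112),
    (24, 56, 56),
    (40, 28, 28),
    (80, 14, 14),
    (160, 7, 7),
]

BYTES_PER_ACTIVATION = 1        # int8-quantized activations
SRAM_BUDGET_BYTES = 256 * 1024  # a typical Cortex-M class SRAM budget

def layer_activation_bytes(shape):
    c, h, w = shape
    return c * h * w * BYTES_PER_ACTIVATION

# A layer-by-layer executor must hold a layer's input and output buffers at
# the same time, so peak memory is roughly the max over consecutive pairs.
peak = max(
    layer_activation_bytes(a) + layer_activation_bytes(b)
    for a, b in zip(layer_output_shapes, layer_output_shapes[1:])
)

print(f"Estimated peak activation memory: {peak / 1024:.0f} KB")
print("Fits in SRAM budget" if peak <= SRAM_BUDGET_BYTES else "Exceeds SRAM budget")
```

In this toy example, the early high-resolution layers dominate the peak and push it past the budget; that kind of imbalance is what memory-aware system-algorithm co-design targets.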
Authors: Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Song Han