On-Device Domain Learning for Keyword Spotting on Low-Power Extreme Edge Embedded Systems (2403.10549v1)
Abstract: Keyword spotting accuracy degrades when neural networks are exposed to noisy environments. On-site adaptation to previously unseen noise is crucial to recovering accuracy loss, and on-device learning is required to ensure that the adaptation process happens entirely on the edge device. In this work, we propose a fully on-device domain adaptation system achieving up to 14% accuracy gains over already-robust keyword spotting models. We enable on-device learning with less than 10 kB of memory, using only 100 labeled utterances to recover 5% accuracy after adapting to complex speech noise. We demonstrate that domain adaptation can be achieved on ultra-low-power microcontrollers with as little as 806 mJ in only 14 s on always-on, battery-operated devices.
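The abstract does not detail the adaptation mechanism, so the following is only a minimal sketch of the general idea under stated assumptions: a small DS-CNN-like keyword-spotting backbone is kept frozen and only the final classifier layer is fine-tuned on roughly 100 labeled utterances from the noisy target domain, in a PyTorch-style workflow. The `TinyKWS` model, MFCC input shape, and hyperparameters are illustrative placeholders, not the paper's implementation.

```python
# Sketch of on-device-style domain adaptation for keyword spotting.
# Assumptions (not stated in the abstract): a DS-CNN-like toy backbone,
# classifier-only fine-tuning, and placeholder data/hyperparameters.
import torch
import torch.nn as nn

NUM_KEYWORDS = 12          # e.g., a Speech Commands-style label set (assumption)
NUM_ADAPT_UTTERANCES = 100 # matches the 100 labeled utterances in the abstract

class TinyKWS(nn.Module):
    """Toy stand-in for a small keyword-spotting network (not the paper's model)."""
    def __init__(self, n_classes=NUM_KEYWORDS):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, groups=16, padding=1), nn.ReLU(),  # depthwise
            nn.Conv2d(16, 32, 1), nn.ReLU(),                        # pointwise
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def adapt_on_device(model, adapt_loader, epochs=5, lr=1e-3):
    """Fine-tune only the classifier head on a few labeled noisy utterances,
    keeping the feature extractor frozen to bound training memory and compute."""
    for p in model.features.parameters():
        p.requires_grad = False
    opt = torch.optim.SGD(model.classifier.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for mfcc, label in adapt_loader:
            opt.zero_grad()
            loss = loss_fn(model(mfcc), label)
            loss.backward()
            opt.step()
    return model

if __name__ == "__main__":
    # Placeholder "noisy-domain" batch: 100 MFCC-like tensors (1x49x10) with labels.
    x = torch.randn(NUM_ADAPT_UTTERANCES, 1, 49, 10)
    y = torch.randint(0, NUM_KEYWORDS, (NUM_ADAPT_UTTERANCES,))
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(x, y), batch_size=16, shuffle=True)
    adapt_on_device(TinyKWS(), loader)
```

Restricting updates to the classifier head is one common way to keep the training-time memory footprint in the kilobyte range; the paper's actual layer selection and memory budget may differ.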