
On-Device Domain Learning for Keyword Spotting on Low-Power Extreme Edge Embedded Systems (2403.10549v1)

Published 12 Mar 2024 in cs.SD, cs.LG, and eess.AS

Abstract: Keyword spotting accuracy degrades when neural networks are exposed to noisy environments. On-site adaptation to previously unseen noise is crucial to recovering accuracy loss, and on-device learning is required to ensure that the adaptation process happens entirely on the edge device. In this work, we propose a fully on-device domain adaptation system achieving up to 14% accuracy gains over already-robust keyword spotting models. We enable on-device learning with less than 10 kB of memory, using only 100 labeled utterances to recover 5% accuracy after adapting to complex speech noise. We demonstrate that domain adaptation can be achieved on ultra-low-power microcontrollers with as little as 806 mJ in only 14 s on always-on, battery-operated devices.
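A minimal sketch of what such on-device adaptation could look like: fine-tuning only the final classifier of a small keyword-spotting network on roughly 100 labeled noisy utterances, with the feature extractor frozen so the trainable state stays within a tight memory budget. The backbone architecture, MFCC input shape, and hyperparameters below are illustrative assumptions, not the paper's actual implementation.

# Illustrative sketch only: adapt the classifier head of a tiny KWS model
# on a few labeled noisy utterances, freezing the backbone so that the
# trainable state fits in a small memory budget. The backbone, input
# dimensions, and hyperparameters are assumptions, not the paper's method.
import torch
import torch.nn as nn

NUM_KEYWORDS = 12                  # e.g. a Speech Commands keyword set (assumption)
MFCC_FRAMES, MFCC_BINS = 49, 10    # assumed MFCC feature map size

class TinyKWS(nn.Module):
    def __init__(self, num_classes=NUM_KEYWORDS):
        super().__init__()
        self.backbone = nn.Sequential(          # stand-in for a depthwise-separable CNN
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x):
        return self.classifier(self.backbone(x))

def adapt_on_device(model, utterances, labels, epochs=4, lr=1e-3):
    """Fine-tune only the classifier on a small set of labeled noisy utterances."""
    for p in model.backbone.parameters():
        p.requires_grad = False                 # freeze backbone: only the head is updated
    opt = torch.optim.SGD(model.classifier.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in zip(utterances, labels):
            opt.zero_grad()
            loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
            loss.backward()
            opt.step()
    return model

# Usage with random stand-in data (100 labeled utterances, as in the abstract)
model = TinyKWS()
xs = [torch.randn(1, MFCC_FRAMES, MFCC_BINS) for _ in range(100)]
ys = [torch.randint(0, NUM_KEYWORDS, ()) for _ in range(100)]
adapt_on_device(model, xs, ys)

On an actual microcontroller the same idea would be implemented with an on-device training runtime rather than PyTorch, but the frozen-backbone structure is what keeps the adapted parameters and gradients small enough for a kilobyte-scale memory budget.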
