
Reassessing Noise Augmentation Methods in the Context of Adversarial Speech (2409.01813v3)

Published 3 Sep 2024 in eess.AS, cs.LG, and cs.SD

Abstract: In this study, we investigate whether noise-augmented training can concurrently improve adversarial robustness in automatic speech recognition (ASR) systems. We conduct a comparative analysis of the adversarial robustness of four state-of-the-art ASR architectures, each trained under three augmentation conditions: one with background noise, speed variations, and reverberation; one with speed variations only; and one without any data augmentation. The results demonstrate that noise augmentation not only improves model performance on noisy speech but also strengthens the model's robustness to adversarial attacks.
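The noise-augmentation condition described in the abstract (mixing background noise into clean speech during training) can be sketched as follows. This is an illustrative implementation, not the paper's pipeline: the function name `mix_at_snr`, the use of raw NumPy arrays, and the 0-15 dB SNR range are assumptions for the example.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add `noise` to `speech`, scaled so the speech-to-noise power ratio
    equals `snr_db`. Both inputs are assumed mono float arrays of equal length."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain that brings the noise power to the desired SNR relative to the speech.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

# Example: augment one utterance with noise at a random SNR in [0, 15] dB.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)  # placeholder for 1 s of 16 kHz audio
noise = rng.standard_normal(16000)   # placeholder for a background-noise clip
augmented = mix_at_snr(speech, noise, snr_db=rng.uniform(0.0, 15.0))
```

Sampling the SNR per utterance, as above, exposes the model to a range of noise severities; reverberation and speed perturbation (the other augmentations in the study) would be applied by separate transforms.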

