Privacy-preserving and Privacy-attacking Approaches for Speech and Audio -- A Survey (2309.15087v1)
Abstract: In contemporary society, voice-controlled devices, such as smartphones and home assistants, have become pervasive due to their advanced capabilities and functionality. The always-on nature of their microphones offers users the convenience of readily accessing these devices. However, recent research and events have revealed that such voice-controlled devices are prone to various forms of malicious attacks, making protection against them a growing concern for both users and researchers. Despite the numerous studies that have investigated adversarial attacks and privacy preservation for images, a conclusive study of this nature has not been conducted for the audio domain. Therefore, this paper aims to examine existing approaches for privacy-preserving and privacy-attacking strategies for audio and speech. To achieve this goal, we classify the attack and defense scenarios into several categories and provide a detailed analysis of each approach. We also interpret the dissimilarities between the various approaches, highlight their contributions, and examine their limitations. Our investigation reveals that voice-controlled devices based on neural networks are inherently susceptible to specific types of attacks. Although it is possible to enhance the robustness of such models to certain forms of attack, more sophisticated approaches are required to comprehensively safeguard user privacy.
Authors: Yuchen Liu, Apu Kapadia, Donald Williamson