A privacy-preserving method using secret key for convolutional neural network-based speech classification (2310.04035v1)
Abstract: In this paper, we propose a privacy-preserving method with a secret key for convolutional neural network (CNN)-based speech classification tasks. Many privacy-preserving methods have recently been developed for image classification, whereas little work has considered such risks in speech classification. To promote research on privacy preservation for speech classification, we provide an encryption method with a secret key for CNN-based speech classification systems. The method encrypts speech data with a random invertible matrix; a model equipped with a kernel encrypted using the inverse of that matrix accepts the encrypted data only when the correct key is used. Although the encrypted speech data is strongly distorted, the classification tasks are performed correctly when the correct key is provided. Additionally, we evaluate the difficulty of reconstructing the original information from the encrypted spectrograms and waveforms. In our experiments, the proposed encryption methods are applied to automatic speech recognition (ASR) and automatic speaker verification (ASV) tasks. The results show that, given the correct secret key, the encrypted data can be used in exactly the same way as the original data in a transformer-based ASR system and an x-vector-based ASV system with a self-supervised front-end. The robustness of the encrypted data against reconstruction attacks is also illustrated.
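The key identity behind the abstract can be sketched in a few lines: if speech features are encrypted by right-multiplication with a random invertible matrix, then a linear kernel pre-multiplied by the inverse of that matrix produces the same outputs on the encrypted data as the original kernel does on the plaintext data. This is a minimal illustration under that assumption, not the paper's actual implementation; the dimensions, seed-as-key convention, and variable names are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)       # the RNG seed stands in for the secret key (assumption)
d = 8                                  # feature dimension per frame (illustrative)

M = rng.standard_normal((d, d))        # random square matrix; invertible with probability 1
M_inv = np.linalg.inv(M)

X = rng.standard_normal((5, d))        # 5 toy "spectrogram frames" of dimension d
X_enc = X @ M                          # encrypted frames: strongly distorted vs. X

W = rng.standard_normal((d, 3))        # a model's linear kernel (plaintext version)
W_enc = M_inv @ W                      # encrypted kernel built from the inverse matrix

# With the correct key, the encrypted pipeline matches the plaintext one:
#   X_enc @ W_enc = (X M)(M^{-1} W) = X W
assert np.allclose(X @ W, X_enc @ W_enc)
```

A wrong key (a different matrix `M'`) breaks the cancellation, so the model output is scrambled, which is the behavior the abstract describes for incorrect keys.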