Microphone Conversion: Mitigating Device Variability in Sound Event Classification (2401.06913v1)
Abstract: In this study, we introduce a CycleGAN-based augmentation technique that improves the robustness of sound event classification (SEC) systems to recording-device variability, together with a new dataset for evaluating it. As SEC systems become increasingly common, they must perform reliably on audio captured by diverse recording devices. Our method compensates for limited device diversity in training data by learning, from unpaired examples, to transform input spectrograms as if they were recorded on a different device. In our experiments, the approach outperforms existing methods in generalization by 5.2% to 11.5% in weighted F1 score, and it likewise surpasses current methods in adaptation across diverse recording devices, with a 6.5% to 12.8% improvement in weighted F1 score.
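The core of the unpaired transformation described above is CycleGAN's cycle-consistency objective: a generator G maps device-A spectrograms toward device B, a second generator F maps back, and an L1 penalty forces each round trip to reconstruct the input. The following is a minimal illustrative sketch of that loss term only, with toy stand-in generators and spectrogram shapes; the function and variable names are hypothetical and not taken from the paper's implementation.

```python
import numpy as np

def cycle_consistency_loss(G, F, spec_a, spec_b, lam=10.0):
    """CycleGAN cycle loss: encourages F(G(a)) ~= a and G(F(b)) ~= b.

    G: maps device-A spectrograms toward device B.
    F: maps device-B spectrograms toward device A.
    lam: weight of the cycle term relative to the adversarial losses
         (not shown here), commonly set around 10 in CycleGAN setups.
    """
    loss_a = np.mean(np.abs(F(G(spec_a)) - spec_a))  # A -> B -> A reconstruction
    loss_b = np.mean(np.abs(G(F(spec_b)) - spec_b))  # B -> A -> B reconstruction
    return lam * (loss_a + loss_b)

# Toy linear "generators" standing in for the trained networks:
G = lambda x: x * 2.0   # pretend device-A -> device-B mapping
F = lambda x: x / 2.0   # pretend device-B -> device-A mapping

spec_a = np.ones((64, 128))  # toy mel spectrogram from device A
spec_b = np.ones((64, 128))  # toy mel spectrogram from device B

loss = cycle_consistency_loss(G, F, spec_a, spec_b)
```

Because the toy generators are exact inverses, the round trips reconstruct the inputs perfectly and the loss is zero; with real networks this term is minimized jointly with the adversarial losses during unpaired training.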