On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification (2402.01274v3)
Abstract: In recent years, self-supervised learning has excelled at learning robust feature representations from unlabelled data. Networks pretrained through self-supervision serve as effective feature extractors for downstream tasks, including few-shot learning. While the evaluation of unsupervised approaches for few-shot learning is well established in imagery, it is notably absent in acoustics. This study addresses that gap by assessing the performance of large-scale self-supervised models in few-shot audio classification. Additionally, we explore the relationship between a model's few-shot learning capability and other downstream task benchmarks. Our findings reveal state-of-the-art performance on some few-shot problems, such as SpeechCommandsv2, as well as strong correlations between speech-based few-shot problems and various downstream audio tasks.
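As a rough illustration of the evaluation setting the abstract describes, the sketch below runs a single N-way K-shot episode using a frozen pretrained self-supervised encoder as a feature extractor and a nearest-centroid classifier over its embeddings. The specific encoder (torchaudio's WAV2VEC2_BASE bundle), mean-pooling over time, and the toy random inputs are assumptions made for the example, not details taken from the paper.

```python
# Minimal sketch (not the paper's exact protocol): frozen self-supervised
# features plus a nearest-centroid classifier, evaluated on one N-way K-shot
# episode. Encoder choice and mean-pooling are illustrative assumptions.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE
encoder = bundle.get_model().eval()          # frozen pretrained SSL encoder


@torch.no_grad()
def embed(waveforms: torch.Tensor) -> torch.Tensor:
    """Mean-pool the final transformer layer into one vector per clip."""
    feats, _ = encoder.extract_features(waveforms)   # list of (B, T, D) tensors
    return feats[-1].mean(dim=1)                     # (B, D)


@torch.no_grad()
def episode_accuracy(support_x, support_y, query_x, query_y, n_way: int) -> float:
    """Classify query clips by their nearest class centroid (prototype)."""
    s, q = embed(support_x), embed(query_x)
    prototypes = torch.stack([s[support_y == c].mean(0) for c in range(n_way)])
    pred = torch.cdist(q, prototypes).argmin(dim=1)  # index of closest centroid
    return (pred == query_y).float().mean().item()


# Toy usage: one 5-way 1-shot episode on random 1-second clips at 16 kHz.
n_way, k_shot, n_query, sr = 5, 1, 3, bundle.sample_rate
support_x = torch.randn(n_way * k_shot, sr)
support_y = torch.arange(n_way).repeat_interleave(k_shot)
query_x = torch.randn(n_way * n_query, sr)
query_y = torch.arange(n_way).repeat_interleave(n_query)
print(f"episode accuracy: {episode_accuracy(support_x, support_y, query_x, query_y, n_way):.2f}")
```

Per-model accuracies averaged over many such episodes could then be compared against that model's downstream benchmark scores, for example with a rank correlation such as scipy.stats.spearmanr; this is one plausible way to realise the correlation analysis mentioned in the abstract, not necessarily the authors' exact procedure.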
- Calum Heggan
- Sam Budgett
- Timothy Hospedales
- Mehrdad Yaghoobi