Audio Contrastive based Fine-tuning (2309.11895v3)
Abstract: Audio classification plays a crucial role in speech and sound processing tasks with a wide range of applications. A key challenge remains in striking the right balance between fitting the model to the training data and enabling it to generalise to new domains without overfitting. Leveraging the transferability of contrastive learning, we introduce Audio Contrastive-based Fine-tuning (AudioConFit), an efficient approach characterised by robust generalisability. Empirical experiments on a variety of audio classification tasks demonstrate the effectiveness and robustness of our approach, which achieves state-of-the-art results in various settings.
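The abstract describes contrastive fine-tuning only at a high level. As a rough illustration, the sketch below shows one common way such an objective is realised: a supervised contrastive loss over pooled audio clip embeddings, where same-label samples in a batch act as positives. The function name, the temperature value, and the pairing with a pre-trained encoder are assumptions for illustration, not the authors' exact method.

```python
import torch
import torch.nn.functional as F


def supervised_contrastive_loss(embeddings: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """Supervised contrastive loss over a batch of audio clip embeddings.

    Samples sharing a class label are pulled together; all other samples in
    the batch are pushed apart. This is a generic sketch, not the paper's code.
    """
    z = F.normalize(embeddings, dim=1)            # (B, D) unit-norm embeddings
    sim = z @ z.T / temperature                   # (B, B) scaled cosine similarities

    B = z.size(0)
    self_mask = torch.eye(B, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)        # exclude each sample's self-similarity

    # log-probability of candidate j given anchor i, over all non-self candidates
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # positives: other samples in the batch with the same label
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0                        # anchors with at least one positive

    loss_per_anchor = -(log_prob * pos_mask).sum(dim=1)[valid] / pos_counts[valid]
    return loss_per_anchor.mean()
```

In a fine-tuning setup of this kind, `embeddings` would typically be pooled hidden states from a pre-trained speech encoder (for example wav2vec 2.0, HuBERT, or WavLM), possibly combined with a standard cross-entropy term; whether and how AudioConFit does this is not specified in the excerpt above.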