Evaluation of Barlow Twins and VICReg self-supervised learning for sound patterns of bird and anuran species (2312.11240v1)
Abstract: Taking advantage of the structure of large datasets to pre-train Deep Learning models is a promising strategy to decrease the need for supervised data. Self-supervised learning methods, such as contrastive learning and its variations, are a promising way to obtain better representations in many Deep Learning applications. Soundscape ecology is one application in which annotations are expensive and scarce, so it is worth investigating how closely methods that do not require annotations can approach those that rely on supervision. Our study uses the Barlow Twins and VICReg methods to pre-train different models on the same small dataset of sound patterns of bird and anuran species. In a downstream task of classifying those animal species, the models obtained results close to those of supervised models pre-trained on large generic datasets and fine-tuned on the same task.
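As a rough illustration of the kind of objective Barlow Twins optimizes, the sketch below computes the redundancy-reduction loss on two batches of embeddings: features are standardized across the batch, the cross-correlation matrix between the two views is formed, diagonal entries are pushed toward 1 and off-diagonal entries toward 0. This is not the authors' implementation; the function names, the pure-Python list representation, and the weight `lambda_offdiag=5e-3` are assumptions made for the example.

```python
import math
import random


def _standardize(batch, eps=1e-9):
    """Zero-mean, unit-std each feature across the batch dimension."""
    n, dims = len(batch), len(batch[0])
    out = [[0.0] * dims for _ in range(n)]
    for j in range(dims):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        for i in range(n):
            out[i][j] = (col[i] - mean) / (std + eps)
    return out


def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3):
    """Redundancy-reduction loss between two views' embeddings.

    z_a, z_b: lists of n embedding vectors (same n, same dimension).
    lambda_offdiag: assumed weight of the off-diagonal (redundancy) term.
    """
    n, dims = len(z_a), len(z_a[0])
    a, b = _standardize(z_a), _standardize(z_b)
    on_diag, off_diag = 0.0, 0.0
    for i in range(dims):
        for j in range(dims):
            # Entry (i, j) of the cross-correlation matrix between views.
            c_ij = sum(a[k][i] * b[k][j] for k in range(n)) / n
            if i == j:
                on_diag += (1.0 - c_ij) ** 2   # invariance term
            else:
                off_diag += c_ij ** 2          # redundancy-reduction term
    return on_diag + lambda_offdiag * off_diag


# Toy check with random embeddings (hypothetical data, not from the study).
rng = random.Random(0)
z = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(64)]
z_other = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(64)]
loss_same = barlow_twins_loss(z, z)        # identical views
loss_diff = barlow_twins_loss(z, z_other)  # unrelated views
```

In this toy setting, identical views should give a much lower loss than unrelated ones, since their per-feature correlation is 1. In practice the two views would come from different augmentations of the same audio clip passed through the network being pre-trained.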