Online Active Learning For Sound Event Detection (2309.14460v1)
Abstract: Data collection and annotation are laborious, time-consuming prerequisites for supervised machine learning tasks. Online Active Learning (OAL) is a paradigm that addresses this issue by simultaneously minimizing the amount of annotation required to train a classifier and adapting to changes in the data over the duration of the data collection process. Prior work has indicated that fluctuating class distributions and data drift are still common problems for OAL. This work presents new loss functions that address these challenges when OAL is applied to Sound Event Detection (SED). Experimental results from the SONYC dataset and two Voice-Type Discrimination (VTD) corpora indicate that OAL can reduce the time and effort required to train SED classifiers by a factor of 5 for SONYC, and that the new methods presented here successfully resolve issues present in existing OAL methods.