Contrastive Loss Based Frame-wise Feature disentanglement for Polyphonic Sound Event Detection (2401.05850v1)
Abstract: Overlapping sound events are ubiquitous in real-world environments, but existing end-to-end sound event detection (SED) methods still struggle to detect them effectively. A critical reason is that these methods represent overlapping events using shared and entangled frame-wise features, which degrades feature discrimination. To address this problem, we propose a disentangled feature learning framework that learns a category-specific representation. Specifically, we employ a separate projector for each category to learn that category's frame-wise features. To ensure that these features do not contain information about other categories, we maximize the common information between frame-wise features within the same category and propose a frame-wise contrastive loss. In addition, because the labeled data available to the proposed method is limited, we propose a semi-supervised frame-wise contrastive loss that can leverage large amounts of unlabeled data to achieve feature disentanglement. The experimental results demonstrate the effectiveness of our method.
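The idea of per-category projectors trained with a frame-wise contrastive loss can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the linear projectors, the function names `project_per_category` and `framewise_contrastive_loss`, the temperature value, and the choice of treating frames where the event is active as mutual positives (with all other frames as negatives) are assumptions made only for illustration. The semi-supervised variant is not covered.

```python
import numpy as np


def project_per_category(shared, projectors):
    """Apply one (hypothetical) linear projector per category.

    shared:     (T, D_in) frame-wise features from a shared encoder.
    projectors: (C, D_in, D_out) one weight matrix per category.
    Returns     (C, T, D_out) category-specific frame-wise features.
    """
    return np.einsum("td,cde->cte", shared, projectors)


def framewise_contrastive_loss(features, active, temperature=0.1):
    """Contrastive loss over the frames of ONE category (illustrative).

    features: (T, D) frame-wise features from that category's projector.
    active:   (T,) boolean, True where the category's event is present.
    Assumption: active frames are anchors, other active frames are
    positives, and every other frame appears in the denominator.
    """
    active = np.asarray(active, dtype=bool)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    z = features / np.maximum(norms, 1e-12)        # unit-length features
    sim = z @ z.T / temperature                    # (T, T) scaled cosine similarity
    n_frames = sim.shape[0]
    not_self = ~np.eye(n_frames, dtype=bool)

    # Numerically stable log-sum-exp over all frames except the anchor itself.
    masked = np.where(not_self, sim, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(masked - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom

    # Positive pairs: both frames active, anchor and positive distinct.
    pos = active[:, None] & active[None, :] & not_self
    n_pos = pos.sum(axis=1)
    valid = n_pos > 0
    if not valid.any():
        return 0.0                                 # no positive pairs in this clip
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1)[valid] / n_pos[valid]
    return float(per_anchor.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shared = rng.normal(size=(6, 8))               # 6 frames, 8-dim shared features
    projectors = rng.normal(size=(3, 8, 4))        # 3 categories, 4-dim outputs
    labels = rng.random(size=(3, 6)) > 0.5         # frame-level labels per category
    per_cat = project_per_category(shared, projectors)
    losses = [framewise_contrastive_loss(f, a) for f, a in zip(per_cat, labels)]
    print("mean loss over categories:", np.mean(losses))
```

In this sketch the loss is small when frames containing the same event map to similar features and large when they are scattered among frames without the event, which is the behaviour the abstract describes as maximizing common information within a category.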