Contrastive Loss Based Frame-wise Feature Disentanglement for Polyphonic Sound Event Detection (2401.05850v1)

Published 11 Jan 2024 in cs.SD and eess.AS

Abstract: Overlapping sound events are ubiquitous in real-world environments, but existing end-to-end sound event detection (SED) methods still struggle to detect them effectively. A critical reason is that these methods represent overlapping events with shared, entangled frame-wise features, which degrades feature discrimination. To solve this problem, we propose a disentangled feature learning framework that learns category-specific representations. Specifically, we employ a separate projector for each category to learn its frame-wise features. To ensure that these features do not contain information from other categories, we propose a frame-wise contrastive loss that maximizes the common information between frame-wise features of the same category. In addition, because the labeled data available to the proposed method is limited, we propose a semi-supervised frame-wise contrastive loss that leverages large amounts of unlabeled data to achieve feature disentanglement. Experimental results demonstrate the effectiveness of our method.
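
The abstract outlines the approach but gives no formulas, so the following is only a minimal sketch of the idea, assuming a PyTorch setup: a shared encoder produces frame-wise features, one small projector per event category maps them into category-specific embeddings, and an InfoNCE-style contrastive loss treats frames in which the same category is active as positives and frames of other categories as negatives. The projector architecture, temperature, pair construction, and all names below are hypothetical assumptions rather than the paper's exact formulation, and the semi-supervised variant for unlabeled data is not shown.

```python
# Minimal sketch (not the authors' code): category-specific projectors plus a
# frame-wise supervised contrastive loss. All hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CategoryProjectors(nn.Module):
    """One small projector per event category, applied to shared frame-wise features."""

    def __init__(self, feat_dim: int, proj_dim: int, num_classes: int):
        super().__init__()
        self.projectors = nn.ModuleList([
            nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                          nn.Linear(feat_dim, proj_dim))
            for _ in range(num_classes)
        ])

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, time, feat_dim) output of a shared encoder (e.g. a CRNN)
        # returns:     (batch, time, num_classes, proj_dim) category-specific features
        return torch.stack([proj(frame_feats) for proj in self.projectors], dim=2)


def framewise_contrastive_loss(cat_feats: torch.Tensor,
                               frame_labels: torch.Tensor,
                               temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss over active frames.

    cat_feats:    (batch, time, num_classes, proj_dim) category-specific frame features.
    frame_labels: (batch, time, num_classes) binary frame-level activity labels.
    Positives are frames in which the same category is active; all other active
    frames act as negatives.
    """
    b, t, c, d = cat_feats.shape
    feats = F.normalize(cat_feats.reshape(b * t, c, d), dim=-1)
    labels = frame_labels.reshape(b * t, c).bool()

    # Collect one embedding per (active frame, category) pair.
    anchors, anchor_cats = [], []
    for k in range(c):
        active = labels[:, k]
        if active.any():
            anchors.append(feats[active, k])
            anchor_cats.append(torch.full((int(active.sum()),), k, device=feats.device))
    if len(anchors) < 2:
        return cat_feats.new_zeros(())

    z = torch.cat(anchors)        # (N, proj_dim), unit-normalized
    y = torch.cat(anchor_cats)    # (N,) category index of each anchor

    sim = z @ z.t() / temperature
    eye = torch.eye(len(y), dtype=torch.bool, device=z.device)
    pos_mask = (y.unsqueeze(0) == y.unsqueeze(1)) & ~eye

    # Log-softmax over all other anchors, then average over same-category positives.
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float('-inf')), dim=1, keepdim=True)
    per_anchor = -(log_prob * pos_mask).sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)

    valid = pos_mask.any(dim=1)
    return per_anchor[valid].mean() if valid.any() else cat_feats.new_zeros(())


# Example with made-up shapes: 4 clips, 156 frames, 10 event classes, 256-d encoder output.
if __name__ == "__main__":
    enc_out = torch.randn(4, 156, 256)
    labels = (torch.rand(4, 156, 10) > 0.9).float()
    projectors = CategoryProjectors(feat_dim=256, proj_dim=64, num_classes=10)
    loss = framewise_contrastive_loss(projectors(enc_out), labels)
    print(loss.item())
```

In practice, a loss of this kind would presumably be added to the usual frame-level classification and semi-supervised consistency objectives rather than replace them.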

