Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge (2403.17420v1)

Published 26 Mar 2024 in cs.MM, cs.CV, cs.SD, and eess.AS

Abstract: The goal of the multi-sound source localization task is to localize each sound source in a mixture individually. While recent multi-sound source localization methods have shown improved performance, they face challenges due to their reliance on prior information about the number of objects to be separated. In this paper, to overcome this limitation, we present a novel multi-sound source localization method that can perform localization without prior knowledge of the number of sound sources. To achieve this goal, we propose an iterative object identification (IOI) module, which recognizes sound-making objects in an iterative manner. After finding the regions of sound-making objects, we devise an object similarity-aware clustering (OSC) loss that guides the IOI module both to combine regions of the same object and to distinguish between different objects and backgrounds. This enables our method to accurately localize sound-making objects without any prior knowledge. Extensive experimental results on the MUSIC and VGGSound benchmarks show significant performance improvements of the proposed method over existing methods for both single-source and multi-source localization. Our code is available at: https://github.com/VisualAIKHU/NoPrior_MultiSSL
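
To make the idea of localizing without a preset source count concrete, the following is a minimal illustrative sketch and not the paper's IOI module or OSC loss. It assumes a precomputed 2D audio-visual similarity map and uses a simple greedy peak-and-suppress loop; the function name `localize_iteratively` and the parameters `threshold`, `radius`, and `max_iters` are hypothetical choices made for this example. The loop stops on its own once no remaining region is similar enough to the audio, so the number of sources is an output rather than an input.

```python
from typing import List, Tuple


def localize_iteratively(
    sim_map: List[List[float]],
    threshold: float = 0.5,   # assumed cutoff for "still sounds like a source"
    radius: int = 1,          # assumed half-width of the suppressed square
    max_iters: int = 10,      # safety cap, not a prior on the source count
) -> List[Tuple[int, int]]:
    """Greedy stand-in for iterative identification: pick a peak, mask it, repeat."""
    work = [row[:] for row in sim_map]  # copy so the caller's map is untouched
    h, w = len(work), len(work[0])
    found: List[Tuple[int, int]] = []
    for _ in range(max_iters):
        # Location of the highest remaining similarity value.
        r, c = max(
            ((i, j) for i in range(h) for j in range(w)),
            key=lambda p: work[p[0]][p[1]],
        )
        if work[r][c] < threshold:
            break  # nothing left that is similar enough: stop iterating
        found.append((r, c))
        # Suppress the neighborhood of the peak so the next pass finds a new region.
        for i in range(max(0, r - radius), min(h, r + radius + 1)):
            for j in range(max(0, c - radius), min(w, c + radius + 1)):
                work[i][j] = float("-inf")
    return found


# Toy similarity map with two bright regions; values are made up for illustration.
demo = [
    [0.1, 0.2, 0.1, 0.0, 0.1],
    [0.2, 0.9, 0.3, 0.1, 0.0],
    [0.1, 0.3, 0.2, 0.6, 0.8],
    [0.0, 0.1, 0.1, 0.7, 0.6],
]
print(localize_iteratively(demo))  # [(1, 1), (2, 4)]
```

On the toy map the loop returns two peaks and then stops, because every remaining value falls below the threshold. A learned method such as the one described in the abstract would replace both the hand-set threshold and the fixed suppression window with trained components.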
