
Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms (2401.12238v1)

Published 19 Jan 2024 in eess.AS, cs.LG, and cs.SD

Abstract: Sound event localization and detection (SELD) is an important task in machine listening. Major advancements rely on simulated data with sound events in specific rooms and strong spatio-temporal labels. SELD data is simulated by convolving spatially localized room impulse responses (RIRs) with sound waveforms to place sound events in a soundscape. However, RIRs require manual collection in specific rooms. We present SpatialScaper, a library for SELD data simulation and augmentation. Unlike existing tools, SpatialScaper emulates virtual rooms via parameters such as size and wall absorption. This allows for parameterized placement (including movement) of foreground and background sound sources. SpatialScaper also includes data augmentation pipelines that can be applied to existing SELD data. As a case study, we use SpatialScaper to add rooms to the DCASE SELD data. Training a model with our data led to progressive performance improvements as a direct function of acoustic diversity. These results show that SpatialScaper is valuable for training robust SELD models.
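The convolution step described in the abstract can be illustrated with a minimal NumPy sketch. This is not SpatialScaper's API: the function names `spatialize` and `place_event`, the 4-channel layout, the sample rate, and the synthetic exponentially decaying RIR are all assumptions chosen for illustration. A real pipeline would use measured or simulated RIRs for a specific source position.

```python
import numpy as np


def spatialize(waveform, rir):
    """Convolve a mono waveform with each channel of a multichannel RIR.

    waveform: shape (n_samples,)
    rir:      shape (n_channels, rir_len)
    returns:  shape (n_channels, n_samples + rir_len - 1)
    """
    return np.stack([np.convolve(waveform, channel) for channel in rir])


def place_event(soundscape, event, onset):
    """Add a spatialized event into the soundscape at a sample offset (in place)."""
    end = min(soundscape.shape[1], onset + event.shape[1])
    soundscape[:, onset:end] += event[:, : end - onset]
    return soundscape


# Hypothetical toy setup: all values below are illustrative assumptions.
sr = 8000                                   # sample rate in Hz
rng = np.random.default_rng(0)
waveform = rng.standard_normal(sr // 2)     # 0.5 s noise burst as the "sound event"

# Synthetic stand-in for an RIR: noise with an exponential decay, 4 channels.
rir_len = 800
decay = np.exp(-np.arange(rir_len) / 100.0)
rir = rng.standard_normal((4, rir_len)) * decay

event = spatialize(waveform, rir)           # spatialized event, 4 channels
soundscape = np.zeros((4, 2 * sr))          # 2 s of silence
place_event(soundscape, event, onset=sr)    # event starts at t = 1 s
```

The onset sample and the position the RIR was captured at together provide the temporal and spatial labels for the event, which is what makes simulated data of this kind strongly labeled.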
