6DoF SELD: Sound Event Localization and Detection Using Microphones and Motion Tracking Sensors on self-motioning human (2403.01670v1)

Published 4 Mar 2024 in eess.AS and cs.SD

Abstract: We aim to perform sound event localization and detection (SELD) using wearable equipment on a moving human, such as a pedestrian. Conventional SELD tasks have dealt only with microphone arrays in static positions. However, self-motion with three rotational and three translational degrees of freedom (6DoF) must be considered for wearable microphone arrays. A system trained only on a dataset recorded with microphone arrays in fixed positions would be unable to adapt to the fast relative motion of sound events caused by self-motion, degrading SELD performance. To address this, we designed the 6DoF SELD Dataset for wearable systems, the first SELD dataset that accounts for the self-motion of microphones. Furthermore, we propose a multi-modal SELD system that jointly utilizes audio and motion tracking sensor signals. The sensor signals are expected to help the system find useful acoustic cues for SELD based on the current self-motion state. Experimental results on our dataset show that the proposed method effectively improves SELD performance with a mechanism that extracts acoustic features conditioned on the sensor signals.
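
The abstract's final sentence describes a mechanism that extracts acoustic features conditioned on the motion tracking sensor signals, but no implementation details are given here. The sketch below is a minimal, hypothetical illustration of one way such conditioning could look: channel-wise gating of an audio CNN by an embedding of 6DoF motion features. The module name, input dimensions, and gating design are illustrative assumptions, not the authors' architecture.

# Hypothetical sketch: conditioning acoustic features on motion-sensor state.
# All names and dimensions are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class SensorConditionedAudioEncoder(nn.Module):
    """Re-weights audio feature channels using an embedding of motion signals."""

    def __init__(self, n_audio_channels: int = 64, n_sensor_features: int = 12):
        super().__init__()
        # Plain CNN block over a multichannel spectrogram (batch, mics, time, freq).
        self.audio_conv = nn.Sequential(
            nn.Conv2d(4, n_audio_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(n_audio_channels),
            nn.ReLU(),
        )
        # Embed 6DoF motion features (e.g. angular velocity and acceleration stats)
        # into per-channel gates in [0, 1].
        self.sensor_mlp = nn.Sequential(
            nn.Linear(n_sensor_features, n_audio_channels),
            nn.ReLU(),
            nn.Linear(n_audio_channels, n_audio_channels),
            nn.Sigmoid(),
        )

    def forward(self, audio: torch.Tensor, sensor: torch.Tensor) -> torch.Tensor:
        # audio: (batch, 4 mic channels, time, freq); sensor: (batch, n_sensor_features)
        feats = self.audio_conv(audio)
        gates = self.sensor_mlp(sensor).unsqueeze(-1).unsqueeze(-1)
        return feats * gates  # acoustic features re-weighted by self-motion state

if __name__ == "__main__":
    model = SensorConditionedAudioEncoder()
    audio = torch.randn(2, 4, 100, 64)   # dummy 4-mic spectrogram frames
    sensor = torch.randn(2, 12)          # dummy motion-tracking feature vector
    print(model(audio, sensor).shape)    # torch.Size([2, 64, 100, 64])

The gating design above is only one plausible reading of "acoustic features conditioned by sensor signals"; other fusion strategies (e.g. feature concatenation or cross-modal attention) would fit the abstract equally well.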
