SoundCam: A Dataset for Finding Humans Using Room Acoustics (2311.03517v2)

Published 6 Nov 2023 in cs.SD, cs.CV, and eess.AS

Abstract: A room's acoustic properties are a product of the room's geometry, the objects within the room, and their specific positions. A room's acoustic properties can be characterized by its impulse response (RIR) between a source and listener location, or roughly inferred from recordings of natural signals present in the room. Variations in the positions of objects in a room can effect measurable changes in the room's acoustic properties, as characterized by the RIR. Existing datasets of RIRs either do not systematically vary positions of objects in an environment, or they consist of only simulated RIRs. We present SoundCam, the largest dataset of unique RIRs from in-the-wild rooms publicly released to date. It includes 5,000 10-channel real-world measurements of room impulse responses and 2,000 10-channel recordings of music in three different rooms, including a controlled acoustic lab, an in-the-wild living room, and a conference room, with different humans in positions throughout each room. We show that these measurements can be used for interesting tasks, such as detecting and identifying humans, and tracking their positions.
