
The Audio-Visual BatVision Dataset for Research on Sight and Sound (2303.07257v3)

Published 13 Mar 2023 in cs.RO

Abstract: Vision research has shown remarkable success in understanding our world, propelled by datasets of images and videos. Sensor data from radar, LiDAR and cameras has supported research in robotics and autonomous driving for over a decade. However, while visual sensors may fail in some conditions, sound has recently shown potential to complement other sensor data. Simulated room impulse responses (RIR) in 3D apartment models have become a benchmark dataset for the community, fostering a range of audio-visual research. In simulation, depth is predictable from sound by learning bat-like perception with a neural network. Concurrently, the same was achieved in reality using RGB-D images and the echoes of chirping sounds. Biomimicking bat perception is an exciting new direction but needs dedicated datasets to explore its potential. We therefore collected the BatVision dataset to provide the community with large-scale echoes in complex real-world scenes. We equipped a robot with a speaker to emit chirps and a binaural microphone to record their echoes. Synchronized RGB-D images from the same perspective provide visual labels of the traversed spaces. We sampled locations ranging from modern US office spaces to historic French university grounds, indoor and outdoor, with large architectural variety. This dataset will enable research on robot echolocation, general audio-visual tasks and sound phenomena unavailable in simulated data. We show promising results for audio-only depth prediction and show how state-of-the-art methods developed for simulated data can also succeed on our dataset. Project page: https://amandinebtto.github.io/Batvision-Dataset/
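The probe signal described above, a chirp whose echoes are recorded binaurally, can be sketched as a linear sine sweep. The frequency range, duration, and sample rate below are illustrative placeholders, not the dataset's actual recording parameters:

```python
import numpy as np

def linear_chirp(f0=20.0, f1=20000.0, duration=0.003, sr=44100):
    """Generate a linear sine sweep ("chirp") of the kind used to probe
    room acoustics. f0, f1, duration, and sr are assumed example values,
    not the parameters used to collect the BatVision dataset."""
    t = np.linspace(0.0, duration, int(sr * duration), endpoint=False)
    # Instantaneous phase of a sweep whose frequency rises linearly
    # from f0 to f1 over the given duration.
    phase = 2.0 * np.pi * (f0 * t + (f1 - f0) / (2.0 * duration) * t**2)
    return np.sin(phase)

chirp = linear_chirp()
```

Cross-correlating the recorded left/right echoes with this known emitted signal is the usual way to recover time-of-flight structure, which is what echo-based depth prediction models learn to exploit.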
