RealImpact: A Dataset of Impact Sound Fields for Real Objects (2306.09944v1)
Abstract: Objects make unique sounds under different perturbations, environment conditions, and poses relative to the listener. While prior works have modeled impact sounds and sound propagation in simulation, we lack a standard dataset of impact sound fields of real objects for audio-visual learning and for calibrating the sim-to-real gap. We present RealImpact, a large-scale dataset of real object impact sounds recorded under controlled conditions. RealImpact contains 150,000 recordings of impact sounds of 50 everyday objects with detailed annotations, including their impact locations, microphone locations, contact force profiles, material labels, and RGBD images. We make preliminary attempts to use our dataset as a reference for evaluating how closely current simulation methods estimate real-world object impact sounds. Moreover, we demonstrate the usefulness of our dataset as a testbed for acoustic and audio-visual learning by evaluating two benchmark tasks: listener location classification and visual acoustic matching.
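The annotation types listed in the abstract can be pictured as one record per recording. The sketch below is a minimal, hypothetical schema: the class name `ImpactRecording`, its field names, the plain 3-tuples for positions, and the `rgbd_path` field are assumptions for illustration, not the dataset's actual file format. It also checks the arithmetic implied by the abstract: 150,000 recordings over 50 objects is 3,000 recordings per object on average.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class ImpactRecording:
    # Hypothetical schema: the abstract lists these annotation types, but the
    # field names, units, and coordinate frames here are assumptions.
    object_name: str
    material: str                  # material label
    impact_location: Vec3          # point on the object that was struck
    mic_location: Vec3             # microphone position
    audio: List[float] = field(default_factory=list)          # waveform samples
    contact_force: List[float] = field(default_factory=list)  # force profile samples
    rgbd_path: str = ""            # path to an RGBD image (hypothetical field)


def recordings_per_object(records: List[ImpactRecording]) -> Counter:
    """Count how many recordings belong to each object."""
    return Counter(r.object_name for r in records)


# Average number of recordings per object implied by the abstract's totals.
TOTAL_RECORDINGS = 150_000
NUM_OBJECTS = 50
AVG_PER_OBJECT = TOTAL_RECORDINGS // NUM_OBJECTS

if __name__ == "__main__":
    # Toy records with made-up values, only to exercise the schema.
    recs = [
        ImpactRecording("mug", "ceramic", (0.0, 0.0, 0.1), (0.5, 0.0, 0.2)),
        ImpactRecording("mug", "ceramic", (0.0, 0.0, 0.1), (0.6, 0.0, 0.2)),
        ImpactRecording("pan", "iron", (0.1, 0.0, 0.0), (0.5, 0.0, 0.2)),
    ]
    print(AVG_PER_OBJECT)               # 3000
    print(recordings_per_object(recs))  # Counter({'mug': 2, 'pan': 1})
```

Keying records by object is one natural way to organize such data, for example when holding out whole objects for evaluation; the abstract does not say how the released files are actually structured.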