Robust Sound Source Localization Using a Microphone Array on a Mobile Robot (1602.08213v1)

Published 26 Feb 2016 in cs.RO and cs.SD

Abstract: The hearing sense on a mobile robot is important because it is omnidirectional and it does not require direct line-of-sight with the sound source. Such capabilities can nicely complement vision to help localize a person or an interesting event in the environment. To do so the robot auditory system must be able to work in noisy, unknown and diverse environmental conditions. In this paper we present a robust sound source localization method in three-dimensional space using an array of 8 microphones. The method is based on time delay of arrival estimation. Results show that a mobile robot can localize in real time different types of sound sources over a range of 3 meters and with a precision of 3 degrees.

Citations (399)


Summary

The paper presents a methodology using an eight-microphone array to estimate TDOA for accurate 3D sound localization on mobile robots.
It enhances traditional cross-correlation with whitening and spectral weighting to improve robustness in noisy and reverberant conditions.
Experimental results demonstrate real-time tracking with approximately 3° angular precision within a three-meter range.

Robust Sound Source Localization Using a Microphone Array on a Mobile Robot

The paper "Robust Sound Source Localization Using a Microphone Array on a Mobile Robot" describes a method for localizing sound sources in three-dimensional space from a mobile robot, using an array of eight microphones. The approach is based on Time Delay of Arrival (TDOA) estimation and is designed to work in noisy, unknown, and varied environments.

Technical Overview and Methodology

The problem addressed is that a small number of microphones cannot reproduce human hearing on a mobile robot. Human hearing exploits cues such as acoustic shadowing and the shape of the ear, whereas a single microphone pair can typically localize sound only in two dimensions and cannot distinguish front from back. The paper works around these limits by using eight microphones, which improves resolution and robustness, particularly in noisy and reverberant environments.
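The front-back ambiguity can be illustrated with a small sketch. It assumes a far-field (plane wave) model in which the delay between microphones i and j is the projection of their separation onto the unit direction of the source, divided by the speed of sound. The array geometry (a 30 cm cube), the speed of sound, and the function names are hypothetical choices for illustration, and the least-squares step is a generic solver, not necessarily the one used in the paper.

```python
import itertools
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def far_field_tdoas(mics, u, pairs):
    """Far-field delays (seconds) for unit direction u and the given mic pairs."""
    return np.array([np.dot(mics[j] - mics[i], u) / C for i, j in pairs])


def direction_from_tdoas(mics, taus, pairs):
    """Least-squares unit direction that best explains the measured delays."""
    A = np.array([mics[j] - mics[i] for i, j in pairs])
    u, *_ = np.linalg.lstsq(A, C * np.asarray(taus), rcond=None)
    return u / np.linalg.norm(u)


# Two mirrored directions: x points forward, y points left.
front = np.array([np.cos(0.5), np.sin(0.5), 0.0])
back = front * np.array([-1.0, 1.0, 1.0])

# A single left/right pair sees exactly the same delay for both directions.
pair = np.array([[0.0, -0.1, 0.0], [0.0, 0.1, 0.0]])
tau_front = far_field_tdoas(pair, front, [(0, 1)])
tau_back = far_field_tdoas(pair, back, [(0, 1)])

# Eight microphones on the corners of a cube (hypothetical layout) give
# 28 pairs, enough to recover a unique 3D direction.
cube = np.array(list(itertools.product([-0.15, 0.15], repeat=3)))
pairs = list(itertools.combinations(range(8), 2))
est_front = direction_from_tdoas(cube, far_field_tdoas(cube, front, pairs), pairs)
est_back = direction_from_tdoas(cube, far_field_tdoas(cube, back, pairs), pairs)
```

In this noiseless setting the pair cannot tell `front` from `back`, while the eight-microphone estimate separates them; with real measurements the delays would be noisy and the least-squares fit would only be approximate.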

The core of the method is TDOA estimation: the difference in arrival time of a signal at each pair of microphones is used to deduce the source's direction. Cross-correlation is the primary technique for estimating these delays, and the paper describes enhancements to traditional cross-correlation that improve robustness and accuracy. Because computing the cross-correlation directly in the time domain costs O(N²) operations, the computation is moved to the frequency domain, which reduces the cost to O(N log N).
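As a rough illustration of the frequency-domain step, the sketch below estimates the delay between two equal-length frames by multiplying their spectra and taking the peak of the inverse transform. The function name `xcorr_delay`, the 48 kHz sampling rate, and the 1024-sample frame are assumptions for the example, not values taken from the paper's implementation.

```python
import numpy as np


def xcorr_delay(x, y, fs):
    """Estimate by how many seconds y lags x, via FFT-based cross-correlation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) != len(y):
        raise ValueError("frames must have equal length")
    # Zero-pad to a power of two so the circular correlation equals the linear one.
    nfft = 1 << (2 * len(x) - 1).bit_length()
    X = np.fft.rfft(x, nfft)
    Y = np.fft.rfft(y, nfft)
    # Correlation theorem: r[k] = sum_n x[n] * y[n + k]
    r = np.fft.irfft(np.conj(X) * Y, nfft)
    k = int(np.argmax(r))
    if k > nfft // 2:  # indices in the upper half are negative lags
        k -= nfft
    return k / fs


FS = 48_000  # assumed sampling rate in Hz
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
true_lag = 5  # samples
y = np.concatenate([np.zeros(true_lag), x[:-true_lag]])  # y is x delayed by 5 samples
delay = xcorr_delay(x, y, FS)
```

The resolution of this estimate is one sample period, so the microphone spacing and sampling rate together bound the achievable angular precision.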

To further improve on traditional cross-correlation, the paper whitens the spectrum so that a few high-energy frequency components do not dominate the correlation, and it applies a spectral weighting that emphasizes the frequency components where the signal-to-noise ratio (SNR) is highest. Together these make the delay estimate more resilient to noise.

Results and Implications

The localization system, integrated on an ActivMedia Pioneer 2 robot, ran in real time. The robot tracked sound sources within a three-meter range with an angular precision of approximately 3 degrees. Experiments showed that precision did not degrade across different spatial orientations, a significant advantage over configurations with fewer microphones. The system lets the robot direct its camera towards sound sources, which supports interaction with humans or other agents in the environment.

Performance does not depend on the sound being continuous, so the system can also localize transient sounds. Its current limitations are difficulty localizing tonal sounds, the inability to estimate source distance, and the lack of effective handling of multiple simultaneous sources.

Future Directions

The paper points to two extensions: estimating source distance and handling multiple concurrent sound sources. Beyond these, integrating noise cancellation and machine learning techniques could extend the system to more complex auditory scenes.

In conclusion, by making TDOA estimation more robust with an eight-microphone array, the paper contributes methods to robotic auditory perception that future research and implementations in mobile robotics can build on.
