Quantify robustness of the auditory neurogram encoder to noise, reverberation, and higher-bandwidth sources

Quantify the robustness of the proposed convolutional auditory encoder, which approximates the deterministic mean-rate pathway of the Bruce et al. (2018) auditory-nerve model as implemented in AMT bruce2018. Three conditions fall outside the current evaluation: additive noise, room reverberation, and audio sources sampled above 16 kHz. For each condition, evaluate neurogram fidelity across characteristic-frequency channels using speech-centric metrics on the non-clean and higher-bandwidth inputs.
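The noisy and reverberant test conditions named above could be generated along the following lines. This is a minimal sketch, not the authors' protocol: the function names (mix_at_snr, synthetic_rir, reverberate), the SNR-based noise scaling, and the exponentially decaying noise model of a room impulse response are all assumptions made for illustration. A real study would more likely use recorded noise and measured impulse responses.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)[: len(speech)]
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if speech_power == 0.0 or noise_power == 0.0:
        raise ValueError("speech and noise must both have non-zero power")
    # Choose the gain so that speech_power / (gain**2 * noise_power) == 10**(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def synthetic_rir(fs, rt60, rng):
    """Toy room impulse response (assumption): white noise under an exponential
    envelope whose amplitude has dropped by 60 dB at t = rt60 seconds."""
    n = int(round(fs * rt60))
    t = np.arange(n) / fs
    envelope = 10.0 ** (-3.0 * t / rt60)  # 20*log10(envelope) == -60 dB at t == rt60
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # direct path
    return rir / np.max(np.abs(rir))


def reverberate(speech, rir):
    """Convolve speech with the impulse response and keep the original length."""
    return np.convolve(np.asarray(speech, dtype=np.float64), rir)[: len(speech)]
```

A sweep over several SNR values and RT60 values would give a grid of degraded inputs, each of which can be passed through both the reference model and the encoder. Sources recorded above 16 kHz would additionally need a stated resampling policy before comparison, which this sketch does not cover.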

Background

The paper introduces a compact, differentiable encoder that maps raw audio to deterministic rate-domain neurograms, approximating the Bruce et al. (2018) auditory-nerve model. Training and evaluation are conducted on clean speech at a 16 kHz sampling rate, and reported metrics demonstrate high fidelity to the analytical reference under these conditions.

The authors explicitly note that the model's performance outside the clean 16 kHz regime (noisy or reverberant environments, and higher-bandwidth sources) has not been quantified. Robustness beyond the current evaluation scope therefore remains a concrete open question.
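Measuring fidelity under these conditions requires a per-channel comparison between the reference neurogram and the encoder output. The sketch below shows one possible form of that comparison. The function name per_channel_fidelity, the (channels, frames) array layout, and the choice of normalized MSE and Pearson correlation are assumptions used as stand-ins, since the specific speech-centric metrics are not listed here.

```python
import numpy as np


def per_channel_fidelity(reference, predicted, eps=1e-12):
    """Compare two neurograms of shape (channels, frames), one row per
    characteristic-frequency channel (layout is an assumption).

    Returns (nmse_db, corr): per-channel normalized MSE in dB and
    per-channel Pearson correlation over time.
    """
    reference = np.asarray(reference, dtype=np.float64)
    predicted = np.asarray(predicted, dtype=np.float64)
    if reference.ndim != 2 or reference.shape != predicted.shape:
        raise ValueError("expected two arrays of identical shape (channels, frames)")

    # Normalized MSE: error power relative to reference power, per channel.
    err_power = np.mean((reference - predicted) ** 2, axis=1)
    ref_power = np.mean(reference ** 2, axis=1)
    nmse_db = 10.0 * np.log10((err_power + eps) / (ref_power + eps))

    # Pearson correlation along the time axis, per channel.
    ref_c = reference - reference.mean(axis=1, keepdims=True)
    pred_c = predicted - predicted.mean(axis=1, keepdims=True)
    denom = np.linalg.norm(ref_c, axis=1) * np.linalg.norm(pred_c, axis=1) + eps
    corr = np.sum(ref_c * pred_c, axis=1) / denom
    return nmse_db, corr
```

Reporting these values per channel rather than as a single average would show whether degradation concentrates in particular characteristic-frequency ranges, for example the high-frequency channels when the source bandwidth exceeds that of the training data.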

References

Moreover, training and evaluation used clean speech sampled at 16kHz, so robustness to noise, reverberation, and higher-bandwidth sources remains to be quantified.

— An Efficient Neural Network for Modeling Human Auditory Neurograms for Speech (2510.19354 - Zohar et al., 22 Oct 2025) in Section: Conclusion and Future Work
