
A comparative study of eight human auditory models of monaural processing

Published 5 Jul 2021 in eess.AS and cs.SD | (2107.01753v2)

Abstract: A number of auditory models have been developed using diverging approaches, either physiological or perceptual, but they share comparable stages of signal processing, as they are inspired by the same constitutive parts of the auditory system. We compare eight monaural models that are openly accessible in the Auditory Modelling Toolbox. We discuss the considerations required to make the model outputs comparable to each other, as well as the results for the following model processing stages or their equivalents: Outer and middle ear, cochlear filter bank, inner hair cell, auditory nerve synapse, cochlear nucleus, and inferior colliculus. The discussion includes a list of recommendations for future applications of auditory models.

Citations (29)

Summary

  • The paper presents a detailed framework comparing eight auditory models, emphasizing cochlear filtering variations and neural adaptation differences.
  • It evaluates models ranging from biophysical to functional effective approaches, highlighting differences in frequency selectivity, compression, and modulation detection.
  • The study underscores the trade-off between model complexity and usability, guiding model selection for applications in auditory neuroscience and speech recognition.

Comparative Analysis of Human Auditory Models for Monaural Processing

Introduction

The study by Osses et al., "A Comparative Study of Eight Human Auditory Models of Monaural Processing" (2107.01753), presents a thorough comparison of eight computational auditory models of monaural processing. These models, all openly accessible in the Auditory Modelling Toolbox (AMT), simulate various stages of the human auditory system. The primary aim of the study is to establish a framework for making the outputs of these diverse models comparable and to highlight their functional differences through a detailed, stage-by-stage evaluation.

Model Overview

The models under comparison include those developed using physiological approaches (biophysical and phenomenological models) and those that are perceptually oriented (functional effective models). Notable examples include Zilany et al.'s phenomenological model (2014), Verhulst et al.'s biophysical models (2015, 2018), and Dau et al.'s functional effective model (1997). Each model implements comparable processing stages, including the outer and middle ear, cochlear filtering, the inner hair cell, the auditory nerve synapse, and subcortical processing in the cochlear nucleus and inferior colliculus.

Comparative Methodology

To effectively compare these models, the study investigates their performance in simulating responses across a set of standardized auditory stimuli. This includes assessing their spectral and temporal processing capabilities, cochlear filter tuning and compression, adaptation characteristics, and responses to amplitude modulated tones. The models were evaluated on their ability to recreate known auditory phenomena such as the compression of basilar membrane responses and auditory nerve adaptation.
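
To make the setup concrete, the sketch below generates the kind of standardized stimuli used in the comparison: pure tones at 500 and 4000 Hz and white noise, each presented at 40, 70, and 100 dB SPL. The signal-in-pascals calibration, sampling rate, and duration are assumptions made for illustration; the AMT models each have their own digital-to-SPL conventions.

```python
# Minimal sketch (Python/NumPy) of standardized test stimuli: pure tones at
# 500 and 4000 Hz and white noise at 40, 70 and 100 dB SPL.  Signals are
# expressed in pascals, so an RMS of 1 Pa corresponds to ~94 dB SPL
# (re 20 uPa); the sampling rate and duration are assumed values.
import numpy as np

FS = 44100        # sampling rate in Hz (assumed)
DUR = 0.5         # stimulus duration in seconds (assumed)
P0 = 20e-6        # reference pressure: 20 micropascals

def set_level(x, level_db_spl):
    """Scale a signal so that its RMS corresponds to the requested dB SPL."""
    target_rms = P0 * 10.0 ** (level_db_spl / 20.0)
    return x * (target_rms / np.sqrt(np.mean(x ** 2)))

t = np.arange(int(FS * DUR)) / FS
rng = np.random.default_rng(0)
stimuli = {}
for level in (40, 70, 100):
    for fc in (500, 4000):
        stimuli[f"tone_{fc}Hz_{level}dB"] = set_level(np.sin(2 * np.pi * fc * t), level)
    stimuli[f"noise_{level}dB"] = set_level(rng.standard_normal(t.size), level)
```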

Key Findings and Observations

Cochlear Filtering: Among the models, variations were observed in frequency selectivity and compressive characteristics. Models like Verhulst's exhibited strong level-dependent tuning, reflecting their nonlinear transmission-line cochlear modelling, which captures nonlinear cochlear processing with high fidelity. In contrast, simpler effective models behaved more linearly and their tuning changed little with level, even for intense sounds.
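
The compressive growth described above can be illustrated with a generic broken-stick input/output function: roughly linear growth below a knee point and strongly compressive growth above it. This is a sketch of the general idea only; the 30 dB knee and the 0.25 dB/dB compressive slope are assumed values and do not correspond to any of the eight models.

```python
# Illustrative broken-stick input/output function for basilar-membrane
# response growth: linear below an assumed 30 dB knee, compressive
# (0.25 dB/dB, also assumed) above it.
import numpy as np

def broken_stick_io(level_db, knee_db=30.0, slope=0.25):
    """Return an 'internal' output level in dB for a given input level in dB."""
    level_db = np.asarray(level_db, dtype=float)
    return np.where(level_db <= knee_db,
                    level_db,
                    knee_db + slope * (level_db - knee_db))

levels = np.array([40.0, 70.0, 100.0])
for lin, lout in zip(levels, broken_stick_io(levels)):
    print(f"{lin:.0f} dB SPL in -> {lout:.1f} dB out")
# The output grows by only ~7.5 dB as the input rises from 70 to 100 dB,
# i.e. less-than-linear (compressive) growth on the on-frequency channel.
```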

Auditory Nerve Modelling: All models accounted for adaptation, a characteristic auditory nerve response in which the firing rate peaks sharply at stimulus onset and then decays to a lower steady-state rate. However, differences emerged in how synaptic release and firing rates are modelled, with models like Zilany's offering both mean-rate and PSTH (peristimulus time histogram) outputs for detailed simulations of neuronal behaviour.
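
The onset-peak-then-decay behaviour can be sketched with a simple exponential rate model. This is a schematic illustration only; the spontaneous, onset, and steady-state rates and the 10 ms time constant are assumed values and do not reproduce the synapse dynamics of any of the compared models.

```python
# Schematic auditory-nerve adaptation: the firing rate peaks at tone onset
# and decays exponentially toward a lower steady-state rate.  All rates and
# the time constant below are assumed values for illustration.
import numpy as np

FS = 1000.0                     # sampling rate of the rate function (Hz)
t = np.arange(0.0, 0.3, 1.0 / FS)

spont_rate = 50.0               # spikes/s before the tone (assumed high-spontaneous fibre)
onset_rate = 800.0              # peak rate at tone onset (assumed)
steady_rate = 200.0             # adapted rate during the tone (assumed)
tau = 0.010                     # adaptation time constant, 10 ms (assumed)
tone_onset = 0.05               # tone starts 50 ms into the simulation

rate = np.full_like(t, spont_rate)
during = t >= tone_onset
rate[during] = steady_rate + (onset_rate - steady_rate) * np.exp(-(t[during] - tone_onset) / tau)
# 'rate' now shows the classic adaptation shape: a sharp onset peak that
# relaxes to a plateau well above the spontaneous rate.
```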

Subcortical Processing: The modulation filter bank concept was pivotal in evaluating temporal processing capabilities across models. While the effective models used linear modulation filter banks, the biophysical and phenomenological models incorporated same-frequency inhibition-excitation (SFIE) circuits to simulate cochlear nucleus (CN) and inferior colliculus (IC) functionality, revealing distinctive modulation frequency tuning and detection capabilities.
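
As an illustration of the modulation filter bank concept used by the effective models, the sketch below applies octave-spaced band-pass filters to an envelope signal. The centre frequencies, the Q of about 1, and the second-order Butterworth design are assumptions chosen for simplicity and do not match any specific model's parameterization.

```python
# Minimal sketch of a linear modulation filter bank: octave-spaced
# band-pass filters applied to an envelope.  Centre frequencies, filter
# order and the Q of ~1 are illustrative assumptions.
import numpy as np
from scipy.signal import butter, sosfilt

FS = 44100
MOD_CFS = (4, 8, 16, 32, 64, 128)   # modulation centre frequencies in Hz (assumed)

def modulation_filterbank(envelope, fs=FS, cfs=MOD_CFS, q=1.0):
    """Return one band-pass-filtered copy of the envelope per modulation rate."""
    bands = []
    for cf in cfs:
        edges = [cf * (1 - 0.5 / q), cf * (1 + 0.5 / q)]
        sos = butter(2, edges, btype="bandpass", fs=fs, output="sos")
        bands.append(sosfilt(sos, envelope))
    return np.stack(bands)

# A toy envelope modulated at 16 Hz excites the 16-Hz channel most strongly.
t = np.arange(0, 1.0, 1 / FS)
env = 1 + 0.5 * np.sin(2 * np.pi * 16 * t)
rms_per_band = np.sqrt(np.mean(modulation_filterbank(env) ** 2, axis=1))
print("strongest channel:", MOD_CFS[int(np.argmax(rms_per_band))], "Hz")
```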

Implications

The findings underscore the importance of selecting a model based on the specific auditory processing feature of interest. Biophysical models are suitable for detailed physiological emulation, whereas effective models can suffice for perceptual simulation tasks. Differences in computational demand also highlight the trade-off between model complexity and usability, especially in real-time auditory processing applications.

Conclusion

This comparative study provides an essential benchmark for auditory modellers, offering insights into the operational strengths and limitations of different auditory models. While each model has its niche in simulating certain auditory phenomena, the study emphasizes the importance of understanding the underlying assumptions and configurations in auditory modelling. This research lays a foundation for future developments and improvements in auditory processing models, potentially enhancing their applicability in fields such as auditory neuroscience, hearing aid development, and speech recognition technologies.


Explain it Like I'm 14

What this paper is about

This paper compares eight computer models that try to mimic how one human ear (monaural processing) turns sound into brain signals. The models represent different parts of the hearing pathway—like the outer ear, middle ear, the cochlea, inner hair cells, the auditory nerve, and early brain areas—and they were all tested in the same way so we can see how similar or different their “hearing” is.

The big questions the researchers asked

  • How do different hearing models handle the same sounds at each key stage of the hearing pathway?
  • Do they respond similarly to quiet vs. loud sounds?
  • How do they deal with fast sound details (temporal fine structure) and slower changes like beats and rhythms (temporal envelope)?
  • What needs to be adjusted so their outputs can be fairly compared?
  • Based on the comparison, what should people keep in mind when choosing a model for future work?

How they did the comparison (explained simply)

The team used the Auditory Modelling Toolbox (AMT), which is an open-source collection of hearing models. They grouped the models into three “families” based on how detailed they are:

  • Biophysical models: Very detailed, like building the ear from tiny parts. They simulate how fluid and structures in the cochlea interact. Think of a complex machine with many connected parts.
  • Phenomenological models: Medium detail. They are designed to match measured nerve and cochlea behavior using clever shortcuts. Think of a good imitation that captures key behaviors without modeling every tiny part.
  • Functional-effective models: Simpler and fast. They aim to predict hearing performance (like what a listener can hear) rather than exact neural activity. Think of a “good-enough” graphic equalizer plus some smart rules.

To keep things fair, the same sounds were fed into all models:

  • Steady pure tones at 500 Hz and 4000 Hz (low vs. high pitch)
  • White noise (like static) covering a wide range of frequencies
  • Different loudness levels: 40, 70, and 100 dB SPL

They then looked at the outputs after key stages:

  1. Cochlear filtering (how the ear splits sound into frequency bands—like a detailed equalizer)
  2. Inner hair cell (IHC) processing (turning motion into electrical signals)
  3. Auditory nerve (how signals are sent to the brain, including “adaptation” as the nerve gets used to a sound)
  4. Subcortical brain processing (early brain areas, especially the cochlear nucleus and inferior colliculus, which are sensitive to rhythms or “modulations” in sound)

They also aligned settings (like levels and filter parameters) so differences were due to the models themselves, not mismatched inputs.

A quick look at the models compared

Each model label below is listed with its family (type) and key idea:

  • dau1997: functional-effective (linear). Simple, fast filters and modulation analysis.
  • zilany2014: phenomenological. Dynamic filters + nerve adaptation; can feed a brain-stage model.
  • bruce2018: phenomenological. Updated nerve synapse; often used with the same brain-stage model.
  • verhulst2015: biophysical. Nonlinear transmission-line cochlea; detailed hair cell and nerve.
  • verhulst2018: biophysical. Extended, more detailed inner hair cell model.
  • king2019: functional-effective (nonlinear). Adds compression like automatic volume control, mainly on the “on-frequency” channel.
  • relanoiborra2019: functional-effective (nonlinear, DRNL). Dual-path filters that capture nonlinear cochlear behavior.
  • osses2021: functional-effective (linear). Clean, level-calibrated pipeline with modulation filters.

What they found and why it matters

1) Middle ear filtering changes “how loud” the cochlea sees the sound

Different models use different middle-ear filters. This matters because it shifts where compression (the ear’s automatic volume control) starts. Models with higher middle-ear gain push the cochlea into compression at lower input levels; lower gain delays compression. If you compare models without accounting for this, you might misinterpret their behavior.
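
A toy calculation makes the point (the 40 dB internal knee and the two gain values below are assumed numbers, not taken from any of the models):

```python
# If the compressive stage has a fixed internal knee point, the middle-ear
# gain applied before it determines which input level reaches that knee.
internal_knee_db = 40.0                       # assumed internal knee
for middle_ear_gain_db in (0.0, 10.0):        # assumed gains
    input_knee_db = internal_knee_db - middle_ear_gain_db
    print(f"middle-ear gain {middle_ear_gain_db:4.1f} dB -> "
          f"compression starts near {input_knee_db:.0f} dB SPL at the eardrum")
# With 10 dB more middle-ear gain, compression starts for sounds that are
# 10 dB quieter at the eardrum.
```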

2) Cochlear filtering and compression: on-frequency vs. off-frequency

  • On-frequency channels (the cochlear filter tuned to the tone’s frequency) often showed compression: as you increase input level, output grows less than linearly—like an automatic volume limiter.
  • Off-frequency channels (nearby filters) were usually more linear (output grows more proportionally), which matches biology. However, a couple of effective models also showed compression off-frequency, which can lead to unrealistic level balances between filters if not carefully set.

At very high levels, some models showed distortions in the frequency response (especially in the high tails of the filters). That’s a side effect of strong compression and depends on how the nonlinear stage is implemented.

3) Frequency selectivity (how sharp the filters are) changes with level

Filter sharpness is often measured with a “Q factor”: higher Q means sharper tuning.
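
As a tiny worked example (the bandwidth values are made up for illustration), the Q of a filter is its centre frequency divided by its -3 dB bandwidth:

```python
# Q factor = centre frequency / -3 dB bandwidth; a narrower filter at the
# same centre frequency gives a higher Q.  The bandwidths below are
# invented purely for illustration.
def q_factor(fc_hz, bw_3db_hz):
    return fc_hz / bw_3db_hz

print(q_factor(500, 80))    # 6.25 -> sharper tuning
print(q_factor(500, 125))   # 4.0  -> broader tuning (e.g. at a higher level)
```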

  • At lower levels (40 dB), biophysical and phenomenological models matched sharper tuning curves often used in physiological studies, while effective models matched broader tuning curves commonly used in perceptual models.
  • As sounds got louder (70 to 100 dB), biophysical and phenomenological models’ filters broadened (Q dropped), especially at lower frequencies. That’s realistic: real cochlear filters get wider as level increases.
  • Many effective models stayed nearly the same across levels (they’re simpler and often level-independent inside their main passband).
  • One nonlinear effective model (king2019) mainly compresses the on-frequency channel, so its -3 dB bandwidth didn’t change much, even though there was broadening outside that core region.

This matters because filter sharpness affects how well the model separates nearby sounds (like notes in music or consonants in speech).

4) How many filters do you need?

If filters get wider at high levels, you need fewer of them to cover the frequency range without gaps. Biophysical models at 100 dB had much wider filters (so fewer were needed to cover the range). Effective models generally needed more filters to achieve the same overlap. This affects speed and memory: fewer filters mean faster computation, but only if the behavior matches your study needs.
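
A rough way to see the filter-count effect is to count channels on an auditory frequency scale; the sketch below uses the ERB-number scale of Glasberg and Moore as an assumed spacing rule, which is one common convention rather than the scheme of any particular model here.

```python
# Toy filter-count illustration: wider filters (fewer filters per ERB)
# need fewer channels to cover the same range.  The ERB-number formula
# follows Glasberg & Moore (1990); the range and densities are assumed.
import numpy as np

def erb_number(f_hz):
    """Map frequency in Hz onto the ERB-number ('filter count') scale."""
    return 21.4 * np.log10(4.37 * f_hz / 1000.0 + 1.0)

f_lo, f_hi = 125.0, 8000.0
for filters_per_erb in (1.0, 0.5):            # 0.5/ERB mimics doubled bandwidths
    n = (erb_number(f_hi) - erb_number(f_lo)) * filters_per_erb
    print(f"{filters_per_erb} filters per ERB -> about {n:.0f} filters "
          f"from {f_lo:.0f} Hz to {f_hi:.0f} Hz")
```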

5) Inner hair cell and auditory nerve: envelope extraction and adaptation

  • Simple IHC stages act like envelope detectors: they keep slower changes (the “outline” of the sound) and reduce phase details at high frequencies (see the sketch after this list).
  • The more detailed IHC in verhulst2018 models the biophysics of hair cells more closely (three-channel Hodgkin–Huxley style), which can capture richer behavior.
  • Auditory nerve stages include adaptation: the nerve fires a lot at sound onset, then settles down. Models simulate fibers with high, medium, or low “spontaneous rates” to reflect the variety found in biology. The researchers standardized fiber mixes and repeated simulations where needed to get stable results.
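
A minimal version of the envelope-detector idea from the first bullet above is half-wave rectification followed by a low-pass filter; the 1 kHz cutoff and second-order Butterworth filter below are assumptions for illustration, not the IHC stage of any specific model.

```python
# Minimal inner-hair-cell style envelope extraction: half-wave
# rectification followed by a low-pass filter.  The 1 kHz cutoff and
# 2nd-order Butterworth design are illustrative assumptions.
import numpy as np
from scipy.signal import butter, sosfilt

FS = 44100

def ihc_envelope(x, fs=FS, cutoff_hz=1000.0):
    rectified = np.maximum(x, 0.0)   # keep only the positive half-waves
    sos = butter(2, cutoff_hz, btype="lowpass", fs=fs, output="sos")
    return sosfilt(sos, rectified)

# For a 4 kHz carrier modulated at 40 Hz, the fine structure is largely
# smoothed away while the slower envelope survives, as described above.
t = np.arange(0, 0.1, 1 / FS)
tone = np.sin(2 * np.pi * 4000 * t) * (1 + 0.5 * np.sin(2 * np.pi * 40 * t))
env = ihc_envelope(tone)
```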

6) Early brain processing: rhythm detectors around ~80 Hz

All model families can feed into a stage that’s sensitive to sound modulations (the “beats” or rhythmic envelope). The biophysical and phenomenological models often used the SFIE circuit (Same-Frequency Inhibition-Excitation), which behaves like a broad modulation filter with a best rate near 80 Hz. Effective models use modulation filter banks (sets of rhythm detectors) with different tunings and ranges. This is important for predicting how speech clarity and certain sound features are tracked by the brain.

What this means in practice

  • Choose the model that fits your goal:
    • Want biological realism (for neuroscience)? Biophysical or phenomenological models.
    • Want speed and good enough predictions of listening performance (for engineering or hearing-aid algorithms)? Functional-effective models.
  • Be careful with levels and calibration. Middle-ear settings can shift where compression starts, changing model behavior.
  • Check whether you need level-dependent filter behavior. Real ears have filters that broaden with loudness; many simple models don’t.
  • For tasks involving rhythm or speech envelopes, make sure the modulation stage matches the rates you care about (e.g., ~4–16 Hz for speech syllable rates vs. ~80 Hz for certain brainstem sensitivities).

Why this research is useful

  • It gives a clear, side-by-side look at eight widely used hearing models, showing where they agree and where they differ.
  • It highlights the trade-off between realism and speed.
  • It offers practical tips on configuration and comparability, encouraging reproducible research with open tools (AMT).
  • The insights can guide better choices in:
    • Designing hearing aids and audio processing
    • Building speech intelligibility predictors
    • Planning neuroscience experiments and simulations
    • Creating machine-hearing systems that respect human hearing limits

In short, this study helps researchers and engineers pick and tune the right “digital ear” for their job, and reminds everyone to be careful when applying a model outside the conditions it was built or tested for.
