
Bone Conduction Functions (BCFs)

Updated 4 December 2025
  • Bone Conduction Functions (BCFs) are transfer functions that model how speech vibrations propagate through bone, linking sensor data and cochlear dynamics.
  • They integrate analytical and data-driven methods to estimate spectral characteristics, enabling effective multi-modal speech enhancement in earable devices.
  • BCFs facilitate synthetic vibration data generation and advanced cochlear modeling, delivering measurable improvements in speech processing performance.

Bone Conduction Functions (BCFs) quantify and model the physical and signal-processing pathways by which vibrations propagate through bone and are detected by inertial sensors or contribute to hearing perception. BCFs form the foundation for developing multi-modal speech enhancement, earable device signal processing, and analytical cochlear modeling. They formalize the bone pathway as a transfer function—linking ground-truth speech, vibrational sensor readings, and, in the case of physiological modeling, basilar membrane dynamics. Recent research has advanced both data-driven and analytical frameworks for BCF estimation and utilization, enabling synthetic vibration data generation and quantitative predictions of bone-conducted hearing and emissions.

1. Mathematical Formalism and Physical Interpretation

The Bone Conduction Function is defined as a mapping from the clean speech signal to the bone-conducted vibration as measured by an inertial sensor or modeled within the cochlea. For device-level modeling, as in VibOmni, this mapping is formalized in the time domain:

$$s_{\mathrm{vib}}(t) = f[s_{\mathrm{speech}}](t) + \epsilon_{\mathrm{vib}}(t), \quad s_{\mathrm{mic}}(t) = s_{\mathrm{speech}}(t) + \epsilon_{\mathrm{mic}}(t)$$

where $s_{\mathrm{vib}}$ is the bone-conduction (accelerometer) measurement, $s_{\mathrm{mic}}$ the microphone signal, $s_{\mathrm{speech}}$ the clean speech, and $\epsilon_{*}$ the respective noise processes (He et al., 2 Dec 2025).

When treated as a linear time-invariant system, $f$ is represented as a convolution with an impulse response $h(t)$, such that

$$s_{\mathrm{vib}}(t) = (h * s_{\mathrm{speech}})(t) + \epsilon_{\mathrm{vib}}(t)$$

In the frequency domain, this is

$$F(\omega) \approx \frac{S_{\mathrm{vib}}(\omega)}{S_{\mathrm{speech}}(\omega)}$$

capturing the characteristic low-pass and user-dependent filtering as speech propagates through the head.
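The LTI model above can be illustrated with a small simulation. This sketch is not from the paper: the impulse response, cutoff frequency, and noise level are all illustrative assumptions, chosen only to show how a low-pass bone-conduction filter $F(\omega)$ could be recovered as a spectral ratio from paired signals.

```python
# Illustrative sketch (assumed parameters): simulate the LTI model
# s_vib = h * s_speech + eps_vib, then estimate |F(w)| from spectra.
import numpy as np
from scipy import signal

fs = 16000
rng = np.random.default_rng(0)
s_speech = rng.standard_normal(fs)            # stand-in for clean speech

# Hypothetical low-pass impulse response h(t): bone conduction attenuates
# high frequencies (the ~1 kHz cutoff here is arbitrary, for illustration).
h = signal.firwin(numtaps=101, cutoff=1000, fs=fs)

s_vib = signal.fftconvolve(s_speech, h, mode="same")
s_vib += 0.01 * rng.standard_normal(len(s_vib))   # epsilon_vib

# Empirical transfer function via cross- and auto-spectra (Welch-style).
f, Pss = signal.welch(s_speech, fs=fs, nperseg=1024)
f, Psv = signal.csd(s_speech, s_vib, fs=fs, nperseg=1024)
F_hat = np.abs(Psv) / Pss                      # |F(w)| estimate
```

The recovered `F_hat` reproduces the low-pass shape: bins well below the cutoff carry much larger gain than bins well above it, mirroring the user-dependent low-pass filtering described above.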

In cochlear modeling, the BCF is rigorously derived as a transfer function $H_{bc}(\omega)$ mapping bone deformation amplitudes to basilar membrane velocity:

$$H_{bc}(\omega) = \frac{2(A_2 - A_1)}{C(A_1 + A_2)\, Z_{\mathrm{bm}}(\omega)}$$

with $A_1, A_2$ the cochlear chamber areas, $C$ a geometry- and elasticity-dependent coefficient, and $Z_{\mathrm{bm}}$ the frequency-dependent basilar membrane impedance (Tchumatchenko et al., 2014).

2. Analytical and Empirical Estimation of BCFs

Empirical estimation of BCFs in device contexts addresses the scarcity of paired data by fitting spectral-domain models. The pipeline comprises:

  1. Collecting simultaneous microphone and accelerometer data in controlled (quiet) conditions, segmenting into fixed windows (e.g., 5 s).
  2. Computing power spectral densities $\Phi_{vv}(f)$ (vibration) and $\Phi_{ss}(f)$ (speech) via Welch's method.
  3. Estimating the raw BCF as the PSD ratio:

$$\hat F_{\mathrm{raw}}(f) = \frac{\Phi_{vv}(f)}{\Phi_{ss}(f)}$$

  4. Modeling the distribution of BCFs across users as a frequency-wise Gaussian:

$$F(f) \sim \mathcal{N}(\mu(f), \sigma^2(f))$$

where $\mu(f)$ and $\sigma^2(f)$ are the sample mean and variance over all windows and users. This approach does not require end-to-end loss-driven learning; the parameters are fit directly from data (He et al., 2 Dec 2025).
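The pipeline above can be sketched in a few lines. The helper name `estimate_bcf_windows` and the toy paired data are assumptions for illustration; only the structure (Welch PSDs per window, PSD ratio, frequency-wise mean and variance) follows the steps described.

```python
# Minimal sketch of the spectral BCF-fitting pipeline: Welch PSDs per 5 s
# window, raw BCF as their ratio, then a frequency-wise Gaussian fit.
import numpy as np
from scipy import signal

def estimate_bcf_windows(mic, vib, fs=16000, win_s=5.0, nperseg=1024):
    """Return frequency bins and raw BCF estimates, one row per window."""
    n = int(win_s * fs)
    ratios = []
    for start in range(0, min(len(mic), len(vib)) - n + 1, n):
        f, Pss = signal.welch(mic[start:start + n], fs=fs, nperseg=nperseg)
        f, Pvv = signal.welch(vib[start:start + n], fs=fs, nperseg=nperseg)
        ratios.append(Pvv / (Pss + 1e-12))     # F_raw(f) = Phi_vv / Phi_ss
    return f, np.array(ratios)

# Toy paired data standing in for simultaneous mic/accelerometer recordings.
rng = np.random.default_rng(0)
mic = rng.standard_normal(16000 * 20)
vib = 0.3 * mic + 0.01 * rng.standard_normal(len(mic))

# Frequency-wise Gaussian: sample mean/variance over all windows (and users).
f, F_raw = estimate_bcf_windows(mic, vib)
mu, sigma2 = F_raw.mean(axis=0), F_raw.var(axis=0)
```

In practice the rows of `F_raw` would be pooled across all users before fitting $\mu(f)$ and $\sigma^2(f)$, which is what makes the later Gaussian sampling user-agnostic.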

Analytically, the physiological BCF is derived from coupled wave equations in the cochlea and depends on anatomical and material properties. The model yields closed-form expressions for $H_{bc}(\omega)$ and the dispersion relations for bone-conduction modes, explicitly parameterizing frequency selectivity, energy transmission, and the mechanisms coupling bone deformation to sensory excitation (Tchumatchenko et al., 2014).
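As a purely numerical illustration of the analytical form, the transfer function $H_{bc}(\omega) = 2(A_2 - A_1)/(C(A_1 + A_2) Z_{\mathrm{bm}}(\omega))$ can be evaluated under a simple single-oscillator impedance. Both the impedance model $Z_{\mathrm{bm}} = i\omega m + r + k/(i\omega)$ and all parameter values below are illustrative assumptions, not values from the cited work.

```python
# Hedged numeric sketch of H_bc(w) = 2(A2 - A1) / (C (A1 + A2) Z_bm(w)),
# using an assumed damped-oscillator basilar membrane impedance.
import numpy as np

A1, A2 = 1.0e-6, 2.0e-6      # chamber cross-sections (m^2), assumed
C = 1.0                       # geometry/elasticity coefficient, assumed
m, r, k = 1e-3, 1.0, 1e6      # oscillator mass, damping, stiffness, assumed

w = 2 * np.pi * np.logspace(2, 4, 200)        # 100 Hz .. 10 kHz, rad/s
Z_bm = 1j * w * m + r + k / (1j * w)
H_bc = 2 * (A2 - A1) / (C * (A1 + A2) * Z_bm)

# |H_bc| peaks where |Z_bm| is minimal, i.e. at resonance w0 = sqrt(k/m),
# consistent with resonance at the local basilar membrane frequency.
w0 = np.sqrt(k / m)
peak_w = w[np.argmax(np.abs(H_bc))]
```

The peak of $|H_{bc}|$ lands at the oscillator resonance, which mirrors the model's prediction that bone-conduction sensitivity is maximal at the local basilar membrane frequency.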

3. Synthesis of Vibration Data and BCF-Augmented Learning

Synthetic vibration generation leverages the BCF as a parametric frequency-domain filter over arbitrary audio, enabling large-scale augmentation for training multi-modal neural networks. The procedure is:

  • Compute the STFT magnitude $A[n,f] = |\mathrm{STFT}\{x[t]\}|$ from an input waveform.
  • For every frequency bin $f$, sample a BCF instance $\hat F(f) \sim \mathcal{N}(\mu(f), \sigma^2(f))$.
  • Construct the synthetic vibration spectrogram $\tilde V[n,f] = A[n,f] \times \hat F(f)$.
  • Synthesize the time-domain vibration $\tilde v[t]$ via the inverse STFT, combined with the original audio phase.

Fidelity is quantified by the mean spectrogram similarity error:

$$\mathrm{Error} = \frac{1}{\max_{n,f} S_{\mathrm{real}}(n,f)} \sum_{n,f} |S_{\mathrm{real}}(n,f) - S_{\mathrm{syn}}(n,f)|$$

Empirically, this process achieves an average error of 4.5% across users, indicating high realism of the synthetic signals (He et al., 2 Dec 2025).
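The synthesis procedure above can be sketched as follows. The per-bin Gaussian parameters `mu`/`sigma` here are an assumed smooth low-pass shape for illustration only; in practice they come from the empirical fitting pipeline of Section 2.

```python
# Sketch of BCF-based vibration synthesis: filter the STFT magnitude of an
# arbitrary waveform by a sampled BCF, then invert with the original phase.
import numpy as np
from scipy import signal

fs, nperseg = 16000, 512
rng = np.random.default_rng(0)
x = rng.standard_normal(fs)                    # arbitrary input audio

f, t, X = signal.stft(x, fs=fs, nperseg=nperseg)
A, phase = np.abs(X), np.angle(X)              # magnitude A[n,f] and phase

# Hypothetical frequency-wise Gaussian BCF (mu, sigma^2 would be fit from
# data); a smooth low-pass mean with small spread, for illustration.
mu = np.exp(-f / 1000.0)
sigma = 0.05 * mu
F_hat = np.clip(rng.normal(mu, sigma), 0, None)  # one BCF sample per bin

# V~[n,f] = A[n,f] * F_hat(f), recombined with the original audio phase.
V = (A * F_hat[:, None]) * np.exp(1j * phase)
_, v_syn = signal.istft(V, fs=fs, nperseg=nperseg)
```

Because each training example draws a fresh $\hat F(f)$, the same source audio yields a family of plausible per-user vibration signals, which is what makes the augmentation scale.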

4. Integration of BCFs in Multi-Modal Speech Enhancement

In end-to-end neural networks for speech enhancement in earables, BCF-simulated vibration data provides a critical modality. The VibOmni architecture employs:

  • An audio encoder (dilated 2D Conv–BN–ReLU–MaxPool blocks) operating on 16 kHz audio.
  • A vibration encoder (smaller Conv–BN–ReLU–MaxPool stack) tailored for lower-frequency, downsampled accelerometer signals.
  • Channel-wise feature fusion, followed by a dual-path RNN separator to model dependencies across time and frequency.
  • Two decoder branches: a main fusion decoder that outputs a full-band spectral mask for enhanced audio, and an auxiliary decoder predicting the clean low-band speech from vibration features alone.

The training loss combines scale-invariant signal-to-noise ratio (SI-SNR) terms on both the wideband and low-band reconstructions:

$$L = L_{\mathrm{SISNR}}(s, \hat s) + 0.05\, L_{\mathrm{SISNR}}(s_{\mathrm{low}}, \hat s_{\mathrm{low}})$$

This architecture is empirically shown to capitalize on the complementarity of audio and vibration, especially under adverse noise conditions (He et al., 2 Dec 2025).
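The combined objective can be sketched numerically. The `si_snr` helper follows the standard scale-invariant SNR definition; the function names are illustrative, and only the 0.05 auxiliary weight is taken from the reported loss.

```python
# Sketch of the combined SI-SNR objective: wideband term plus a 0.05-weighted
# low-band auxiliary term, negated so minimizing the loss maximizes SI-SNR.
import numpy as np

def si_snr(s, s_hat, eps=1e-8):
    """Scale-invariant SNR (dB) between target s and estimate s_hat."""
    s = s - s.mean()
    s_hat = s_hat - s_hat.mean()
    s_target = (np.dot(s_hat, s) / (np.dot(s, s) + eps)) * s
    e_noise = s_hat - s_target
    return 10 * np.log10(
        np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))

def total_loss(s, s_hat, s_low, s_low_hat):
    return -si_snr(s, s_hat) - 0.05 * si_snr(s_low, s_low_hat)

rng = np.random.default_rng(0)
s = rng.standard_normal(16000)
noisy = s + 0.1 * rng.standard_normal(16000)
loss = total_loss(s, noisy, s[:4000], noisy[:4000])
```

The scale invariance comes from projecting the estimate onto the target before measuring residual energy, so the loss cannot be gamed by simply rescaling the output.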

5. Quantitative Evaluation and Impact

In rigorous in-the-wild evaluation, BCF-based augmentation proves vital for practical system performance. Ablation studies that remove BCF-augmented synthetic data show marked degradation: PESQ drops from 2.6 to 1.9, SNR from 15.6 to 14.0 dB, and LSD worsens from 3.5 to 5.0.

With BCFs integrated, the VibOmni system achieves:

  • 21% relative improvement in PESQ (e.g., 2.21 → 2.7)
  • 26% SNR gain (e.g., 12.4 → 15.6 dB)
  • 40–44% reduction in ASR word error rate (38% → 21.5%)
  • User study preference: 87% over audio-only baseline, 72% over unprocessed audio

These results demonstrate both the fidelity of synthetic bone-vibration data and the substantial benefit in deploying BCF-informed multi-modal enhancement (He et al., 2 Dec 2025).

6. BCFs in Cochlear Mechanics and Bone-Conduction Hearing

Analytical modeling of cochlear bone conduction contextualizes BCFs as transfer functions from bone deformation to auditory perception. The eigenanalysis of the coupled cochlear fluid and elastic bone system yields two propagation modes:

  • Slow (basilar-membrane) mode: classic traveling wave, excited by both air- and bone-conduction paths.
  • Fast (cochlear-bone) mode: rapid wave due to bone deformation, with minimal basilar membrane excitation at leading order.

Bone-conduction sensitivity ($|H_{bc}(\omega)|$) scales with the area asymmetry $(A_2 - A_1)/(A_1 + A_2)$ and inversely with bone stiffness ($E$), and exhibits broadband characteristics with resonance at the local basilar membrane frequency. Bone conduction also bypasses middle-ear impedance mismatches, resulting in distinctive temporal features (e.g., rapid otoacoustic emission components with ~1–2 ms delay) (Tchumatchenko et al., 2014).

7. Applications and Outlook

BCFs enable effective modeling of bone-conducted signal pathways in both device-based and physiological contexts. In earable devices, they underpin scalable data augmentation and multi-modal neural speech enhancement. In auditory science, they clarify the mechanisms of bone-conducted hearing and the origins of otoacoustic emissions, and link device-centric transfer functions to anatomical and material determinants.

A plausible implication is that further refinement of BCF estimation—from personalized device calibration to subject-specific cochlear parameterization—may yield improvements in robust speech interaction, hearing-assistive technology, and basic auditory modeling.

