Spatial Audio Quality Inventory (SAQI)

Updated 1 July 2025

The Spatial Audio Quality Inventory (SAQI) is a structured framework defining and evaluating spatial audio quality through specific technical and perceptual measurement criteria.
SAQI benchmarks spatial audio methods and employs technical (beam pattern, SNR) and perceptual (localization) metrics to quantify performance.
The framework guides experimental design by recommending reproduction methods, minimum speaker counts, and metrics tailored to evaluating specific spatial algorithms and tasks.

The Spatial Audio Quality Inventory (SAQI) is a structured framework for categorizing, evaluating, and guiding the assessment of spatial audio quality, originally motivated by the need for systematic, perceptually meaningful evaluation in laboratory and applied contexts such as hearing aid research and immersive media. In its foundational realization, SAQI provides a comprehensive set of technical and perceptual performance measures tailored to the properties and limitations of different spatial audio reproduction methods, and offers practical benchmarks for experimental design and technology assessment.

1. Spatial Audio Reproduction Methods and Their Technical Basis

SAQI, as implemented in the context of hearing aid evaluation, benchmarks three primary spatial audio reproduction methods, each with distinct mathematical and perceptual implications for spatial fidelity and artifact generation:

Nearest Speaker Panning (NSP, Discrete Speakers): Routes each virtual source to the single loudspeaker closest to it, with driving weights

$w_k = \begin{cases} 1 & k = k_{\min} \\ 0 & \text{otherwise} \end{cases}$

where $k_{\min}$ is the index of the loudspeaker nearest the source. This method’s resolution is limited by the angular spacing between speakers, producing pronounced spatial “jumps” and discrete source localization.
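The NSP rule can be sketched in a few lines. This is a minimal illustration, assuming a horizontal loudspeaker ring described by azimuths in degrees; the function name `nsp_weights` and the 8-speaker layout are hypothetical and not part of SAQI.

```python
def nsp_weights(source_az_deg, speaker_az_deg):
    """Return one weight per loudspeaker: 1 for the nearest, 0 otherwise."""
    def angular_distance(a, b):
        # Wrapped absolute difference in degrees, always in [0, 180].
        return abs((a - b + 180.0) % 360.0 - 180.0)

    k_min = min(
        range(len(speaker_az_deg)),
        key=lambda k: angular_distance(speaker_az_deg[k], source_az_deg),
    )
    return [1.0 if k == k_min else 0.0 for k in range(len(speaker_az_deg))]


# Hypothetical ring of 8 loudspeakers at 45 degree spacing.
speakers = [45.0 * k for k in range(8)]
weights = nsp_weights(50.0, speakers)  # the speaker at 45 degrees is selected
```

A source moving smoothly from 50 to 70 degrees would switch abruptly from the 45 degree speaker to the 90 degree speaker, which is the spatial "jump" described above.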

Vector Base Amplitude Panning (VBAP): Interpolates between the two nearest loudspeakers using amplitude weights,

$[w_l, w_m] = \hat{\mathbf{r}}^T \mathbf{S}_{l,m}^{-1}$

where $\hat{\mathbf{r}}$ denotes the unit vector to the source and $\mathbf{S}_{l,m}$ the unit vectors to the speakers. VBAP provides smoother source movement and reduces angular quantization error relative to NSP, but remains bounded by speaker density.
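A two-dimensional version of the VBAP weight computation might look like the sketch below. It assumes a horizontal ring in which adjacent loudspeakers are less than 180 degrees apart, and it normalizes the pair of gains to unit energy, a common convention that the text above does not specify. The function name `vbap_2d_weights` is hypothetical.

```python
import math


def vbap_2d_weights(source_az_deg, speaker_az_deg):
    """Amplitude weights for 2D VBAP on a horizontal loudspeaker ring."""
    n = len(speaker_az_deg)
    # Sort indices by azimuth so that consecutive entries are adjacent speakers.
    order = sorted(range(n), key=lambda k: speaker_az_deg[k] % 360.0)
    src = math.radians(source_az_deg)
    rx, ry = math.cos(src), math.sin(src)

    for i in range(n):
        l, m = order[i], order[(i + 1) % n]
        al = math.radians(speaker_az_deg[l])
        am = math.radians(speaker_az_deg[m])
        # S holds the two speaker unit vectors as rows; solve [w_l, w_m] S = r.
        det = math.cos(al) * math.sin(am) - math.sin(al) * math.cos(am)
        if abs(det) < 1e-12:
            continue  # collinear pair, cannot span the plane
        w_l = (rx * math.sin(am) - ry * math.cos(am)) / det
        w_m = (ry * math.cos(al) - rx * math.sin(al)) / det
        # The active pair is the one for which both gains are non-negative.
        if w_l >= -1e-9 and w_m >= -1e-9:
            w_l, w_m = max(w_l, 0.0), max(w_m, 0.0)
            norm = math.hypot(w_l, w_m)  # assumed unit-energy normalization
            weights = [0.0] * n
            weights[l], weights[m] = w_l / norm, w_m / norm
            return weights
    raise ValueError("no loudspeaker pair encloses the source direction")


speakers = [45.0 * k for k in range(8)]
w = vbap_2d_weights(22.5, speakers)  # midway between the 0 and 45 degree speakers
```

For a source exactly midway between two speakers, the two active gains come out equal; for a source coinciding with a speaker, only that speaker is driven.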

Higher Order Ambisonics (HOA): Encodes the sound field with spherical/cylindrical harmonics, giving driving weights

$w_k = \frac{\sin \left( \frac{1}{2}(N-1)\varphi_k \right)}{N \sin\left( \frac{1}{2}\varphi_k \right)}$

for each loudspeaker $k$ and source-loudspeaker azimuth $\varphi_k$. HOA yields the highest spatial resolution; more speakers and higher order allow for accurate spatial imaging and extended alias-free frequency reproduction.
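The HOA weight expression can be evaluated directly. The sketch below takes the formula exactly as written and assumes that $N$ is the number of loudspeakers on a horizontal ring, which the text does not state explicitly. The limit $(N-1)/N$ at $\varphi_k = 0$ is handled separately to avoid a division by zero. The function name `hoa_weights` is hypothetical.

```python
import math


def hoa_weights(source_az_deg, speaker_az_deg):
    """Evaluate w_k = sin((N-1) phi_k / 2) / (N sin(phi_k / 2)) per loudspeaker."""
    n = len(speaker_az_deg)  # assumption: N is the loudspeaker count
    weights = []
    for az in speaker_az_deg:
        # Source-to-loudspeaker azimuth, wrapped to [-180, 180) degrees.
        phi = math.radians((az - source_az_deg + 180.0) % 360.0 - 180.0)
        s = math.sin(0.5 * phi)
        if abs(s) < 1e-12:
            weights.append((n - 1) / n)  # limit of the expression as phi -> 0
        else:
            weights.append(math.sin(0.5 * (n - 1) * phi) / (n * s))
    return weights


speakers = [45.0 * k for k in range(8)]
w = hoa_weights(0.0, speakers)
```

Unlike NSP and VBAP, every loudspeaker receives a nonzero weight here, and some weights are negative; for the regular 8-speaker ring in this example the weights sum to one.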

Spatial aliasing in interpolative methods is governed by

$f \leq \frac{c}{4\pi r} N_{\min}$

where $c$ is the speed of sound, $r$ is the listener's distance from the array center, and $N_{\min}$ is the minimum number of loudspeakers required for the HOA order. This bound sets the usable bandwidth, which matters when matching test material to array design.
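The aliasing bound can be used in both directions: to find the upper frequency for a given array, or the loudspeaker count needed for a target bandwidth. The sketch below assumes a speed of sound of 343 m/s and a hypothetical radius of 0.1 m; neither value comes from SAQI, and the function names are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def alias_free_limit_hz(n_speakers, radius_m, c=SPEED_OF_SOUND):
    """Upper frequency bound f <= c / (4 pi r) * N_min."""
    return c / (4.0 * math.pi * radius_m) * n_speakers


def min_speakers(f_max_hz, radius_m, c=SPEED_OF_SOUND):
    """Smallest integer N_min satisfying the bound for a target frequency."""
    return math.ceil(4.0 * math.pi * radius_m * f_max_hz / c)


# Hypothetical example: radius of 0.1 m around the array center.
limit = alias_free_limit_hz(18, 0.1)
needed = min_speakers(4000.0, 0.1)
```

Because the bound scales with $1/r$, doubling the radius over which accurate reproduction is required halves the usable bandwidth for a fixed loudspeaker count.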

2. Performance Measures: Technical and Perceptual Quantification

The SAQI framework recognizes that evaluation must span both technical accuracy and subjective perception. Accordingly, it leverages a multi-criteria measurement approach for quantifying the suitability of each reproduction method:

Beam Pattern Analysis: For directional (e.g., beamformer) algorithms, the deviation in spatial directivity as a function of frequency is expressed as an RMS error:

$E(f) = \sqrt{\sum_\alpha (\Delta G(\alpha, f))^2}$

where $\Delta G$ denotes the dB gain difference.
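As an illustration, the beam pattern error can be computed from two gain tables, one measured in a reference condition and one in the reproduced sound field. The sketch follows the formula as written, summing squared differences over angles without dividing by the number of angles. The data layout (rows are angles, columns are frequencies) and the function name are assumptions.

```python
import math


def beam_pattern_error(gain_ref_db, gain_test_db):
    """E(f) = sqrt(sum over angles of (Delta G(alpha, f))^2), one value per frequency.

    Both inputs are lists of rows; each row holds the gain in dB at one
    angle alpha for every frequency (assumed layout).
    """
    n_freq = len(gain_ref_db[0])
    errors = []
    for j in range(n_freq):
        total = sum(
            (ref_row[j] - test_row[j]) ** 2
            for ref_row, test_row in zip(gain_ref_db, gain_test_db)
        )
        errors.append(math.sqrt(total))
    return errors


# Hypothetical data: 4 angles, 2 frequencies, 1 dB deviation at the first frequency.
ref = [[0.0, 0.0] for _ in range(4)]
test = [[1.0, 0.0] for _ in range(4)]
errors = beam_pattern_error(ref, test)
```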

Signal-to-Noise Ratio (SNR) Analysis: Captures SNR improvements/deficits in multi-source noise configurations,

$E(f) = \sqrt{\frac{1}{9} \sum_{R_{i,n}} \left[ \Delta R_{\text{ref}}(f) - \Delta R_{\text{test}}(f) \right]^2}$

relevant for adaptive filtering and beamforming.
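A possible implementation of the SNR error is sketched below. It assumes that the sum runs over a set of source and noise configurations (nine in the formula above, hence the factor $1/9$) and that $\Delta R$ is available in dB per configuration and frequency band. The data layout and function name are hypothetical.

```python
import math


def snr_error(delta_r_ref_db, delta_r_test_db):
    """RMS difference between reference and test SNR changes, per frequency band.

    Rows are configurations, columns are frequency bands (assumed layout).
    With nine configurations the normalization equals the 1/9 in the formula.
    """
    n_cfg = len(delta_r_ref_db)
    n_freq = len(delta_r_ref_db[0])
    errors = []
    for j in range(n_freq):
        total = sum(
            (ref_row[j] - test_row[j]) ** 2
            for ref_row, test_row in zip(delta_r_ref_db, delta_r_test_db)
        )
        errors.append(math.sqrt(total / n_cfg))
    return errors


# Hypothetical data: nine configurations, one band, constant 3 dB mismatch.
ref = [[6.0] for _ in range(9)]
test = [[3.0] for _ in range(9)]
errors = snr_error(ref, test)
```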

Perceptual Localization Prediction: Uses computational binaural models to predict perceived direction of arrival (DOA) and root-mean-square localization error, quantifying spatial cue preservation.
Audio Quality Modeling (Spectral Distortion): Computes distances in excitation patterns via auditory filterbanks, as in Moore (2004), measuring coloration and perceived naturalness.
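Once a binaural model has produced predicted directions, the root-mean-square localization error reduces to a small calculation. The sketch below covers only that final step and assumes azimuths in degrees with wrap-around at 360; the binaural model itself is not shown, and the function name is hypothetical.

```python
import math


def rms_localization_error_deg(predicted_az_deg, true_az_deg):
    """RMS of the wrapped azimuth error between predicted and true DOA."""
    squared = []
    for pred, true in zip(predicted_az_deg, true_az_deg):
        # Wrap the signed error to [-180, 180) so 350 vs 0 counts as -10 degrees.
        err = (pred - true + 180.0) % 360.0 - 180.0
        squared.append(err * err)
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical predictions for two sources that are both at 0 degrees.
error = rms_localization_error_deg([10.0, 350.0], [0.0, 0.0])
```

Wrapping the error before squaring prevents predictions near the 0/360 boundary from being counted as large mistakes.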

3. Reproduction Method/Algorithm Class Interactions

SAQI not only ranks technical metrics, but also systematically studies interaction effects between spatial reproduction schemes and classes of hearing aid algorithms:

Directional Beamformers: Highly sensitive to spatial inaccuracies and insufficient channel counts; only HOA and high-density VBAP accurately sustain functionality, particularly above 2 kHz.
Adaptive Differential Microphones (ADM): NSP offers the best match with real-world SNR improvements; increasing speaker count above 8 gives minimal additional benefit, and interpolative methods do not improve fidelity.
Binaural Noise Reduction: Algorithm outcomes depend chiefly on the number of loudspeakers and listening position, not the reproduction algorithm per se, provided spatial aliasing is controlled.
Single Channel Noise Reduction: Although not explicitly spatial, HOA/VBAP outperform NSP in matching SNR enhancement patterns in spatially complex fields.

The underlying evaluation further shows that interpolative methods (VBAP, HOA) degrade when the listener is positioned off-center, which diminishes their advantage in practical (non-head-fixed) applications.

4. Best Practices in Experimental Design and Method Selection

SAQI’s synthesis yields pragmatic recommendations for experimental protocols in spatial audio research:

Task/Algorithm	Best Reproduction Method	Min. # Speakers (central)	Notes
Beam Pattern Analysis	HOA or VBAP	≥18 (4 kHz)	Increased for mobile listeners
ADM SNR Analysis	NSP (Discrete)	≥8	Diminishing returns >8
Single Channel NR	VBAP or HOA	≥18	Optimal for SNR behavior
General Localization	HOA	≥8	HOA’s advantage declines off-center

Fewer than 8 loudspeakers are inadequate for most analytical and perceptual outcomes. The spatial aliasing criterion should drive array design, and method selection must be tailored to algorithm class and experimental aims.

5. Mathematical Foundations and Core Formulas

SAQI explicitly enumerates the core mathematical structures underpinning spatial audio reproduction and quality assessment, including:

Driving weights ($w_k$) for NSP, VBAP, HOA
Alias-free frequency threshold
Beam pattern and SNR error metrics
Interaural coherence and perceptual distortion measures

These formulas enable transparent, replicable, and theoretically grounded evaluation protocols.

6. Practical Impact and Laboratory Implementation

The SAQI framework empowers researchers and practitioners to:

Systematically benchmark reproduction methods for laboratory hearing aid/algorithm evaluation
Quantify the interaction between spatial resolution, algorithmic function, and perceptual outcomes
Guide hardware and protocol selection for psychoacoustic studies and device development

By establishing best practices and unifying technical, perceptual, and interaction-based criteria, SAQI advances the field’s ability to design controlled, scalable, and perceptually relevant spatial audio experiments.

Summary Table: Practical Recommendations

Dimension	Key SAQI Guidance
Reproduction method	Select based on task: HOA for localization, NSP for ADM, etc.
Array size	≥8 for basic, ≥18 for high-fidelity (esp. >2 kHz or mobile listeners)
Artifact control	Apply spatial aliasing criterion; more speakers for wider/moving tests
Performance metrics	Use beam pattern error, SNR error, perceptual localization error
Experimental validity	Select configuration per algorithm spatial sensitivity, not convenience

The Spatial Audio Quality Inventory thus serves as a robust reference for evaluating and optimizing the technical and perceptual integrity of spatial audio processing chains, particularly in research and product development contexts emphasizing hearing aid and spatial algorithm testing.
