Spatial Audio Quality Inventory (SAQI)
- The Spatial Audio Quality Inventory (SAQI) is a structured framework for defining and evaluating spatial audio quality through specific technical and perceptual measurement criteria.
- SAQI benchmarks spatial audio methods and employs technical (beam pattern, SNR) and perceptual (localization) metrics to quantify performance.
- The framework guides experimental design by recommending reproduction methods, minimum speaker counts, and metrics tailored to evaluating specific spatial algorithms and tasks.
The Spatial Audio Quality Inventory (SAQI) is a structured framework for categorizing, evaluating, and guiding the assessment of spatial audio quality, originally motivated by the need for systematic, perceptually meaningful evaluation in laboratory and applied contexts such as hearing aid research and immersive media. In its foundational realization, SAQI provides a comprehensive set of technical and perceptual performance measures tailored to the properties and limitations of different spatial audio reproduction methods, and offers practical benchmarks for experimental design and technology assessment.
1. Spatial Audio Reproduction Methods and Their Technical Basis
SAQI, as implemented in the context of hearing aid evaluation, benchmarks three primary spatial audio reproduction methods, each with distinct mathematical and perceptual implications for spatial fidelity and artifact generation:
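- Nearest Speaker Panning (NSP, Discrete Speakers): Selects the loudspeaker closest to each virtual source, so that the driving weight of loudspeaker $k$ is
$$w_k = \begin{cases} 1, & k = \arg\min_j |\varphi_j - \varphi_s| \\ 0, & \text{otherwise,} \end{cases}$$
where $\varphi_s$ is the source azimuth and $\varphi_j$ the azimuth of loudspeaker $j$. This method’s resolution is limited by the angular spacing between speakers, producing pronounced spatial “jumps” and discrete source localization.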
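- Vector Base Amplitude Panning (VBAP): Interpolates between the two nearest loudspeakers using amplitude weights,
$$\mathbf{p} = w_1 \mathbf{l}_1 + w_2 \mathbf{l}_2 \quad\Longrightarrow\quad (w_1, w_2)^\top = [\mathbf{l}_1\ \mathbf{l}_2]^{-1}\,\mathbf{p},$$
where $\mathbf{p}$ denotes the unit vector to the source and $\mathbf{l}_1, \mathbf{l}_2$ the unit vectors to the speakers; the weights are typically normalized so that $w_1^2 + w_2^2 = 1$. VBAP provides smoother source movement and reduces angular quantization error relative to NSP, but remains bounded by speaker density.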
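- Higher Order Ambisonics (HOA): Encodes the sound field with spherical/cylindrical harmonics, achieving, for a horizontal ring of $N$ loudspeakers driven at order $M$, weights of the form
$$w_k = \frac{1}{N}\left(1 + 2\sum_{m=1}^{M} \cos\left(m\,\Delta\varphi_k\right)\right)$$
for each loudspeaker $k$ and source-loudspeaker azimuth difference $\Delta\varphi_k$. HOA yields the highest spatial resolution; more speakers and higher order allow for accurate spatial imaging and extended alias-free frequency reproduction.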
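The three panning rules above can be compared side by side with a minimal Python sketch. It assumes a horizontal ring of equally spaced loudspeakers, unit-energy normalization for VBAP, and a basic (unweighted) 2D HOA decoder; the function names `ring_azimuths`, `nsp_gains`, `vbap_gains`, and `hoa_gains` are hypothetical and not taken from any SAQI reference implementation.

```python
import math

def ring_azimuths(n):
    """Azimuths in radians of n equally spaced loudspeakers, starting at 0."""
    return [2.0 * math.pi * k / n for k in range(n)]

def wrap(angle):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

def nsp_gains(src, spk):
    """Nearest speaker panning: weight 1 on the closest speaker, 0 elsewhere."""
    nearest = min(range(len(spk)), key=lambda k: abs(wrap(spk[k] - src)))
    return [1.0 if k == nearest else 0.0 for k in range(len(spk))]

def vbap_gains(src, spk):
    """2D VBAP on the adjacent pair enclosing the source (uniform ring, n >= 3)."""
    n = len(spk)
    if n < 3:
        raise ValueError("need at least 3 loudspeakers")
    spacing = 2.0 * math.pi / n
    a = int((src % (2.0 * math.pi)) / spacing) % n  # speaker at or just below the source
    b = (a + 1) % n                                 # next speaker on the ring
    det = math.sin(spacing)                         # determinant of the base [l_a l_b]
    g_a = math.sin(spk[b] - src) / det              # solves p = g_a*l_a + g_b*l_b
    g_b = math.sin(src - spk[a]) / det
    norm = math.hypot(g_a, g_b)                     # assumed unit-energy normalization
    gains = [0.0] * n
    gains[a], gains[b] = g_a / norm, g_b / norm
    return gains

def hoa_gains(src, spk, order):
    """Basic 2D HOA decoder gains for a uniform ring (assumes 2*order + 1 <= n)."""
    n = len(spk)
    if 2 * order + 1 > n:
        raise ValueError("order too high for this number of loudspeakers")
    return [
        (1.0 + 2.0 * sum(math.cos(m * (phi - src)) for m in range(1, order + 1))) / n
        for phi in spk
    ]

# Usage: an 8-speaker ring with a virtual source at 50 degrees.
spk = ring_azimuths(8)
src = math.radians(50.0)
for name, gains in [("NSP", nsp_gains(src, spk)),
                    ("VBAP", vbap_gains(src, spk)),
                    ("HOA", hoa_gains(src, spk, order=3))]:
    print(name, [round(g, 3) for g in gains])
```

NSP activates a single loudspeaker, VBAP at most two adjacent ones, and HOA spreads energy over the whole ring, including negative gains, which is one way to see why the three methods interact differently with directional algorithms.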
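Spatial aliasing in interpolative methods is governed, to a first approximation, by
$$f_{\max} \approx \frac{c\,M}{2\pi r}, \qquad N = 2M + 1,$$
where $c$ is the speed of sound, $r$ the listener distance from the array center, $M$ the HOA order, and $N$ the minimum number of speakers for that order. This sets the usable bandwidth, which is critical for matching material to array design.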
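As a rough worked example, the sketch below evaluates the common rule of thumb that a 2D ambisonic order $M$ is adequate up to $kr \approx M$, with $N = 2M + 1$ loudspeakers needed for that order. The rule itself, the radius of 0.1 m (about a head radius), the sound speed of 343 m/s, and the function names are assumptions made for illustration, not values stated in the source.

```python
import math

def alias_free_frequency(n_speakers, radius_m, c=343.0):
    """Approximate upper frequency (Hz) reproduced without spatial aliasing.

    Assumes the highest usable order is M = (N - 1) // 2 and f_max = c*M / (2*pi*r).
    """
    order = (n_speakers - 1) // 2
    return c * order / (2.0 * math.pi * radius_m)

def min_speakers(f_max_hz, radius_m, c=343.0):
    """Smallest speaker count N = 2M + 1 whose order M satisfies M >= k*r at f_max."""
    order = math.ceil(2.0 * math.pi * f_max_hz * radius_m / c)
    return 2 * order + 1

# Usage: bandwidth of an 18-speaker ring, and the count needed for 4 kHz,
# both for an assumed region of radius 0.1 m around the array center.
print(round(alias_free_frequency(18, 0.1)))
print(min_speakers(4000.0, 0.1))
```

Because the limit scales with $1/r$, doubling the radius of the region that must stay accurate halves the usable bandwidth for a given array, which is why moving or off-center listeners call for more loudspeakers.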
2. Performance Measures: Technical and Perceptual Quantification
The SAQI framework recognizes that evaluation must span both technical accuracy and subjective perception. Accordingly, it leverages a multi-criteria measurement approach for quantifying the suitability of each reproduction method:
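- Beam Pattern Analysis: For directional (e.g., beamformer) algorithms, the deviation in spatial directivity as a function of frequency is expressed as an RMS error:
$$\varepsilon_{\mathrm{RMS}}(f) = \sqrt{\frac{1}{K}\sum_{k=1}^{K} \Delta G(f, \theta_k)^2},$$
where $\Delta G(f, \theta_k)$ denotes the dB gain difference between the beam pattern obtained in the reproduced field and the reference pattern at direction $\theta_k$, and $K$ is the number of evaluated directions.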
- Signal-to-Noise Ratio (SNR) Analysis: Captures SNR improvements or deficits in multi-source noise configurations, which is relevant for adaptive filtering and beamforming.
- Perceptual Localization Prediction: Uses computational binaural models to predict perceived direction of arrival (DOA) and root-mean-square localization error, quantifying spatial cue preservation.
- Audio Quality Modeling (Spectral Distortion): Computes distances in excitation patterns via auditory filterbanks, as in Moore (2004), measuring coloration and perceived naturalness.
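The error measures listed above reduce to a few short computations once the underlying quantities are available. The following sketch assumes that beam patterns are given as dB gains sampled at the same directions, that separate signal and noise waveforms are available for the SNR estimate, and that localization is scored on azimuth in degrees; all function names are hypothetical, and the binaural and excitation-pattern models themselves are outside its scope.

```python
import math

def rms_beam_error_db(reproduced_db, reference_db):
    """RMS of the dB gain difference between two beam patterns over direction."""
    diffs = [a - b for a, b in zip(reproduced_db, reference_db)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def snr_db(signal, noise):
    """SNR in dB from the mean power of separate signal and noise waveforms."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)

def rms_localization_error_deg(predicted_deg, target_deg):
    """RMS azimuth error in degrees, with each difference wrapped to [-180, 180)."""
    errors = [((p - t + 180.0) % 360.0) - 180.0
              for p, t in zip(predicted_deg, target_deg)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Usage with made-up illustrative numbers (not measured data).
print(rms_beam_error_db([0.0, -2.5, -7.0], [0.0, -3.0, -6.0]))
print(snr_db([2.0, -2.0, 2.0, -2.0], [1.0, -1.0, 1.0, -1.0]))
print(rms_localization_error_deg([350.0, 12.0], [10.0, 10.0]))
```

Wrapping the angular difference before squaring matters for sources near the front-back seam: a prediction of 350° for a target at 10° is a 20° error, not 340°.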
3. Reproduction Method/Algorithm Class Interactions
SAQI not only ranks technical metrics, but also systematically studies interaction effects between spatial reproduction schemes and classes of hearing aid algorithms:
- Directional Beamformers: Highly sensitive to spatial inaccuracies and insufficient channel counts; only HOA and high-density VBAP reproduce beamformer behavior accurately, particularly above 2 kHz.
- Adaptive Differential Microphones (ADM): NSP offers the best match with real-world SNR improvements; increasing speaker count above 8 gives minimal additional benefit, and interpolative methods do not improve fidelity.
- Binaural Noise Reduction: Algorithm outcomes depend chiefly on the number of loudspeakers and listening position, not the reproduction algorithm per se, provided spatial aliasing is controlled.
- Single Channel Noise Reduction: Although these algorithms are not explicitly spatial, HOA and VBAP outperform NSP in matching SNR enhancement patterns in spatially complex fields.
The underlying evaluation further shows that interpolative methods (VBAP, HOA) degrade when the listener is positioned off-center, which diminishes their advantage in practical (non-head-fixed) applications.
4. Best Practices in Experimental Design and Method Selection
SAQI’s synthesis yields pragmatic recommendations for experimental protocols in spatial audio research:
Task/Algorithm | Best Reproduction Method | Min. Number of Speakers (centered listener) | Notes |
---|---|---|---|
Beam Pattern Analysis | HOA or VBAP | ≥18 (4 kHz) | Increased for mobile listeners |
ADM SNR Analysis | NSP (Discrete) | ≥8 | Diminishing returns >8 |
Single Channel NR | VBAP or HOA | ≥18 | Optimal for SNR behavior |
General Localization | HOA | ≥8 | HOA’s advantage declines off-center |
Fewer than 8 loudspeakers are inadequate for most analytical and perceptual outcomes. The spatial aliasing criterion should drive array design, and method selection must be tailored to algorithm class and experimental aims.
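For planning purposes, the recommendations in the table can be encoded as a small lookup that flags a proposed setup. This is only an illustrative sketch: the task keys, the `RECOMMENDATIONS` dictionary, and the `check_setup` helper are hypothetical names, and the values are copied from the table for a centered listener.

```python
# Recommendations from the table above (centered listener); keys are hypothetical.
RECOMMENDATIONS = {
    "beam_pattern":      {"methods": {"HOA", "VBAP"}, "min_speakers": 18},
    "adm_snr":           {"methods": {"NSP"},         "min_speakers": 8},
    "single_channel_nr": {"methods": {"VBAP", "HOA"}, "min_speakers": 18},
    "localization":      {"methods": {"HOA"},         "min_speakers": 8},
}

def check_setup(task, method, n_speakers):
    """Return a list of mismatches between a setup and the tabulated guidance."""
    rec = RECOMMENDATIONS[task]
    issues = []
    if method not in rec["methods"]:
        preferred = " or ".join(sorted(rec["methods"]))
        issues.append(f"{method} is not the recommended method for {task}; use {preferred}")
    if n_speakers < rec["min_speakers"]:
        issues.append(f"{n_speakers} speakers is below the recommended minimum "
                      f"of {rec['min_speakers']} for {task}")
    return issues

# Usage: an 8-speaker NSP ring suits the ADM SNR task but not beam pattern analysis.
print(check_setup("adm_snr", "NSP", 8))
print(check_setup("beam_pattern", "NSP", 8))
```

An empty list means the setup matches the tabulated guidance; it does not replace checking the spatial aliasing criterion for the actual listening region.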
5. Mathematical Foundations and Core Formulas
SAQI explicitly enumerates the core mathematical structures underpinning spatial audio reproduction and quality assessment, including:
- Driving weights ($w_k$) for NSP, VBAP, and HOA
- Alias-free frequency threshold
- Beam pattern and SNR error metrics
- Interaural coherence and perceptual distortion measures
These formulas enable transparent, replicable, and theoretically grounded evaluation protocols.
6. Practical Impact and Laboratory Implementation
The SAQI framework empowers researchers and practitioners to:
- Systematically benchmark reproduction methods for laboratory hearing aid/algorithm evaluation
- Quantify the interaction between spatial resolution, algorithmic function, and perceptual outcomes
- Guide hardware and protocol selection for psychoacoustic studies and device development
By establishing best practices and unifying technical, perceptual, and interaction-based criteria, SAQI advances the field’s ability to design controlled, scalable, and perceptually relevant spatial audio experiments.
Summary Table: Practical Recommendations
Dimension | Key SAQI Guidance |
---|---|
Reproduction method | Select based on task: HOA for localization, NSP for ADM, etc. |
Array size | ≥8 for basic, ≥18 for high-fidelity (esp. >2 kHz or mobile listeners) |
Artifact control | Apply spatial aliasing criterion; more speakers for wider/moving tests |
Performance metrics | Use beam pattern error, SNR error, perceptual localization error |
Experimental validity | Select configuration per algorithm spatial sensitivity, not convenience |
The Spatial Audio Quality Inventory thus serves as a robust reference for evaluating and optimizing the technical and perceptual integrity of spatial audio processing chains, particularly in research and product development contexts emphasizing hearing aid and spatial algorithm testing.