ODAQ: Open Dataset of Audio Quality
- ODAQ is an open repository of high-fidelity stereo audio paired with quality scores from expert listeners, supporting reliable audio quality assessment.
- It organizes raw, processed, and anchor audio files across six degradation classes with precisely controlled quality levels.
- The dataset facilitates benchmarking objective metrics, training machine learning models, and studying psychoacoustic phenomena.
The Open Dataset of Audio Quality (ODAQ) is a rigorously designed, openly licensed corpus of stereo audio stimuli paired with subjective quality scores from expert listeners. It explicitly targets the reproducible evaluation of both subjective and objective measures of perceived audio quality, providing high-fidelity material, finely parameterized artifacts, and comprehensive metadata to support a wide spectrum of audio-quality research (Torcoli et al., 2023).
1. Structure and Composition
ODAQ comprises processed stereo audio excerpts, ground-truth references, and anchor stimuli, designed to cover the artifact space relevant to modern audio coding and source separation. The core dataset is organized into:
- Raw audio: 25 original studio-quality stereo signals (11 movie-like soundtracks, 14 music excerpts), sampled at either 44.1 or 48 kHz.
- Processed audio: Six degradation-class directories, each containing five quality levels (Q1–Q5) for each of 30 trials, yielding 900 processed files in the initial release.
- Anchors: Two fixed low-pass-filtered anchors per trial, with cutoffs at 3.5 kHz and 7 kHz, as prescribed by the MUSHRA standard (a brick-wall filtering sketch follows this list).
- References: Pristine, unprocessed counterparts for every trial.
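To make the anchors concrete, below is a minimal brick-wall low-pass sketch at the cutoffs listed above; the exact filter implementation used to generate the ODAQ anchors is not specified here and may differ.

```python
# Minimal sketch of a MUSHRA-style low-pass anchor via brick-wall FFT
# filtering. The actual ODAQ anchor filters may be implemented differently.
import numpy as np

def lowpass_anchor(x, fs, cutoff_hz):
    """Zero all spectral content above cutoff_hz (mono signal)."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    X[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(X, n=len(x))

fs = 48000
x = np.random.randn(fs)                    # 1 s of noise as a stand-in signal
anchor_35 = lowpass_anchor(x, fs, 3500.0)  # 3.5 kHz anchor
anchor_70 = lowpass_anchor(x, fs, 7000.0)  # 7 kHz anchor
```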
Each stimulus is distributed as an uncompressed 16-bit PCM stereo WAV at the original sampling rate. Metadata includes signal ID, artifact type, artifact parameters, loudness normalization level (–23 LUFS), and reference mappings (Torcoli et al., 2023).
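As an illustration of working with the distributed files, the sketch below loads a stimulus and renormalizes it to −23 LUFS using the soundfile and pyloudnorm packages; the file path and directory layout are hypothetical, not the actual ODAQ release structure.

```python
# Minimal sketch: load a stimulus and renormalize it to -23 LUFS.
# The path below is hypothetical; consult the ODAQ release for the
# actual directory layout.
import soundfile as sf          # pip install soundfile
import pyloudnorm as pyln       # pip install pyloudnorm

def load_normalized(path, target_lufs=-23.0):
    """Read a stereo WAV and renormalize it to the target loudness."""
    data, rate = sf.read(path)                  # float array, shape (N, 2)
    meter = pyln.Meter(rate)                    # ITU-R BS.1770 loudness meter
    loudness = meter.integrated_loudness(data)  # measured LUFS
    return pyln.normalize.loudness(data, loudness, target_lufs), rate

audio, fs = load_normalized("ODAQ/processed/LP/Q3/trial_01.wav")  # hypothetical
print(f"{fs} Hz, {audio.shape[0] / fs:.1f} s, peak {abs(audio).max():.3f}")
```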
2. Processing-Method Classes and Quality Levels
Six processing-method classes encapsulate real-world degradations encountered in coding and source-separation scenarios. Each class is spanned by five parameterized quality levels:
| Code | Artifact | Key Parameters | Quality-Level Span (Q1–Q5) |
|---|---|---|---|
| LP | Low-pass | Brick-wall cutoff | 5.0–15.0 kHz |
| TM | Tonality Mismatch | Tonal/noise crossover, spectral envelope transplant | 5.0–15.0 kHz |
| UN | Unmasked Noise | Tonal-to-noise substitution, spectral envelope shape | 5.0–15.0 kHz |
| SH | Spectral Holes | Transform-domain zeroing with hole probability p_hole | p_hole = 70% down to 10% |
| PE | Pre-Echo | NMR (10, 16 dB); STFT block size (1024, 2048, 4096) | 5 NMR/block-size configurations |
| DE | Dialogue Enhancement | Separation system and remix gain α | Oracle + 4 DNNs; α = –20 dB |
Processing parameters were tuned so that each artifact's perceptual quality spans the MUSHRA scale roughly from "Poor" to "Excellent" (target scores of 15–95 in the original paper) (Torcoli et al., 2023). Parameters are explicitly controlled and documented to facilitate artifact-specific research.
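To make one class concrete, the following is a minimal sketch of the spectral-holes (SH) idea, zeroing random STFT bins with probability p_hole; it illustrates the artifact family, not the exact ODAQ processing chain.

```python
# Minimal sketch of the spectral-holes (SH) artifact: randomly zero
# time-frequency bins with probability p_hole. Illustrative only; the
# exact ODAQ processing chain is documented with the dataset.
import numpy as np
from scipy.signal import stft, istft

def spectral_holes(x, fs, p_hole=0.3, nperseg=1024, seed=0):
    """Randomly zero STFT bins of a mono signal with probability p_hole."""
    rng = np.random.default_rng(seed)
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)
    mask = rng.random(Z.shape) >= p_hole   # keep a bin with prob 1 - p_hole
    _, y = istft(Z * mask, fs=fs, nperseg=nperseg)
    return y[: len(x)]

fs = 48000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # 1 s test tone
y = spectral_holes(x, fs, p_hole=0.7)             # worst SH setting above
```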
3. Subjective Quality Assessment Protocol
ODAQ subjects each stimulus to formal perceptual evaluation using the ITU-R BS.1534 MUSHRA protocol. Key features of the subjective test:
- Participants: Initially, 26 expert listeners (mean age 37.5 years, SD 8.5) with mean professional audio experience of 12.7 years, from Fraunhofer IIS (DE) and Netflix (US). Expansion studies increased the cohort to 42 listeners, including undergraduate student groups given protocol-controlled training (Dick et al., 1 Apr 2025).
- Environment: Sessions were carried out in acoustically treated rooms, using calibrated Beyerdynamic DT770 Pro headphones and standardized workstations.
- Trial structure: Each trial contains eight stimuli: the hidden reference, two low-pass anchors, and five processed versions spanning the quality levels of a single artifact class.
- Rating: Listeners assign continuous MUSHRA scores (0–100). Ratings are post-processed into per-stimulus means, standard deviations, and 95% confidence intervals via the Student's t-distribution over N ratings (a computation sketched below).
ODAQ rigorously randomizes trial order, controls the listening environment, and enforces listener screening (a rating of ≥90 for the hidden reference in ≥85% of trials), ensuring dataset reliability and cross-lab consistency (Torcoli et al., 2023).
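The per-stimulus statistics described above amount to a short computation; the sketch below assumes the standard two-sided Student's t interval, with illustrative rating values.

```python
# Minimal sketch: per-stimulus mean and 95% confidence interval from N
# MUSHRA ratings via the Student's t-distribution. Ratings are illustrative.
import numpy as np
from scipy import stats

def mushra_ci(ratings, confidence=0.95):
    """Return the mean and half-width of the confidence interval."""
    r = np.asarray(ratings, dtype=float)
    n = r.size
    mean, sd = r.mean(), r.std(ddof=1)
    t = stats.t.ppf(0.5 + confidence / 2, df=n - 1)  # two-sided critical value
    return mean, t * sd / np.sqrt(n)

mean, half = mushra_ci([62, 71, 58, 66, 74, 60, 69])  # illustrative ratings
print(f"{mean:.1f} +/- {half:.1f} MUSHRA points")
```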
4. Expansion, Stereo Processing, and Analysis
The initial ODAQ focused on monaural artifacts and standard stereo processing. Subsequent expansions systematically extended the corpus and protocols:
- Dataset expansion: Added two undergraduate cohorts (B1 anchor-informed, B2 anchor-blind), with explicit analysis of scale usage and anchoring bias (Dick et al., 1 Apr 2025). Resulting dataset: 42 listeners and 10,080 subjective trial ratings.
- Stereo processing: New subcorpora introduced artifacts applied in both the Left/Right (LR) and Mid/Side (MS) domains, across stereo signals ranging from solo and centered to wide and hard-panned mixes (Dick et al., 16 Dec 2025; Delgado et al., 11 Dec 2025); the LR/MS mapping is sketched after this list.
- Findings: Listener sensitivity is dominated by timbral artifacts unless spatial image differences are made salient via direct comparison. Hard-panned content amplifies LR/MS distinctions; spatial image is used primarily as a tie-breaker when timbral cues are confounded (Dick et al., 16 Dec 2025).
- Mono anchor inclusion: A mono anchor received a mean MUSHRA score of 65, with low variance and no significant interaction with stereo width, reinforcing the dominance of timbre in listener judgments.
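The LR/MS distinction referenced in the stereo-processing item above reduces to a simple linear mapping; the sketch below shows it, with `process` as a hypothetical placeholder for any monaural degradation, not an ODAQ function.

```python
# Minimal sketch of the Left/Right <-> Mid/Side mapping: apply a mono
# degradation in the MS domain, then map back. `process` is a placeholder.
import numpy as np

def lr_to_ms(left, right):
    return (left + right) / 2.0, (left - right) / 2.0  # mid, side

def ms_to_lr(mid, side):
    return mid + side, mid - side                      # left, right

def apply_in_ms_domain(left, right, process):
    """Apply a mono degradation independently to the M and S channels."""
    mid, side = lr_to_ms(left, right)
    return ms_to_lr(process(mid), process(side))

# Hard-panned content splits its energy evenly between M and S, which is
# why LR- vs MS-domain processing diverges most on hard-panned mixes.
L = np.random.randn(1000)
R = np.zeros_like(L)           # hard-panned left
mid, side = lr_to_ms(L, R)
print(np.allclose(mid, side))  # True: equal energy in mid and side
```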
5. Benchmarking Objective Audio Quality Metrics
ODAQ's structure, with exhaustive parameterization and expert subjective scores, enables granular evaluation and comparison of objective audio quality metrics (a correlation sketch follows this list):
- Metrics benchmarked: Traditional intrusive metrics (NMR, PEAQ ODG, PEAQ-CSM, PEMO-Q, 2f-model), non-intrusive/binaural models (MoBi-Q, eMoBi-Q), and recent data-driven models (ViSQOLAudio V3, SMAQ, DNSMOS).
- Performance: The highest aggregate correlations with mean subjective quality (across artifacts) are achieved by NMR and PEAQ-CSM (ρ ≈ 0.89), with PEMO-Q(fb) and the 2f-model performing similarly; ViSQOLAudio and SMAQ are moderate (ρ ≈ 0.77), while DNSMOS, SI-SDR, and other speech-centric models underperform on mixed-content material (Dick et al., 1 Apr 2025).
- Stereo-specific insights: Timbre-centric metrics remain robust for “simple” LR or MS-only contexts (ρ > 0.9), but metric performance deteriorates in “mixed” LR/MS presentation (mean ρ ≈ 0.44) and for hard-panned items, indicating limitations of channel-averaging and inadequacy of current models for fine-grained spatial artifact assessment (Delgado et al., 11 Dec 2025).
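As a concrete instance of this benchmarking workflow, the sketch below correlates per-stimulus metric outputs with MUSHRA means; all numbers are placeholders, not ODAQ results.

```python
# Minimal sketch: correlate an objective metric's outputs with mean
# subjective scores per stimulus. Values are placeholders, not ODAQ data.
import numpy as np
from scipy.stats import pearsonr, spearmanr

mushra_means = np.array([28.0, 45.5, 61.2, 78.9, 91.3])  # hypothetical Q1-Q5 means
metric_scores = np.array([1.2, 2.1, 2.9, 3.8, 4.6])      # hypothetical metric output

r, _ = pearsonr(metric_scores, mushra_means)     # linear correlation
rho, _ = spearmanr(metric_scores, mushra_means)  # rank correlation (rho above)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```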
6. Availability, Licensing, and Usage Recommendations
ODAQ and all expansion data, software, and metadata are distributed under permissive licenses:
- Audio + data: CC-BY 4.0; software: MIT License.
- Access: Publicly available through GitHub (https://github.com/Fraunhofer-IIS/ODAQ/), Zenodo, and included in open audio quality evaluation suites (e.g., OpenACE (Coldenhoff et al., 12 Sep 2024)).
- Recommended use cases:
- Benchmarking cross-family audio quality metrics over coding and source-separation artifacts.
- Training of machine learning models for granular perceptual quality prediction.
- Studying listener variability, anchoring effects, and the psychoacoustic basis of quality assessment.
- Extension with additional codecs, artifact types, or more diverse source material, with systematic reapplication of the ODAQ protocol (Torcoli et al., 2023, Dick et al., 1 Apr 2025).
7. Practical Implications and Research Directions
ODAQ establishes a reproducible, extensible foundation for audio-quality research:
- Enables meta-analyses of perceptual score distributions, scale usage, and listener bias.
- Provides a robust baseline for the development and comparative evaluation of new objective and machine learning-driven quality metrics.
- Supports investigation of timbral versus spatial contributions to perceived audio quality, informed by controlled stereo artifact studies (Dick et al., 16 Dec 2025).
- Facilitates public leaderboard competition and community-driven augmentation of test material, with granular per-trial subjective scores supporting diagnostic and interpretive research (Dick et al., 1 Apr 2025).
A plausible implication is that as new objective and data-driven metrics are developed, ODAQ will remain a touchstone for validation due to its combination of artifact coverage, expert annotation, and open accessibility.