
ODAQ: Open Dataset of Audio Quality (2401.00197v1)

Published 30 Dec 2023 in eess.AS

Abstract: Research into the prediction and analysis of perceived audio quality is hampered by the scarcity of openly available datasets of audio signals accompanied by corresponding subjective quality scores. To address this problem, we present the Open Dataset of Audio Quality (ODAQ), a new dataset containing the results of a MUSHRA listening test conducted with expert listeners from 2 international laboratories. ODAQ contains 240 audio samples and corresponding quality scores. Each audio sample is rated by 26 listeners. The audio samples are stereo audio signals sampled at 44.1 or 48 kHz and are processed by a total of 6 method classes, each operating at different quality levels. The processing method classes are designed to generate quality degradations possibly encountered during audio coding and source separation, and the quality levels for each method class span the entire quality range. The diversity of the processing methods, the large span of quality levels, the high sampling frequency, and the pool of international listeners make ODAQ particularly suited for further research into subjective and objective audio quality. The dataset is released with permissive licenses, and the software used to conduct the listening test is also made publicly available.
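The abstract describes the dataset's dimensions: 240 processed audio items, each rated by 26 listeners in a MUSHRA test (ITU-R BS.1534), where scores lie on a 0–100 scale. A minimal sketch of the typical analysis over such data is shown below. The table layout, column names, and synthetic scores are illustrative assumptions; the real ODAQ release ships in its own format, and only the dimensions here are taken from the abstract.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in mirroring ODAQ's stated dimensions:
# 240 processed items, each rated by 26 listeners on the
# MUSHRA scale (0-100). Column names are hypothetical.
rng = np.random.default_rng(0)
n_items, n_listeners = 240, 26
scores = pd.DataFrame({
    "item": np.repeat(np.arange(n_items), n_listeners),
    "listener": np.tile(np.arange(n_listeners), n_items),
    "score": rng.uniform(0, 100, n_items * n_listeners),
})

# Standard MUSHRA reporting: per-item mean score with a
# 95% confidence interval over the listener pool.
stats = scores.groupby("item")["score"].agg(["mean", "sem"])
stats["ci95"] = 1.96 * stats["sem"]
print(stats.head())
```

With real ODAQ scores in place of the synthetic ones, the same groupby yields the per-condition quality estimates one would plot against objective quality metrics.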
