Speech Loudness in Broadcasting and Streaming (2405.17364v1)
Abstract: The introduction and regulation of loudness in broadcasting and streaming brought clear benefits to the audience, e.g., a level of uniformity across programs and channels. Yet, speech loudness is frequently reported as being too low in certain passages, which can hinder the full understanding and enjoyment of movies and TV programs. This paper proposes expanding the set of loudness-based measures typically used in the industry. We focus on speech loudness, and we show that, when clean speech is not available, Deep Neural Networks (DNNs) can be used to isolate the speech signal and thus to estimate speech loudness accurately, providing a more precise estimate than speech-gated loudness. Moreover, we define critical passages, i.e., passages in which speech is likely to be hard to understand. Critical passages are defined based on the local Speech Loudness Deviation (SLD) and the local Speech-to-Background Loudness Difference (SBLD), as SLD and SBLD contribute significantly to intelligibility and listening effort. In contrast to more comprehensive measures of intelligibility and listening effort, SLD and SBLD are straightforward to measure, are intuitive, and, most importantly, can be easily controlled by adjusting the speech level in the mix or by enabling personalization at the user's end. Finally, examples are provided that show how the detection of critical passages can support the evaluation and control of the speech signal during and after content production.
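The abstract does not give formulas, so the following is only a minimal sketch, under stated assumptions, of how per-frame SLD and SBLD values could be turned into flagged critical passages. It assumes the speech and background stems are already available (e.g., separated by a DNN) and have been metered frame by frame with a BS.1770-style meter. It further assumes that SLD is the local speech loudness minus a reference loudness (here taken to be the integrated programme loudness) and that SBLD is the local speech loudness minus the local background loudness. The names `find_critical_passages` and `CriticalPassage`, the frame hop, the thresholds, the "either measure below threshold" rule, and the minimum passage length are all hypothetical and are not values taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class CriticalPassage:
    start_s: float     # start time of the first flagged frame (seconds)
    end_s: float       # end time of the last flagged frame (seconds)
    worst_sld: float   # lowest SLD inside the passage (LU)
    worst_sbld: float  # lowest SBLD inside the passage (LU)


def find_critical_passages(
    speech_lufs: Sequence[Optional[float]],  # local speech loudness per frame; None = no speech
    background_lufs: Sequence[float],        # local background loudness per frame
    reference_lufs: float,                   # ASSUMED reference, e.g. integrated programme loudness
    hop_s: float = 1.0,                      # HYPOTHETICAL frame hop in seconds
    sld_min: float = -10.0,                  # HYPOTHETICAL lower limit for SLD (LU)
    sbld_min: float = 5.0,                   # HYPOTHETICAL lower limit for SBLD (LU)
    min_frames: int = 3,                     # HYPOTHETICAL minimum passage length in frames
) -> List[CriticalPassage]:
    """Group consecutive speech frames with low SLD or low SBLD into passages."""
    passages: List[CriticalPassage] = []
    run: List[Tuple[int, float, float]] = []  # (frame index, SLD, SBLD) of the open run

    def close_run() -> None:
        # Keep the run only if it is long enough, then start a new one.
        if len(run) >= min_frames:
            passages.append(CriticalPassage(
                start_s=run[0][0] * hop_s,
                end_s=(run[-1][0] + 1) * hop_s,
                worst_sld=min(r[1] for r in run),
                worst_sbld=min(r[2] for r in run),
            ))
        run.clear()

    for i, (speech, background) in enumerate(zip(speech_lufs, background_lufs)):
        if speech is None:
            # No active speech in this frame: nothing to judge, so end any open run.
            close_run()
            continue
        sld = speech - reference_lufs  # Speech Loudness Deviation (assumed definition)
        sbld = speech - background     # Speech-to-Background Loudness Difference
        if sld < sld_min or sbld < sbld_min:
            run.append((i, sld, sbld))
        else:
            close_run()
    close_run()
    return passages


if __name__ == "__main__":
    speech = [-24.0, -25.0, -38.0, -37.0, -36.0, -24.0, None, -23.0]
    background = [-40.0, -40.0, -30.0, -29.0, -31.0, -40.0, -35.0, -40.0]
    for p in find_critical_passages(speech, background, reference_lufs=-23.0):
        print(p)
```

With these made-up inputs, frames 2 to 4 are both quiet relative to the reference and barely above the background, so a single passage from 2.0 s to 5.0 s is reported. The "either measure below threshold" rule reflects the abstract's point that SLD and SBLD each contribute to intelligibility and listening effort; requiring a minimum run length is one plausible way to avoid flagging isolated frames, and it is a design choice of this sketch rather than of the paper.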