Integrating IP Broadcasting with Audio Tags: Workflow and Challenges (2407.15423v2)

Published 22 Jul 2024 in eess.AS, cs.AI, cs.MM, and cs.SD

Abstract: The broadcasting industry is increasingly adopting IP techniques, revolutionising both live and pre-recorded content production, from news gathering to live music events. IP broadcasting allows for the transport of audio and video signals in an easily configurable way, aligning with modern networking techniques. This shift towards an IP workflow allows for much greater flexibility, not only in routing signals but also in integrating tools built with standard web development techniques. One such tool is live audio tagging, which has a number of applications in content production, ranging from automated closed captioning to identifying unwanted sound events within a scene. In this paper, we describe the process of containerising an audio tagging model into a microservice, a small segregated code module that can be integrated into a multitude of different network setups. The goal is to develop a modular, accessible, and flexible tool capable of seamless deployment into broadcasting workflows of all sizes, from small productions to large corporations. Challenges surrounding the latency of the selected audio tagging model, and its effect on the usefulness of the end product, are discussed.
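
The workflow the abstract describes (wrapping a pretrained audio tagging model behind a small network service so it can be dropped into an IP production chain) can be illustrated with a short sketch. The example below is a minimal, hypothetical Python illustration, assuming the open-source panns_inference package for the PANNs model and Flask for the HTTP interface; the endpoint name, port, and raw-PCM input format are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of an audio-tagging microservice (assumptions noted above).
# POST raw mono float32 PCM sampled at 32 kHz to /tag; returns the top-5
# AudioSet labels with their clipwise scores.
import numpy as np
from flask import Flask, request, jsonify
from panns_inference import AudioTagging, labels  # pretrained PANNs wrapper

app = Flask(__name__)
# Loads the default pretrained CNN14 checkpoint (downloaded on first use).
tagger = AudioTagging(checkpoint_path=None, device="cpu")

@app.route("/tag", methods=["POST"])
def tag_audio():
    # Interpret the request body as raw float32 PCM, mono, 32 kHz,
    # and add a batch dimension: shape (batch, samples).
    audio = np.frombuffer(request.data, dtype=np.float32)[None, :]
    clipwise_output, _embedding = tagger.inference(audio)
    scores = clipwise_output[0]
    top = scores.argsort()[::-1][:5]
    return jsonify({labels[i]: float(scores[i]) for i in top})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

Since containerisation is the paper's focus, a script like this would typically be packaged with a Dockerfile installing its dependencies, yielding a self-contained microservice reachable over HTTP from any point in the IP broadcast network.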
