Optimal Transcoding Resolution Prediction for Efficient Per-Title Bitrate Ladder Estimation (2401.04405v1)

Published 9 Jan 2024 in cs.MM, cs.AI, cs.CV, and eess.IV

Abstract: Adaptive video streaming requires efficient bitrate ladder construction to meet heterogeneous network conditions and end-user demands. Per-title optimized encoding typically traverses numerous encoding parameters to search for the Pareto-optimal operating points of each video. Recently, researchers have attempted to predict the content-optimized bitrate ladder to reduce pre-encoding overhead. However, existing methods commonly estimate the encoding parameters on the Pareto front and still require subsequent pre-encodings. In this paper, we propose to directly predict the optimal transcoding resolution at each preset bitrate for efficient bitrate ladder construction. We adopt a Temporal Attentive Gated Recurrent Network to capture spatial-temporal features and predict transcoding resolutions as a multi-task classification problem. We demonstrate that content-optimized bitrate ladders can thus be efficiently determined without any pre-encoding. Our method closely approximates the ground-truth bitrate-resolution pairs, with a slight Bjøntegaard Delta rate loss of 1.21%, and significantly outperforms the state-of-the-art fixed ladder.
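The multi-task classification framing in the abstract can be illustrated with a minimal sketch: each preset bitrate is treated as one classification task, and the highest-scoring candidate resolution for that task becomes the ladder rung. This is not the authors' implementation. The bitrate list, the candidate resolutions, the score values, and the `build_ladder` helper are all hypothetical and chosen only for illustration; in the paper the scores would come from the Temporal Attentive Gated Recurrent Network.

```python
from typing import Dict, Sequence

# Hypothetical preset bitrates (kbps) and candidate resolutions (frame heights).
# These values are illustrative assumptions, not taken from the paper.
PRESET_BITRATES_KBPS = [500, 1500, 3000, 6000]
CANDIDATE_HEIGHTS = [540, 720, 1080, 2160]


def build_ladder(scores: Sequence[Sequence[float]]) -> Dict[int, int]:
    """Turn per-bitrate classification scores into a bitrate ladder.

    scores[i][j] is the model's score that candidate resolution j is the
    optimal transcoding resolution at preset bitrate i. Each row is one
    classification task, so the ladder is the row-wise argmax.
    """
    if len(scores) != len(PRESET_BITRATES_KBPS):
        raise ValueError("expected one score row per preset bitrate")
    ladder: Dict[int, int] = {}
    for bitrate, row in zip(PRESET_BITRATES_KBPS, scores):
        if len(row) != len(CANDIDATE_HEIGHTS):
            raise ValueError("expected one score per candidate resolution")
        # Index of the highest-scoring resolution class for this bitrate.
        best = max(range(len(row)), key=row.__getitem__)
        ladder[bitrate] = CANDIDATE_HEIGHTS[best]
    return ladder


# Made-up scores standing in for the network's outputs on one video.
example_scores = [
    [2.1, 0.3, -1.0, -2.5],
    [0.2, 1.7, 0.4, -1.1],
    [-0.8, 0.5, 1.9, 0.1],
    [-2.0, -0.6, 0.9, 2.4],
]
ladder = build_ladder(example_scores)
for bitrate, height in ladder.items():
    print(f"{bitrate} kbps -> {height}p")
```

Because every rung is read directly from the predicted class, no trial encodes are needed to fill the ladder, which is the saving the abstract describes.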
