Optimal Transcoding Resolution Prediction for Efficient Per-Title Bitrate Ladder Estimation (2401.04405v1)
Abstract: Adaptive video streaming requires efficient bitrate ladder construction to meet heterogeneous network conditions and end-user demands. Per-title optimized encoding typically traverses numerous encoding parameters to search for the Pareto-optimal operating points of each video. Recently, researchers have attempted to predict the content-optimized bitrate ladder to reduce pre-encoding overhead. However, existing methods commonly estimate the encoding parameters on the Pareto front and still require subsequent pre-encodings. In this paper, we propose to directly predict the optimal transcoding resolution at each preset bitrate for efficient bitrate ladder construction. We adopt a Temporal Attentive Gated Recurrent Network to capture spatial-temporal features and formulate transcoding resolution prediction as a multi-task classification problem. We demonstrate that content-optimized bitrate ladders can thus be determined efficiently without any pre-encoding. Our method closely approximates the ground-truth bitrate-resolution pairs, with a slight Bjøntegaard Delta rate loss of 1.21%, and significantly outperforms the state-of-the-art fixed ladder.
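As a minimal sketch of the multi-task classification idea, a bitrate ladder can be read off by treating each preset bitrate as its own classification task over candidate resolutions and taking the highest-scoring class per task. The resolution set, the preset bitrates, the `build_ladder` helper, and the example scores below are all hypothetical, since the abstract does not specify them. The scores stand in for the outputs of a trained network and are not results from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical candidate resolutions (frame heights) and preset bitrates (kbps).
# The abstract does not list the actual values used by the authors.
RESOLUTIONS: List[int] = [540, 720, 1080, 2160]
PRESET_BITRATES_KBPS: List[int] = [500, 1500, 4000, 12000]


def build_ladder(scores_per_bitrate: Sequence[Sequence[float]]) -> Dict[int, int]:
    """Map each preset bitrate to the resolution with the highest class score.

    scores_per_bitrate[i][j] is the score of resolution j for preset bitrate i,
    i.e. one classification task per bitrate.
    """
    if len(scores_per_bitrate) != len(PRESET_BITRATES_KBPS):
        raise ValueError("expected one score vector per preset bitrate")

    ladder: Dict[int, int] = {}
    for bitrate, scores in zip(PRESET_BITRATES_KBPS, scores_per_bitrate):
        if len(scores) != len(RESOLUTIONS):
            raise ValueError("expected one score per candidate resolution")
        # Argmax over the resolution classes for this bitrate's task.
        best = max(range(len(scores)), key=lambda j: scores[j])
        ladder[bitrate] = RESOLUTIONS[best]
    return ladder


# Made-up scores standing in for the outputs of a trained predictor.
example_scores = [
    [2.1, 0.3, -1.0, -2.5],
    [0.2, 1.7, 0.4, -1.1],
    [-0.8, 0.5, 1.9, 0.1],
    [-2.0, -0.6, 0.9, 2.4],
]
ladder = build_ladder(example_scores)
for bitrate, height in ladder.items():
    print(f"{bitrate} kbps -> {height}p")
```

Because every bitrate-resolution pair comes straight from the predicted classes, no trial encoding is needed to form the ladder, which is the efficiency argument the abstract makes.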