
Optimizing Mobile-Friendly Viewport Prediction for Live 360-Degree Video Streaming (2403.02693v1)

Published 5 Mar 2024 in cs.MM and eess.IV

Abstract: Viewport prediction is a crucial task for adaptive 360-degree video streaming, as bitrate control algorithms usually require knowledge of which portions of the frames the user is viewing. Various methods, from less accurate statistical tools to highly calibrated deep neural networks, have been studied and adopted for viewport prediction. Conventionally, it is difficult to implement sophisticated deep learning methods on mobile devices, which have limited computation capability. In this work, we propose an advanced learning-based viewport prediction approach and carefully design it to introduce minimal transmission and computation overhead for mobile terminals. We also propose a model-agnostic meta-learning (MAML) based saliency prediction network trainer, which provides a few-sample fast training solution that obtains the prediction model by utilizing information from past models. We further discuss how to integrate this mobile-friendly viewport prediction (MFVP) approach into a typical 360-degree live video streaming system by formulating and solving the bitrate adaptation problem. Extensive experimental results show that our prediction approach works in real time for live video streaming and achieves higher accuracy than other existing prediction methods on the mobile end, which, together with our bitrate adaptation algorithm, significantly improves the streaming QoE in various respects. We observe that the accuracy of MFVP is 8.1% to 28.7% higher than that of other algorithms, and that MFVP achieves a 3.73% to 14.96% higher average quality level and 49.6% to 74.97% less quality level change than other algorithms.

Authors (5)
  1. Lei Zhang (1689 papers)
  2. Tao Long (16 papers)
  3. Weizhen Xu (2 papers)
  4. Laizhong Cui (16 papers)
  5. Jiangchuan Liu (29 papers)
