Offline to Online Learning for Real-Time Bandwidth Estimation (2309.13481v3)

Published 23 Sep 2023 in cs.NI and cs.LG

Abstract: Real-time video applications require accurate bandwidth estimation (BWE) to maintain user experience across varying network conditions. However, increasing network heterogeneity challenges general-purpose BWE algorithms, necessitating solutions that adapt to end-user environments. While widely adopted, heuristic-based methods are difficult to individualize without extensive domain expertise. Conversely, online reinforcement learning (RL) offers ease of customization but neglects prior domain expertise and suffers from sample inefficiency. Thus, we present Merlin, an imitation learning (IL)-based solution that replaces the manual parameter tuning of heuristic-based methods with data-driven updates to streamline end-user personalization. Our key insight is that transforming heuristic-based BWE algorithms into neural networks facilitates data-driven personalization. Merlin utilizes Behavioral Cloning to efficiently learn from offline telemetry logs, capturing heuristic policies without live network interactions. The cloned policy can then be seamlessly tailored to end-user network conditions through online finetuning. In real intercontinental videoconferencing calls, Merlin matches our heuristic's policy with no statistically significant differences in user quality of experience (QoE). Finetuning Merlin's control policy to end-user environments enables QoE improvements of up to 7.8% compared to the heuristic policy. Lastly, our IL-based design performs competitively with current state-of-the-art online RL techniques but converges with 80% fewer videoconferencing samples, facilitating practical end-user personalization.
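
As a rough illustration of the behavioral cloning stage described above, the sketch below trains a small neural policy to imitate a heuristic bandwidth estimator from logged (state, action) pairs, offline and without live network interaction. This is not the authors' implementation: the feature dimension, network architecture, MSE regression loss, and placeholder telemetry tensors are all assumptions made for the example.

```python
# Minimal behavioral-cloning sketch (assumed details, not the paper's code).
# A small MLP policy is regressed onto bandwidth decisions logged from a
# heuristic estimator, using only offline telemetry (no live traffic).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

STATE_DIM = 16   # assumed number of telemetry features (delay, loss, receive rate, ...)
HIDDEN = 64      # assumed hidden width


class ClonedBWEPolicy(nn.Module):
    """Maps a network-state vector to a bandwidth estimate (e.g., log bps)."""

    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, 1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def behavioral_cloning(policy: nn.Module,
                       states: torch.Tensor,
                       heuristic_actions: torch.Tensor,
                       epochs: int = 10,
                       lr: float = 1e-3) -> nn.Module:
    """Fit the policy to the heuristic's logged actions by simple regression."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    loader = DataLoader(TensorDataset(states, heuristic_actions),
                        batch_size=256, shuffle=True)
    for _ in range(epochs):
        for s, a in loader:
            opt.zero_grad()
            loss = loss_fn(policy(s), a)
            loss.backward()
            opt.step()
    return policy


if __name__ == "__main__":
    # Placeholder telemetry: random states and heuristic bandwidth decisions.
    states = torch.randn(4096, STATE_DIM)
    actions = torch.randn(4096, 1)
    cloned = behavioral_cloning(ClonedBWEPolicy(), states, actions)
    print(cloned(states[:1]))  # cloned policy's estimate for one logged state
```

Online finetuning would then continue updating the cloned policy against live end-user network feedback (e.g., with an RL objective), which is outside the scope of this sketch.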
