ACM MMSys 2024 Bandwidth Estimation in Real Time Communications Challenge (2403.06324v2)
Abstract: The quality of experience (QoE) that video conferencing systems deliver to end users depends in part on correctly estimating, over time, the capacity of the bottleneck link between the sender and the receiver. Bandwidth estimation for real-time communications (RTC) remains a significant challenge, primarily due to continuously evolving, heterogeneous network architectures and technologies. From the first bandwidth estimation challenge, hosted at ACM MMSys 2021, we learned that bandwidth estimation models trained with reinforcement learning (RL) in simulation to maximize network-based reward functions may not be optimal in practice, owing to the sim-to-real gap and the difficulty of aligning network-based rewards with user-perceived QoE. This grand challenge aims to advance bandwidth estimation model design by aligning reward maximization with user-perceived QoE optimization, using offline RL and a real-world dataset whose objective rewards correlate highly with subjective audio/video quality in Microsoft Teams. All models submitted to the grand challenge underwent initial evaluation on our emulation platform. For a comprehensive evaluation under diverse network conditions with temporal fluctuations, top models were further evaluated on our geographically distributed testbed, with each model conducting 600 calls over a 12-day period. The winning model is shown to deliver performance comparable to the top behavior policy in the released dataset. By leveraging real-world data and integrating objective audio/video quality scores as rewards, offline RL can therefore facilitate the development of competitive bandwidth estimators for RTC.
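The core idea the abstract describes — learning a bandwidth estimator from logged calls by maximizing QoE-correlated rewards rather than imitating all behavior — can be illustrated with a toy offline RL baseline. The sketch below uses reward-weighted regression (in the spirit of advantage-weighted approaches such as AWR/CRR, not the challenge's actual method): transitions from a synthetic behavior policy are weighted by an exponential of their reward, so the fitted policy imitates only the well-rewarded bandwidth estimates. All names, the linear policy, and the dataset here are illustrative assumptions, not part of the challenge release.

```python
import numpy as np

# Hypothetical toy dataset: each transition holds a network-observation vector
# (e.g., receive rate, loss ratio, RTT), the behavior policy's bandwidth
# estimate (the action), and an objective QoE-style reward in (0, 1].
rng = np.random.default_rng(0)
n, d = 2000, 3
states = rng.normal(size=(n, d))
true_w = np.array([1.5, -0.8, 0.3])
ideal = states @ true_w                             # "ideal" estimate per state
actions = ideal + rng.normal(scale=1.0, size=n)     # noisy behavior policy
rewards = np.exp(-np.abs(actions - ideal))          # high reward near the ideal

# Reward-weighted behavior cloning: regress actions on states, weighting each
# sample by an exponential of its (centered) reward so the learned policy
# clones the *good* behavior rather than the entire logged policy.
beta = 5.0
weights = np.exp(beta * (rewards - rewards.mean()))
weights /= weights.sum()

# Weighted least squares: w = (X^T W X)^{-1} X^T W y
Xw = states * weights[:, None]
w_hat = np.linalg.solve(states.T @ Xw, Xw.T @ actions)

# For comparison, plain (unweighted) behavior cloning fits all actions equally.
w_plain = np.linalg.lstsq(states, actions, rcond=None)[0]
print("weighted error:", np.linalg.norm(w_hat - true_w))
print("plain error:   ", np.linalg.norm(w_plain - true_w))
```

Because the reward here depends only on how close an action is to the ideal estimate, upweighting high-reward transitions shrinks the effective action noise, and the weighted fit typically recovers the ideal mapping more tightly than plain cloning — a miniature version of why QoE-aligned rewards matter for offline training.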