OMRA: Online Motion Resolution Adaptation to Remedy Domain Shift in Learned Hierarchical B-frame Coding (2402.12816v1)
Abstract: Learned hierarchical B-frame coding aims to leverage bi-directional reference frames for better coding efficiency. However, a domain shift between training and test scenarios, caused by dataset limitations, poses a challenge: the codec is trained with small groups of pictures (GOPs) but tested on large GOPs, where a B-frame and its reference frames can be many frames apart. Specifically, a motion estimation network trained on small GOPs cannot handle the large motion encountered at test time, which degrades compression performance. To mitigate this domain shift, we present an online motion resolution adaptation (OMRA) method. It adapts the spatial resolution of video frames on a per-frame basis to suit the capability of the motion estimation network in a pre-trained B-frame codec. OMRA is an online, inference-time technique: it does not require re-training the codec and is readily applicable to existing B-frame codecs that adopt hierarchical bi-directional prediction. Experimental results show that OMRA significantly enhances the compression performance of two state-of-the-art learned B-frame codecs on commonly used datasets.
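The toy sketch below illustrates the two ideas in the abstract; it is not the authors' implementation. It assumes a dyadic hierarchical-B coding order, where frames 0 and `gop_size` are coded first. A global-translation block matcher with a small, hypothetical `SEARCH_RADIUS` stands in for the pre-trained motion network that cannot see large motion. The candidate factors `(1, 2, 4)` and the MSE-based `prediction_error` are placeholder choices, not the paper's selection criterion. The key mechanics are that motion estimated at 1/s resolution must have its vectors multiplied by s before it is applied at full resolution, and that the factor is chosen independently for each frame.

```python
import numpy as np

# Hypothetical: the largest displacement (in pixels, at the resolution it is
# given) that the stand-in motion estimator can detect.
SEARCH_RADIUS = 2

def hierarchical_b_order(gop_size):
    """Dyadic hierarchical-B order as (frame, left_ref, right_ref) tuples.
    Assumes frames 0 and gop_size are already coded as key frames."""
    order = []
    def split(left, right):
        if right - left < 2:
            return
        mid = (left + right) // 2
        order.append((mid, left, right))  # mid is predicted from both ends
        split(left, mid)
        split(mid, right)
    split(0, gop_size)
    return order

def downsample(frame, s):
    """Average-pool a 2-D frame by an integer factor s (H, W divisible by s)."""
    h, w = frame.shape
    return frame.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def estimate_global_motion(target, ref, radius=SEARCH_RADIUS):
    """Toy stand-in for a motion network: the integer shift (dy, dx) with
    |dy|, |dx| <= radius that best aligns np.roll(ref, ...) with target."""
    best, best_err = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            err = np.mean((np.roll(ref, (dy, dx), axis=(0, 1)) - target) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

def prediction_error(target, ref, s):
    """Estimate motion at 1/s resolution, scale the vector back up by s,
    and measure the full-resolution prediction MSE (placeholder cost)."""
    dy, dx = estimate_global_motion(downsample(target, s), downsample(ref, s))
    pred = np.roll(ref, (dy * s, dx * s), axis=(0, 1))
    return np.mean((pred - target) ** 2)

def select_factor(target, ref, factors=(1, 2, 4)):
    """Per-frame choice of the downsampling factor with the lowest cost."""
    return min(factors, key=lambda s: prediction_error(target, ref, s))

if __name__ == "__main__":
    # In a 32-frame GOP the first B-frame is 16 frames from its references.
    print(hierarchical_b_order(32)[0])
    rng = np.random.default_rng(0)
    ref = rng.random((64, 64))
    print(select_factor(np.roll(ref, (0, 8), axis=(0, 1)), ref))  # large motion
    print(select_factor(np.roll(ref, (0, 1), axis=(0, 1)), ref))  # small motion
```

In this toy setting, an 8-pixel shift exceeds the estimator's radius at full resolution and is only recovered after 4x downsampling. A 1-pixel shift is best handled at full resolution. The point is that a single fixed resolution is not ideal for every frame.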
Authors: Zong-Lin Gao, Sang NguyenQuang, Wen-Hsiao Peng, Xiem HoangVan