
Object Segmentation-Assisted Inter Prediction for Versatile Video Coding (2403.11694v2)

Published 18 Mar 2024 in eess.IV and cs.CV

Abstract: Modern video coding standards widely adopt block-based inter prediction, which brings high compression efficiency. Natural videos, however, usually contain multiple moving objects of arbitrary shapes, resulting in complex motion fields that are difficult to represent compactly. The Versatile Video Coding (VVC) standard tackles this problem with more flexible block partitioning, but the more flexible partitions require more overhead bits to signal and still cannot take arbitrary shapes. To address this limitation, we propose an object segmentation-assisted inter prediction method (SAIP), in which objects in the reference frames are segmented using advanced segmentation techniques. With a proper indication, the object segmentation mask is translated from the reference frame to the current frame, where it serves as an arbitrarily shaped partition into different regions without any extra signaling. Using the segmentation mask, motion compensation is performed separately for each region, achieving higher prediction accuracy. The segmentation mask is further used to code the motion vectors of the different regions more efficiently. Moreover, the segmentation mask is taken into account in the joint rate-distortion optimization of motion estimation and partition estimation, so that the motion vectors of the different regions and the partition are derived more accurately. The proposed method is implemented in the VVC reference software, VTM version 12.0. Experimental results show that the proposed method achieves BD-rate reductions of up to 1.98%, 1.14%, and 0.79%, and average reductions of 0.82%, 0.49%, and 0.37%, on common test sequences under the Low-delay P, Low-delay B, and Random Access configurations, respectively.
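
The core idea in the abstract (translate the reference-frame mask into the current frame, then motion-compensate each region separately) can be illustrated with a minimal sketch. This is not the paper's VTM implementation: it assumes a single object and a background, integer-pel motion vectors, whole-frame rather than block-level processing, and edge padding at the frame borders. The names `shift` and `mask_assisted_prediction` are hypothetical and chosen only for illustration.

```python
import numpy as np

def shift(frame, mv):
    """Translate a 2-D array by an integer motion vector mv = (dy, dx).

    Output sample (y, x) is taken from frame[y - dy, x - dx]; coordinates
    that fall outside the frame are clamped to the nearest edge sample.
    """
    dy, dx = mv
    h, w = frame.shape
    ys = np.clip(np.arange(h) - dy, 0, h - 1)
    xs = np.clip(np.arange(w) - dx, 0, w - 1)
    return frame[np.ix_(ys, xs)]

def mask_assisted_prediction(ref, ref_mask, mv_obj, mv_bg):
    """Two-region prediction of the current frame (illustrative assumption).

    ref      : reference frame (2-D array)
    ref_mask : boolean object mask segmented on the reference frame
    mv_obj   : motion vector assumed for the object region
    mv_bg    : motion vector assumed for the background region
    """
    # Translate the mask with the object's motion so that it partitions
    # the current frame; no partition shape needs to be transmitted.
    cur_mask = shift(ref_mask, mv_obj)
    # Motion-compensate each region with its own motion vector.
    pred_obj = shift(ref, mv_obj)
    pred_bg = shift(ref, mv_bg)
    # Compose the prediction: object samples inside the mask, background outside.
    return np.where(cur_mask, pred_obj, pred_bg), cur_mask
```

For example, with a flat background and a 2x2 object that moves by `(1, 2)` while the background stays still, calling `mask_assisted_prediction(ref, ref_mask, (1, 2), (0, 0))` places the object samples at the shifted mask position. Samples uncovered by the moving object are still predicted from the background motion vector, so this sketch does not handle disocclusion.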
