Task-Aware Encoder Control for Deep Video Compression

Published 7 Apr 2024 in eess.IV, cs.AI, and cs.CV | (2404.04848v2)

Abstract: Prior research on deep video compression (DVC) for machine tasks typically requires training a separate codec for each task, which in turn requires a dedicated decoder per task. In contrast, traditional video codecs employ a flexible encoder controller that adapts a single codec to different tasks through mechanisms such as mode prediction. Drawing inspiration from this, we introduce an encoder controller for deep video compression for machines. The controller comprises a mode prediction module and a Group of Pictures (GoP) selection module. Our approach centralizes control at the encoding stage, allowing the encoder to be adjusted for different tasks, such as detection and tracking, while remaining compatible with a standard pre-trained DVC decoder. Empirical evidence demonstrates that our method is applicable across multiple tasks with various existing pre-trained DVCs. Moreover, extensive experiments demonstrate that our method reduces bitrate by about 25% relative to previous DVC methods across different tasks, using only one pre-trained decoder.
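
To make the idea of encoder-side control concrete, here is a minimal sketch of a generic rate-versus-task-loss mode decision. It is not the paper's method (the abstract describes learned mode prediction and GoP selection modules, whose details are not given here); it only illustrates how a choice made entirely at the encoder can trade bitrate against task performance while the decoder stays fixed. The names `Candidate`, `select_mode`, and `lam`, and all numbers, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    """One hypothetical encoding option for a block or frame."""
    name: str         # illustrative mode label, e.g. "skip" or "full"
    bits: float       # assumed estimate of the bits this option costs
    task_loss: float  # assumed estimate of downstream task loss after decoding


def select_mode(candidates: Sequence[Candidate], lam: float) -> Candidate:
    """Return the candidate with the lowest cost bits + lam * task_loss.

    A larger lam weights task performance more heavily; a smaller lam
    favours saving bits. The decoder is never consulted or modified.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    return min(candidates, key=lambda c: c.bits + lam * c.task_loss)


# Illustrative numbers only, not measurements from the paper.
options = [
    Candidate("skip", bits=10.0, task_loss=5.0),
    Candidate("full", bits=100.0, task_loss=1.0),
]
low_lam_choice = select_mode(options, lam=1.0)     # bitrate dominates the cost
high_lam_choice = select_mode(options, lam=100.0)  # task loss dominates the cost
print(low_lam_choice.name, high_lam_choice.name)
```

Changing only `lam` (or the task-loss estimate) changes the encoder's decisions per task, which is the sense in which a single pre-trained decoder can serve several tasks.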
