Shortcut-V2V: Compression Framework for Video-to-Video Translation based on Temporal Redundancy Reduction (2308.08011v2)

Published 15 Aug 2023 in cs.CV

Abstract: Video-to-video translation aims to generate video frames of a target domain from an input video. Despite its usefulness, existing networks require enormous computation, so model compression is necessary for wide use. While compression methods exist that improve computational efficiency in various image/video tasks, a generally applicable compression method for video-to-video translation has not been studied much. In response, we present Shortcut-V2V, a general-purpose compression framework for video-to-video translation. Shortcut-V2V avoids full inference for every neighboring video frame by approximating the intermediate features of the current frame from those of the previous frame. Moreover, in our framework, a newly proposed block called AdaBD adaptively blends and deforms features of neighboring frames, which enables more accurate prediction of the intermediate features. We conduct quantitative and qualitative evaluations using well-known video-to-video translation models on various tasks to demonstrate the general applicability of our framework. The results show that Shortcut-V2V achieves performance comparable to the original video-to-video translation model while saving 3.2-5.7x computational cost and 7.8-44x memory at test time.
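The feature-reuse idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the functions `encode`, `heavy_middle`, `decode`, and `shortcut`, the `key_interval` and `alpha` parameters, and the choice to reuse features from the most recent fully processed frame are all assumptions made for illustration. The deformation part of AdaBD is omitted; only a simple blend is shown.

```python
from typing import List, Tuple

Features = List[float]


def encode(frame: Features) -> Features:
    """Hypothetical cheap early layers, run on every frame."""
    return [2.0 * x for x in frame]


def heavy_middle(feat: Features) -> Features:
    """Hypothetical stand-in for the expensive layers we want to skip."""
    return [f + 1.0 for f in feat]


def decode(feat: Features) -> Features:
    """Hypothetical cheap final layers, run on every frame."""
    return [0.5 * f for f in feat]


def shortcut(ref_enc: Features, ref_mid: Features,
             cur_enc: Features, alpha: float = 1.0) -> Features:
    """Approximate heavy_middle(cur_enc) from a reference frame's features.

    Assumption: the deep features change by alpha times the change in the
    shallow features. A learned block would replace this hand-written rule.
    """
    return [rm + alpha * (ce - re)
            for rm, ce, re in zip(ref_mid, cur_enc, ref_enc)]


def translate(frames: List[Features],
              key_interval: int = 2) -> Tuple[List[Features], int]:
    """Run full inference only every key_interval frames; shortcut the rest."""
    outputs: List[Features] = []
    heavy_calls = 0
    ref_enc: Features = []
    ref_mid: Features = []
    for i, frame in enumerate(frames):
        enc = encode(frame)
        if i % key_interval == 0:
            mid = heavy_middle(enc)   # full inference on a reference frame
            heavy_calls += 1
            ref_enc, ref_mid = enc, mid
        else:
            mid = shortcut(ref_enc, ref_mid, enc)  # cheap approximation
        outputs.append(decode(mid))
    return outputs, heavy_calls


frames = [[0.0, 1.0], [0.5, 1.5], [1.0, 2.0], [1.5, 2.5]]
approx, heavy_calls = translate(frames, key_interval=2)
full = [decode(heavy_middle(encode(f))) for f in frames]
```

In this toy setup the expensive stage runs on only half of the frames. The approximation happens to match full inference exactly because the stand-in `heavy_middle` is a simple offset; a real network would only be approximated, which is why the abstract reports comparable rather than identical performance.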

