Sparse Global Matching for Video Frame Interpolation with Large Motion (2404.06913v3)
Abstract: Large motion poses a critical challenge in the Video Frame Interpolation (VFI) task. Existing methods are often constrained by limited receptive fields, resulting in sub-optimal performance when handling scenarios with large motion. In this paper, we introduce a new pipeline for VFI that effectively integrates global-level information to alleviate issues associated with large motion. Specifically, we first estimate a pair of initial intermediate flows using a high-resolution feature map for extracting local details. Then, we incorporate a sparse global matching branch to compensate for errors in the flow estimation; this branch identifies flaws in the initial flows and generates sparse flow compensation with a global receptive field. Finally, we adaptively merge the initial flow estimation with the global flow compensation, yielding a more accurate intermediate flow. To evaluate the effectiveness of our method in handling large motion, we carefully curate a more challenging subset from commonly used benchmarks. Our method demonstrates state-of-the-art performance on these VFI subsets with large motion.
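The three stages named in the abstract (flagging flaws in the initial flow, producing sparse compensation with a global receptive field, and merging the two) can be illustrated with a small sketch. This is not the authors' implementation. The function names, the per-pixel error map used to flag flaws, the softmax-weighted matching over all target locations, and the convex blend used for merging are all assumptions chosen to keep the example short and dependency-free.

```python
import math

def select_flawed_points(error_map, k):
    """Return (x, y) coordinates of the k pixels with the largest error.
    The error map itself is a hypothetical input, not defined by the abstract."""
    coords = [(x, y) for y, row in enumerate(error_map) for x, _ in enumerate(row)]
    coords.sort(key=lambda p: error_map[p[1]][p[0]], reverse=True)
    return coords[:k]

def global_match_flow(feat_src, feat_tgt, points):
    """For each selected source point, compare its feature with every target
    location and return the softmax-weighted expected displacement (dx, dy)."""
    flows = []
    for x, y in points:
        query = feat_src[y][x]
        scale = math.sqrt(len(query))
        scores, locations = [], []
        for ty, row in enumerate(feat_tgt):
            for tx, key in enumerate(row):
                scores.append(sum(a * b for a, b in zip(query, key)) / scale)
                locations.append((tx, ty))
        top = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        mx = sum(wt * lx for wt, (lx, _) in zip(weights, locations)) / total
        my = sum(wt * ly for wt, (_, ly) in zip(weights, locations)) / total
        flows.append((mx - x, my - y))
    return flows

def merge_flows(initial_flow, points, sparse_flow, weights):
    """Blend sparse compensation into the dense initial flow at the selected
    points only; a convex blend stands in for the adaptive merge (assumption)."""
    merged = [[tuple(v) for v in row] for row in initial_flow]
    for (x, y), (fx, fy), wt in zip(points, sparse_flow, weights):
        ix, iy = initial_flow[y][x]
        merged[y][x] = ((1 - wt) * ix + wt * fx, (1 - wt) * iy + wt * fy)
    return merged
```

Because matching is run only at the selected points, the cost of the global comparison grows with the number of flagged pixels rather than with the full image size, which is the motivation for keeping the branch sparse.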