Trustworthy Self-Attention: Enabling the Network to Focus Only on the Most Relevant References (2403.00211v2)
Abstract: The prediction of optical flow for occluded points is still a difficult problem that has not yet been solved. Recent methods use self-attention to find relevant non-occluded points as references for estimating the optical flow of occluded points based on the assumption of self-similarity. However, they rely on visual features of a single image and weak constraints, which are not sufficient to constrain the trained network to focus on erroneous and weakly relevant reference points. We make full use of online occlusion recognition information to construct occlusion extended visual features and two strong constraints, allowing the network to learn to focus only on the most relevant references without requiring occlusion ground truth to participate in the training of the network. Our method adds very few network parameters to the original framework, making it very lightweight. Extensive experiments show that our model has the greatest cross-dataset generalization. Our method achieves much greater error reduction, 18.6%, 16.2%, and 20.1% for all points, non-occluded points, and occluded points respectively from the state-of-the-art GMA-base method, MATCHFlow(GMA), on Sintel Albedo pass. Furthermore, our model achieves state-of-the-art performance on the Sintel bench-marks, ranking #1 among all published methods on Sintel clean pass. The code will be open-source.
- Symmetrical dense optical flow estimation with occlusions detection. International Journal of Computer Vision, 75:371–385, 2007.
- Deep equilibrium optical flow estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 620–630, 2022.
- A naturalistic open source movie for optical flow evaluation. In Computer Vision–ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part VI 12, pages 611–625. Springer, 2012.
- Rethinking optical flow from geometric matching consistent perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1337–1347, 2023.
- Flownet: Learning optical flow with convolutional networks. In Proceedings of the IEEE international conference on computer vision, pages 2758–2766, 2015.
- Mirrorflow: Exploiting symmetries in joint optical flow and occlusion estimation. In Proceedings of the IEEE International Conference on Computer Vision, pages 312–321, 2017.
- Iterative residual refinement for joint optical flow and occlusion estimation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5754–5763, 2019.
- Flownet 2.0: Evolution of optical flow estimation with deep networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2462–2470, 2017.
- Imposing consistency for optical flow estimation. In Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, pages 3181–3191, 2022.
- Learning to estimate hidden motions with global motion aggregation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9772–9781, 2021.
- Yoio: You only iterate once by mining and fusing multiple necessary global information in the optical flow estimation. arXiv preprint arXiv:2401.05879, 2024.
- What matters in unsupervised optical flow. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II 16, pages 557–572. Springer, 2020.
- The hci benchmark suite: Stereo and flow ground truth with uncertainties for urban autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 19–28, 2016.
- Multiple object tracking: A literature review. Artificial intelligence, 293:103448, 2021.
- A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4040–4048, 2016.
- Object scene flow for autonomous vehicles. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3061–3070, 2015.
- Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
- Film: Frame interpolation for large motion. In European Conference on Computer Vision, pages 250–266. Springer, 2022.
- Flowformer++: Masked cost volume autoencoding for pretraining optical flow estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1599–1610, 2023.
- Pwc-net: Cnns for optical flow using pyramid, warping, and cost volume. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8934–8943, 2018.
- Skflow: Learning optical flow with super kernels. Advances in Neural Information Processing Systems, 35:11313–11326, 2022.
- Raft: Recurrent all-pairs field transforms for optical flow. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II 16, pages 402–419. Springer, 2020.
- Gmflow: Learning optical flow via global matching. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8121–8130, 2022.
- Viser: Video-specific surface embeddings for articulated 3d shape reconstruction. Advances in Neural Information Processing Systems, 34:19326–19338, 2021.
- Maskflownet: Asymmetric feature matching with learnable occlusion mask. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6278–6287, 2020.