- The paper introduces GMFlowNet, which integrates global matching with overlapping attention to address large displacement challenges in optical flow estimation.
- It employs a two-stage matching-then-optimization strategy, extracting matches from a 4D cost volume and using a Patch-based OverLapping Attention (POLA) block to enhance features and resolve matching ambiguities.
- Experimental results demonstrate notable improvements over RAFT, with gains of 22.4% and 4.7% in large-displacement regions of the Sintel clean and final benchmarks, respectively.
Global Matching with Overlapping Attention for Optical Flow Estimation
This paper introduces GMFlowNet, a learning-based matching-optimization framework for optical flow estimation that combines global matching with overlapping attention. Existing deep learning methods struggle to capture long-range motion correspondences, especially under large motion; by addressing this limitation, GMFlowNet delivers significant gains over predecessors such as the widely referenced optimization-only method RAFT.
Methodological Advancements
The approach follows a two-stage strategy: global matching first handles large displacements, and the resulting correspondences are then refined through an optimization-based regression process. Central to this framework is a global matching module that extracts matches from a 4D cost volume with a simple argmax operation, distinguishing it from traditional hand-crafted, computationally intensive matching pipelines.
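The matching step can be illustrated with a minimal sketch. The snippet below is not the paper's implementation (which operates on learned features inside a full network); it only shows the core idea of building an all-pairs cost volume between two feature maps and taking a per-pixel argmax to obtain an initial flow. The function name `global_match` and the toy feature shapes are assumptions for illustration.

```python
import numpy as np

def global_match(feat1, feat2):
    """Toy sketch of global matching via argmax over a 4D cost volume.

    feat1, feat2: (H, W, C) feature maps of two frames.
    Returns a flow field (H, W, 2) giving (dy, dx) per pixel of frame 1.
    """
    H, W, C = feat1.shape
    f1 = feat1.reshape(H * W, C)
    f2 = feat2.reshape(H * W, C)
    # All-pairs correlation: the 4D cost volume (H, W, H, W)
    # flattened to a (H*W, H*W) matrix.
    cost = f1 @ f2.T
    # For each source pixel, pick the best-matching target pixel.
    best = cost.argmax(axis=1)
    ys, xs = np.divmod(np.arange(H * W), W)
    my, mx = np.divmod(best, W)
    return np.stack([my - ys, mx - xs], axis=-1).reshape(H, W, 2)
```

As a quick sanity check, matching a feature map against a circularly shifted copy of itself recovers the shift at interior pixels (pixels near the border wrap around, so their matches differ).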
To resolve matching ambiguities, GMFlowNet introduces the Patch-based OverLapping Attention (POLA) block, which aggregates regional context into the extracted features and improves matching accuracy, particularly in textureless regions and areas with repetitive patterns.
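The overlapping idea can be sketched in a few lines. The toy single-head attention below is inspired by, but not identical to, the POLA block: queries come from non-overlapping patches, while keys and values come from each patch enlarged by a halo, so neighboring patches share context. The function names, `patch`/`halo` parameters, and shapes are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def overlapping_attention(feat, patch=2, halo=1):
    """Toy sketch of patch-based overlapping attention.

    feat: (H, W, C) feature map with H, W divisible by `patch`.
    Each patch attends to a window extended by `halo` pixels on
    every side, so adjacent patches' windows overlap.
    """
    H, W, C = feat.shape
    pad = np.pad(feat, ((halo, halo), (halo, halo), (0, 0)))
    out = np.zeros_like(feat)
    for py in range(0, H, patch):
        for px in range(0, W, patch):
            q = feat[py:py + patch, px:px + patch].reshape(-1, C)
            # Enlarged (overlapping) window supplies keys and values.
            win = pad[py:py + patch + 2 * halo, px:px + patch + 2 * halo]
            kv = win.reshape(-1, C)
            attn = softmax(q @ kv.T / np.sqrt(C))
            out[py:py + patch, px:px + patch] = (attn @ kv).reshape(patch, patch, C)
    return out
```

Because each output row is a convex combination of window features, the output stays within the range of the (padded) input, and the per-patch windows give each pixel access to context just beyond its own patch.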
Experimental Evaluation
Experiments show GMFlowNet outperforming state-of-the-art methods on standard benchmarks. Most notably, it surpasses RAFT on the Sintel and KITTI datasets, with improvements of 22.4% and 4.7% in large-displacement regions (motion greater than 40 pixels) of Sintel clean and final, respectively. This demonstrates the model's ability to handle large motion, a long-standing challenge for deep learning approaches to optical flow.
Moreover, GMFlowNet remains efficient: the matching step adds negligible computational overhead, roughly 0.52% extra inference time compared to RAFT, so predictions improve without sacrificing speed.
Theoretical and Practical Implications
Theoretically, this work deepens the field's understanding of how traditional matching techniques can be integrated with contemporary learning paradigms for optical flow estimation. Practically, GMFlowNet's architecture suggests applications in real-time video processing where precise motion capture is crucial, such as autonomous driving, video analysis, and video enhancement systems.
Future Directions
While GMFlowNet significantly improves optical flow estimation, further work could reduce the computational and memory demands of the overlapping attention mechanism for deployment in resource-constrained environments. Future research could also combine GMFlowNet's matching-optimization design with other emerging neural architectures to pursue further gains in efficiency and accuracy.
In summary, GMFlowNet represents a meaningful step forward in optical flow estimation, effectively coupling global matching with contextual attention and tackling the entrenched challenge of capturing and regressing large, complex motion in dynamic scenes.