UFM: A Simple Path towards Unified Dense Correspondence with Flow (2506.09278v1)

Published 10 Jun 2025 in cs.CV, cs.LG, and cs.RO

Abstract: Dense image correspondence is central to many applications, such as visual odometry, 3D reconstruction, object association, and re-identification. Historically, dense correspondence has been tackled separately for wide-baseline scenarios and optical flow estimation, despite the common goal of matching content between two images. In this paper, we develop a Unified Flow & Matching model (UFM), which is trained on unified data for pixels that are co-visible in both source and target images. UFM uses a simple, generic transformer architecture that directly regresses the (u,v) flow. It is easier to train and more accurate for large flows compared to the typical coarse-to-fine cost volumes in prior work. UFM is 28% more accurate than state-of-the-art flow methods (Unimatch), while also having 62% less error and being 6.7x faster than dense wide-baseline matchers (RoMa). UFM is the first to demonstrate that unified training can outperform specialized approaches across both domains. This result enables fast, general-purpose correspondence and opens new directions for multi-modal, long-range, and real-time correspondence tasks.

Summary

The paper introduces a unified model for dense correspondence, blending optical flow and wide-baseline matching using a transformer-based approach.
In zero-shot evaluations across multiple benchmarks, UFM has 62% less error and runs 6.7x faster than the dense wide-baseline matcher RoMa.
Training with a robust regression loss on an aggregate of 12 diverse datasets targets practical use in robotics, AR/VR, and automotive settings that need real-time processing.

Overview of the Unified Flow & Matching Model (UFM)

The paper introduces the Unified Flow & Matching model (UFM), which unifies two dense image correspondence tasks: optical flow and wide-baseline matching. Dense image correspondence means finding, for each pixel in one image, its matching position in another image. It underpins computer vision applications such as visual odometry, 3D reconstruction, and image warping. The two tasks have traditionally been addressed independently because they rest on different assumptions: optical flow typically deals with small motion between temporally adjacent frames, whereas wide-baseline matching handles large changes in viewpoint. This split has produced domain-specific models that excel within their scope but struggle outside it.
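As a minimal illustration of what a dense (u,v) flow field encodes, the sketch below converts per-pixel displacements into absolute target coordinates. The function name `flow_to_correspondence` and the nested-list layout are assumptions made for this example only; they are not taken from the paper or its code.

```python
def flow_to_correspondence(flow):
    """Convert a dense flow field into absolute target coordinates.

    flow[y][x] is a (u, v) pair: the horizontal and vertical displacement
    of source pixel (x, y). Its match in the target image is (x + u, y + v).
    """
    return [
        [(x + u, y + v) for x, (u, v) in enumerate(row)]
        for y, row in enumerate(flow)
    ]


# A 2x2 toy flow field: every pixel moves 1.5 px right and 0.5 px up
# (image y grows downward, so "up" is a negative v).
flow = [[(1.5, -0.5), (1.5, -0.5)],
        [(1.5, -0.5), (1.5, -0.5)]]
matches = flow_to_correspondence(flow)
print(matches[1][0])  # source pixel (x=0, y=1) maps to (1.5, 0.5)
```

Under this representation, optical flow and wide-baseline matching differ only in how large the displacements tend to be, which is what makes a single regression target for both tasks plausible.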

Model Architecture and Training

UFM employs a transformer-based architecture that directly regresses dense correspondence and covisibility maps, bypassing the coarse-to-fine approaches traditionally used in optical flow methods. The architecture benefits from scale: its training data combines 12 different datasets, spanning static scenes, optical flow, and posed rigid objects. Training on this combined data with a robust regression loss shows that a unified model can outperform specialized models in both speed and accuracy.
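The idea of a robust regression loss restricted to co-visible pixels can be sketched as follows. This is an illustrative sketch only: the generalized Charbonnier penalty used here is one common robust choice and is an assumption, not necessarily the loss used in the paper. The names `robust_flow_loss`, `eps`, and `alpha` are likewise hypothetical.

```python
def robust_flow_loss(pred, gt, covisible, eps=1e-3, alpha=0.5):
    """Mean robust penalty on flow error over co-visible pixels.

    pred, gt:   H x W nested lists of (u, v) flow vectors.
    covisible:  H x W nested lists of booleans; pixels marked False
                (not visible in both images) are excluded.
    The penalty (||e||^2 + eps^2) ** alpha is a generalized Charbonnier
    term (an assumption for this sketch). With alpha = 0.5 it grows
    roughly linearly in the error, so outliers weigh less than under L2.
    """
    total, count = 0.0, 0
    for pred_row, gt_row, mask_row in zip(pred, gt, covisible):
        for (pu, pv), (gu, gv), visible in zip(pred_row, gt_row, mask_row):
            if not visible:
                continue
            sq_err = (pu - gu) ** 2 + (pv - gv) ** 2
            total += (sq_err + eps ** 2) ** alpha
            count += 1
    return total / count if count else 0.0


# The second pixel has a huge error but is not co-visible, so it is ignored.
pred = [[(1.0, 0.0), (50.0, 50.0)]]
gt = [[(0.0, 0.0), (0.0, 0.0)]]
covisible = [[True, False]]
loss = robust_flow_loss(pred, gt, covisible)
```

Masking by covisibility matters when mixing datasets, because occluded or out-of-frame pixels have no valid match to supervise.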

Key Findings

UFM improves on state-of-the-art dense correspondence methods in both accuracy and speed, as shown in zero-shot evaluations on multiple benchmarks (ETH3D, DTU, TA-WB). Compared with RoMa, the dense wide-baseline matcher, it has 62% less error and runs 6.7x faster; compared with the optical flow method UniMatch, it is 28% more accurate. Adding a refinement step further improves accuracy in challenging wide-baseline scenarios.
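To make the phrase "62% less error" concrete, the sketch below computes a mean end-point error (EPE), a standard flow metric, over co-visible pixels and then a relative error reduction. Which error metric underlies each reported figure is not stated in this summary, so treating it as EPE is an assumption; the helper names and all numbers in the example are illustrative, not results from the paper.

```python
import math


def mean_epe(pred, gt, covisible):
    """Mean Euclidean distance between predicted and true flow vectors,
    averaged over pixels whose covisibility flag is True."""
    errors = [
        math.hypot(pu - gu, pv - gv)
        for pred_row, gt_row, mask_row in zip(pred, gt, covisible)
        for (pu, pv), (gu, gv), visible in zip(pred_row, gt_row, mask_row)
        if visible
    ]
    return sum(errors) / len(errors) if errors else float("nan")


def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error is lower than baseline_error."""
    return 1.0 - new_error / baseline_error


# Toy example: one pixel is off by a 3-4-5 triangle, the other is exact.
pred = [[(3.0, 4.0), (0.0, 0.0)]]
gt = [[(0.0, 0.0), (0.0, 0.0)]]
covisible = [[True, True]]
epe = mean_epe(pred, gt, covisible)  # (5 + 0) / 2 = 2.5

# Illustrative numbers only: an error of 0.38 against a baseline of 1.0
# corresponds to a reduction of about 62%.
reduction = relative_reduction(1.0, 0.38)
```

A "6.7x faster" claim is the analogous ratio of baseline runtime to new runtime, so the two figures are not directly comparable: one is a fractional decrease, the other a multiplicative speedup.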

Practical and Theoretical Implications

Practically, the unified approach promises to streamline correspondence tasks in applications such as robotics, AR/VR, and automotive systems, where real-time processing and robustness to varied conditions are key. Theoretically, the success of UFM suggests that unified models for image correspondence merit further exploration, possibly extending the framework to more modalities or to semantic correspondence tasks.

Speculation on Future Developments

Looking ahead, the framework introduced in this paper appears promising for future developments in AI that demand cross-domain generalization and computational efficiency. The integration of refinement techniques and semantic matching capabilities may further improve robustness, paving the way for more comprehensive correspondence prediction models capable of handling increasingly complex real-world scenarios.

In conclusion, while the UFM offers remarkable benefits in accuracy and speed for dense correspondence tasks, the limitations in semantic matching capabilities suggest areas for future enhancement, potentially involving refinement of encoder freezing techniques or exploration of additional semantic datasets.
