GLU-Net: Global-Local Universal Network for Dense Flow and Correspondences (1912.05524v3)

Published 11 Dec 2019 in cs.CV

Abstract: Establishing dense correspondences between a pair of images is an important and general problem, covering geometric matching, optical flow and semantic correspondences. While these applications share fundamental challenges, such as large displacements, pixel-accuracy, and appearance changes, they are currently addressed with specialized network architectures, designed for only one particular task. This severely limits the generalization capabilities of such networks to new scenarios, where e.g. robustness to larger displacements or higher accuracy is required. In this work, we propose a universal network architecture that is directly applicable to all the aforementioned dense correspondence problems. We achieve both high accuracy and robustness to large displacements by investigating the combined use of global and local correlation layers. We further propose an adaptive resolution strategy, allowing our network to operate on virtually any input image resolution. The proposed GLU-Net achieves state-of-the-art performance for geometric and semantic matching as well as optical flow, when using the same network and weights. Code and trained models are available at https://github.com/PruneTruong/GLU-Net.

Citations (168)

Summary

  • The paper introduces GLU-Net as a universal architecture unifying dense correspondence tasks with integrated global and local correlation layers.
  • It employs an adaptive resolution strategy and self-supervised training to deliver high-precision outcomes in geometric, semantic, and optical flow benchmarks.
  • Experimental results on datasets like HPatches, ETH3D, TSS, and KITTI underscore the model's robustness and versatility across varied image analysis challenges.

Overview of GLU-Net: Global-Local Universal Network for Dense Flow and Correspondences

Dense correspondence estimation between image pairs is a long-standing problem in computer vision, with applications spanning geometric matching, optical flow, and semantic correspondence. Specialized neural architectures exist for each of these tasks, yet they all face the same fundamental difficulties: large displacements, the need for pixel-level accuracy, and changes in appearance. Their task-specific designs limit how well they generalize to new scenarios. This paper presents GLU-Net, a universal network architecture designed to tackle these dense correspondence problems robustly and accurately within a single unified model.

Key Contributions

GLU-Net integrates several significant advancements in the field of dense correspondence estimation:

  1. Universal Architecture: GLU-Net is proposed as a single network architecture capable of handling different dense correspondence tasks, from semantic matching to optical flow, using the same model parameters across tasks.
  2. Global and Local Correlations: Combining global and local correlation layers lets GLU-Net handle both large viewpoint changes and fine displacements efficiently, pairing the long-range coverage of global matching with the precision of local correlations (see the correlation sketch after this list).
  3. Adaptive Resolution Strategy: To overcome the fixed input resolution imposed by the global correlation layer, an adaptive resolution strategy is introduced. This enables the network to process input images of virtually any resolution, boosting its performance in high-resolution scenarios (see the two-stream sketch below).
  4. Self-Supervised Training: The architecture is trained in a self-supervised fashion on synthetic warps of real images, removing the need for annotated ground-truth flow (see the data-generation sketch below).
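
To make the difference between the two correlation layers concrete, here is a minimal PyTorch sketch, not the authors' implementation: the global layer scores every target location against every source location, while the local layer only searches a small window around each pixel. The output shapes, feature normalisation, and the search radius are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def global_correlation(f_src, f_tgt):
    # All-pairs cosine similarity between L2-normalised feature maps.
    # f_src, f_tgt: (B, C, H, W) -> (B, H*W, H, W): one score per source
    # position (channel dim) for every target position (spatial dims).
    b, c, h, w = f_src.shape
    src = F.normalize(f_src.reshape(b, c, -1), dim=1)   # (B, C, H*W)
    tgt = F.normalize(f_tgt.reshape(b, c, -1), dim=1)   # (B, C, H*W)
    corr = torch.einsum('bcn,bcm->bnm', src, tgt)       # (B, HW_src, HW_tgt)
    return corr.reshape(b, h * w, h, w)

def local_correlation(f_src, f_tgt, radius=4):
    # Correlation restricted to a (2*radius+1)^2 window around each pixel.
    b, c, h, w = f_src.shape
    src = F.pad(f_src, (radius,) * 4)
    scores = []
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = src[:, :, dy:dy + h, dx:dx + w]
            scores.append((shifted * f_tgt).sum(dim=1, keepdim=True) / c)
    return torch.cat(scores, dim=1)                      # (B, (2r+1)^2, H, W)
```

In GLU-Net the global correlation is applied once at the coarsest level, where its quadratic cost is affordable, while local correlations handle the finer levels.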
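
The adaptive resolution strategy can likewise be pictured as a two-stream forward pass. The sketch below is schematic: `l_net` and `h_net` are hypothetical callables standing in for the fixed-resolution sub-network (which hosts the global correlation) and the native-resolution refinement sub-network, and the base resolution and interpolation modes are illustrative choices.

```python
import torch
import torch.nn.functional as F

def adaptive_resolution_forward(img_src, img_tgt, l_net, h_net, base_res=256):
    # Run the global-correlation stream at a fixed base resolution, then hand
    # the (rescaled) flow to a refinement stream at the original resolution.
    h, w = img_src.shape[-2:]
    src_low = F.interpolate(img_src, size=(base_res, base_res), mode='area')
    tgt_low = F.interpolate(img_tgt, size=(base_res, base_res), mode='area')
    flow_low = l_net(src_low, tgt_low)                   # (B, 2, h_low, w_low)

    # Upsample the flow to the native resolution and rescale its vectors.
    flow_init = F.interpolate(flow_low, size=(h, w), mode='bilinear',
                              align_corners=False)
    scale = torch.tensor([w / flow_low.shape[-1], h / flow_low.shape[-2]],
                         device=flow_init.device).view(1, 2, 1, 1)
    return h_net(img_src, img_tgt, flow_init * scale)
```

Because only the low-resolution stream contains the global correlation, the high-resolution stream is free of any fixed-size constraint, which is what allows arbitrary input resolutions.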
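
Self-supervised training data can be generated on the fly from unlabeled images: warp a real image with a random transform, and the transform itself supplies the dense ground-truth flow. The sketch below uses a single random homography via OpenCV; the exact family of transforms and jitter magnitudes used for GLU-Net are not reproduced here, and the target-to-source flow convention is an assumption.

```python
import numpy as np
import cv2

def synthetic_training_pair(img, max_jitter=0.25):
    # Warp one real image with a random homography; the known transform
    # yields a dense ground-truth flow field for free.
    h, w = img.shape[:2]
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = ((np.random.rand(4, 2) - 0.5) * 2 * max_jitter
              * np.float32([w, h])).astype(np.float32)
    H = cv2.getPerspectiveTransform(corners, corners + jitter)
    target = cv2.warpPerspective(img, H, (w, h))

    # For every target pixel, find where it came from in the source image.
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    src = np.linalg.inv(H) @ pts
    src = (src[:2] / src[2]).T.reshape(h, w, 2)
    flow = src - np.stack([xs, ys], axis=-1)              # target -> source flow
    return img, target, flow.astype(np.float32)
```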

The GLU-Net architecture follows a coarse-to-fine strategy over a feature pyramid, an approach with a long track record of success in optical flow and matching networks. By iterating through a hierarchy of feature resolutions, the network progressively refines its estimate of the dense correspondence field; one such refinement step is sketched below.
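
A minimal sketch of one pyramid-level step, not the paper's code: upsample and rescale the flow from the coarser level, warp the source features with it, correlate locally (reusing `local_correlation` from the earlier sketch), and let `decoder`, a hypothetical module here, predict a residual correction.

```python
import torch
import torch.nn.functional as F

def warp(feat, flow):
    # Warp features (B, C, H, W) with a dense flow (B, 2, H, W) given in pixels.
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=feat.device),
                            torch.arange(w, device=feat.device), indexing='ij')
    grid = torch.stack((xs, ys), dim=-1).float() + flow.permute(0, 2, 3, 1)
    gx = 2.0 * grid[..., 0] / max(w - 1, 1) - 1.0         # normalise to [-1, 1]
    gy = 2.0 * grid[..., 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(feat, torch.stack((gx, gy), dim=-1), align_corners=True)

def refine_level(f_src, f_tgt, flow_coarse, decoder, radius=4):
    # One pyramid level: upsample + rescale the coarser flow (assuming the same
    # scale factor in x and y), warp the source features with it, correlate
    # locally, and predict a residual correction.
    h, w = f_tgt.shape[-2:]
    scale = w / flow_coarse.shape[-1]
    flow_up = F.interpolate(flow_coarse, size=(h, w), mode='bilinear',
                            align_corners=False) * scale
    corr = local_correlation(warp(f_src, flow_up), f_tgt, radius)
    return flow_up + decoder(torch.cat([corr, f_tgt, flow_up], dim=1))
```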

Experimental Results

Upon testing, GLU-Net demonstrated superior performance across multiple datasets representative of diverse tasks:

  • Geometric Correspondence: On datasets such as HPatches and ETH3D, GLU-Net surpassed prior state-of-the-art methods such as DGC-Net, showing greater robustness to geometric transformations together with pixel-level accuracy.
  • Semantic Correspondence: Without task-specific retraining, GLU-Net achieved high PCK scores on the TSS dataset, highlighting its ability to generalize across tasks with substantial intra-class variability.
  • Optical Flow: Although not trained on optical flow datasets, GLU-Net produced competitive results on KITTI-2012 and KITTI-2015, underscoring its suitability for motion estimation tasks.

Implications and Future Work

The holistic model presented in this paper opens up new directions for research in universal neural architectures, targeting seamless integration of various image correspondence challenges within a singular framework. This marks a shift towards developing more adaptable, scalable solutions that can cater to a multitude of tasks without necessitating additional task-specific model designs or parameter reconfigurations. Future developments could explore enhancing self-supervised learning paradigms, potentially integrating real-world non-rigid deformations and dynamically moving object scenarios to further refine the network’s robustness and precision.

In conclusion, GLU-Net exemplifies a decisive step toward a more unified, efficient approach in dense correspondence estimation across differing domains of computer vision. Its ability to operate effectively across geometric, semantic, and optical flow tasks signals important advancements in the field, paving the way towards the next frontier of AI-enabled image analysis.
