What Matters in Unsupervised Optical Flow (2006.04902v2)

Published 8 Jun 2020 in cs.CV, cs.LG, and eess.IV

Abstract: We systematically compare and analyze a set of key components in unsupervised optical flow to identify which photometric loss, occlusion handling, and smoothness regularization is most effective. Alongside this investigation we construct a number of novel improvements to unsupervised flow models, such as cost volume normalization, stopping the gradient at the occlusion mask, encouraging smoothness before upsampling the flow field, and continual self-supervision with image resizing. By combining the results of our investigation with our improved model components, we are able to present a new unsupervised flow technique that significantly outperforms the previous unsupervised state-of-the-art and performs on par with supervised FlowNet2 on the KITTI 2015 dataset, while also being significantly simpler than related approaches.

Citations (179)

Summary

The paper provides a comprehensive analysis comparing photometric losses, occlusion techniques, and smoothness constraints to optimize unsupervised optical flow.
The study introduces innovations such as cost volume normalization and gradient stopping at occlusion masks to mitigate degraded gradients.
The unified UFlow framework achieves state-of-the-art performance on benchmarks like KITTI 2015, rivaling results of supervised methods.

An Overview of "What Matters in Unsupervised Optical Flow"

This paper by Jonschkowski et al. analyzes the key components of unsupervised optical flow and proposes improvements that yield significant performance gains. Its distinguishing feature is a systematic study of how photometric loss, occlusion handling, and smoothness regularization interact in neural network models for optical flow estimation.

Optical flow, the dense per-pixel estimation of motion between two consecutive images, underpins computer vision applications such as visual odometry and object tracking. Training flow networks with supervision is difficult because labeled real-world data is scarce, which has pushed the community toward unsupervised methods that need no ground-truth labels and can instead learn from abundant unlabeled video.
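How a training signal can come from the images alone is easiest to see in a small example. The NumPy sketch below is not the paper's implementation: the function names, the nearest-neighbour warp, the Charbonnier penalty, and the `smooth_weight` value are all assumptions chosen for brevity. It warps the second image toward the first with a candidate flow, penalizes the remaining intensity difference (photometric term), ignores pixels whose match falls outside the frame (a crude stand-in for occlusion handling), and adds a first-order smoothness term.

```python
import numpy as np

def charbonnier(x, eps=1e-3):
    """Robust, differentiable approximation of |x|."""
    return np.sqrt(x * x + eps * eps)

def warp_nearest(image, flow):
    """Backward-warp `image` (H, W) with `flow` (H, W, 2) storing (dx, dy).

    Uses nearest-neighbour lookup and also returns a mask of pixels whose
    target location lies inside the image.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.rint(xs + flow[..., 0]).astype(int)
    yt = np.rint(ys + flow[..., 1]).astype(int)
    valid = (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)
    warped = image[np.clip(yt, 0, h - 1), np.clip(xt, 0, w - 1)]
    return warped, valid

def unsupervised_loss(img1, img2, flow, smooth_weight=0.1):
    """Photometric + smoothness objective; no ground-truth flow is used."""
    warped, valid = warp_nearest(img2, flow)
    mask = valid.astype(float)  # hypothetical stand-in for an occlusion mask
    photo = (charbonnier(img1 - warped) * mask).sum() / max(mask.sum(), 1.0)
    dx = flow[:, 1:] - flow[:, :-1]   # horizontal flow differences
    dy = flow[1:, :] - flow[:-1, :]   # vertical flow differences
    smooth = charbonnier(dx).mean() + charbonnier(dy).mean()
    return photo + smooth_weight * smooth

rng = np.random.default_rng(0)
img1 = rng.random((32, 32))
img2 = np.roll(img1, 2, axis=1)            # content shifted 2 px to the right
true_flow = np.zeros((32, 32, 2))
true_flow[..., 0] = 2.0
zero_flow = np.zeros((32, 32, 2))
loss_true = unsupervised_loss(img1, img2, true_flow)
loss_zero = unsupervised_loss(img1, img2, zero_flow)
print(loss_true < loss_zero)               # the correct flow scores lower
```

A real system would use differentiable bilinear warping and a learned network to predict the flow; the point here is only that the objective can rank candidate flows without labels.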

Key Contributions and Findings

The authors present three primary contributions to the field:

Comprehensive Comparison and Analysis: The paper systematically compares different photometric losses (L1, Charbonnier, Census, and SSIM), occlusion estimation techniques, and smoothness constraints. Additionally, the paper reflects on the impacts of model pretraining, image resolution, data augmentation, and batch size. These elements are not isolated but are instead analyzed in conjunction to determine optimal configurations.
Innovative Model Improvements: The researchers propose several enhancements to existing unsupervised flow models: cost volume normalization, stopping the gradient at the occlusion mask, applying the smoothness term at the native flow resolution (before upsampling the flow field), and continual self-supervision with image resizing. Gradient stopping is singled out as essential: without it, incorrect gradients propagate through the occlusion mask and degrade training.
Unified Framework - UFlow: By integrating the best-performing modifications, the authors develop UFlow, a unified framework that sets a new state of the art in unsupervised optical flow while being simpler than earlier methods. On KITTI 2015, UFlow performs on par with the supervised FlowNet2 without the overhead of multi-frame or multi-modal approaches.
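Of the improvements listed above, cost volume normalization is the simplest to illustrate in isolation. The sketch below is a hedged NumPy illustration, not the authors' code: `normalize_features` and `cost_volume` are hypothetical names, and normalizing each feature map to zero mean and unit variance over all positions and channels is an assumption about the exact axes. The idea it demonstrates is that normalizing features before correlating them keeps the cost volume's scale independent of the raw feature magnitude.

```python
import numpy as np

def normalize_features(feat, eps=1e-12):
    """Zero-mean, unit-variance normalization over all positions and channels."""
    return (feat - feat.mean()) / np.sqrt(feat.var() + eps)

def cost_volume(f1, f2, max_disp=2):
    """Correlate f1 (H, W, C) with shifted copies of f2 in a square window.

    Returns an array of shape (H, W, (2 * max_disp + 1) ** 2). Positions that
    fall outside f2 are zero-padded and therefore contribute zero cost.
    """
    h, w, _ = f1.shape
    padded = np.pad(f2, ((max_disp, max_disp), (max_disp, max_disp), (0, 0)))
    costs = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = padded[dy:dy + h, dx:dx + w]
            costs.append((f1 * shifted).mean(axis=-1))  # per-pixel correlation
    return np.stack(costs, axis=-1)

rng = np.random.default_rng(0)
f1 = rng.normal(size=(8, 8, 16))
f2 = rng.normal(size=(8, 8, 16))

# Without normalization the cost scale follows the feature scale.
raw_small = cost_volume(0.01 * f1, 0.01 * f2)
raw_large = cost_volume(100.0 * f1, 100.0 * f2)
print(np.abs(raw_large).max() / np.abs(raw_small).max())  # roughly 1e8

# With normalization the two cost volumes agree.
norm_small = cost_volume(normalize_features(0.01 * f1), normalize_features(0.01 * f2))
norm_large = cost_volume(normalize_features(100.0 * f1), normalize_features(100.0 * f2))
print(np.allclose(norm_small, norm_large))
```

Stopping the gradient at the occlusion mask has no NumPy equivalent, since it concerns automatic differentiation; in an autodiff framework it amounts to treating the mask as a constant when the masked loss is differentiated.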

Numerical Results

UFlow outperforms previous unsupervised methods on Flying Chairs, Sintel, and KITTI. On the KITTI 2015 benchmark it performs on par with the supervised FlowNet2 model despite never training on labeled data. The authors evaluate the model with established metrics, which keeps the results comparable with other state-of-the-art methods.
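The overview does not name the metrics. The two most commonly reported on these benchmarks are the average endpoint error (EPE) and, for KITTI 2015, the percentage of outlier pixels, where a pixel counts as an outlier if its error exceeds both 3 pixels and 5% of the ground-truth flow magnitude. The following NumPy sketch shows how they might be computed; the function names and the optional `valid` mask argument are illustrative assumptions.

```python
import numpy as np

def endpoint_error(flow_pred, flow_gt, valid=None):
    """Mean Euclidean distance between predicted and true flow vectors."""
    err = np.linalg.norm(flow_pred - flow_gt, axis=-1)
    if valid is None:
        valid = np.ones(err.shape, dtype=bool)
    return err[valid].mean()

def outlier_rate(flow_pred, flow_gt, valid=None, abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of pixels whose error exceeds both thresholds."""
    err = np.linalg.norm(flow_pred - flow_gt, axis=-1)
    mag = np.linalg.norm(flow_gt, axis=-1)
    if valid is None:
        valid = np.ones(err.shape, dtype=bool)
    outlier = (err > abs_thresh) & (err > rel_thresh * mag)
    return 100.0 * outlier[valid].mean()

# Toy case: 16 pixels moving 10 px to the right, one of them mispredicted.
flow_gt = np.zeros((4, 4, 2))
flow_gt[..., 0] = 10.0
flow_pred = flow_gt.copy()
flow_pred[0, 0] += np.array([4.0, 3.0])    # error vector of length 5

print(endpoint_error(flow_pred, flow_gt))  # 5 / 16 = 0.3125
print(outlier_rate(flow_pred, flow_gt))    # 1 of 16 pixels = 6.25
```

The `valid` mask matters in practice because KITTI ground truth is sparse, so only pixels with a measurement should enter the average.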

Implications and Future Directions

The research offers several theoretical and practical implications. It simplifies the existing landscape of unsupervised optical flow by identifying and standardizing effective components, thus setting a benchmark for future research. On a practical level, the improvements hold promise for enhancing downstream tasks that rely on optical flow, such as video enhancement and automated driving systems.

Looking ahead, the paper paves the way for further exploration of loss functions and regularization techniques in unsupervised learning settings. Because unsupervised optical flow still struggles with complex motion and brightness inconsistencies, future work could explore more sophisticated loss functions or integrate domain adaptation techniques to better handle real-world conditions.
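Why the choice of loss matters for brightness inconsistencies can be shown with the census transform, one of the photometric losses the paper compares. The sketch below is an illustrative assumption rather than the paper's loss: it uses a hard, non-differentiable census signature with hypothetical function names, whereas training code would need a soft, differentiable variant. It shows that an additive brightness change produces a large intensity difference but leaves the census signature unchanged.

```python
import numpy as np

def census_signature(image, radius=1):
    """Per-pixel binary descriptor: is each neighbour brighter than the centre?"""
    h, w = image.shape
    padded = np.pad(image, radius, mode="edge")
    bits = []
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            if dy == radius and dx == radius:
                continue  # skip the centre pixel itself
            bits.append(padded[dy:dy + h, dx:dx + w] > image)
    return np.stack(bits, axis=-1)

def census_distance(img_a, img_b, radius=1):
    """Mean Hamming distance between the census signatures of two images."""
    return np.mean(census_signature(img_a, radius) != census_signature(img_b, radius))

rng = np.random.default_rng(0)
img = rng.integers(0, 200, size=(16, 16)).astype(float)
brighter = img + 40.0                      # same scene, uniformly brighter

print(np.abs(img - brighter).mean())       # intensity difference: 40.0
print(census_distance(img, brighter))      # census difference: 0.0
```

Because the signature depends only on the ordering of intensities within a patch, it is unaffected by a uniform offset, while a plain intensity penalty reports an error even though nothing moved.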

This work is of substantial interest to researchers aiming to further understand, implement, and extend effective unsupervised learning methodologies in computer vision, and it lays the groundwork for continued advancements in this field.
