UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss (1711.07837v1)

Published 21 Nov 2017 in cs.CV

Abstract: In the era of end-to-end deep learning, many advances in computer vision are driven by large amounts of labeled data. In the optical flow setting, however, obtaining dense per-pixel ground truth for real scenes is difficult and thus such data is rare. Therefore, recent end-to-end convolutional networks for optical flow rely on synthetic datasets for supervision, but the domain mismatch between training and test scenarios continues to be a challenge. Inspired by classical energy-based optical flow methods, we design an unsupervised loss based on occlusion-aware bidirectional flow estimation and the robust census transform to circumvent the need for ground truth flow. On the KITTI benchmarks, our unsupervised approach outperforms previous unsupervised deep networks by a large margin, and is even more accurate than similar supervised methods trained on synthetic datasets alone. By optionally fine-tuning on the KITTI training data, our method achieves competitive optical flow accuracy on the KITTI 2012 and 2015 benchmarks, thus in addition enabling generic pre-training of supervised networks for datasets with limited amounts of ground truth.

Summary

The paper introduces an unsupervised training framework for optical flow networks that needs no ground truth flow, using a loss built on bidirectional flow estimation and the census transform.
The methodology combines bidirectional flow estimation with explicit occlusion handling and refines the estimate iteratively by stacking networks.
Experimental results on the KITTI benchmarks show markedly lower average endpoint error than previous unsupervised deep networks, which points to the framework's potential for real-world optical flow applications.

UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss

The paper "UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss" by Meister et al. addresses the difficulty of obtaining dense per-pixel ground truth for optical flow in real-world scenes. The authors introduce an unsupervised learning framework whose central element is a loss function that trains convolutional neural networks (CNNs) without ground truth flow, so training no longer has to rely on synthetic datasets.

Key Contributions

The authors address the domain mismatch between synthetic training data and real test scenes by designing a robust loss function inspired by classical energy-based optical flow methods. Their unsupervised loss combines bidirectional flow estimation and occlusion-aware modeling with the census transform, which makes the image comparison more robust on real images. With this loss, the paper reports substantial improvements over previous unsupervised deep networks on the KITTI benchmarks and higher accuracy than similar supervised methods trained solely on synthetic data.
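
To illustrate why a census-based comparison is robust on real images, here is a minimal sketch of the classic binary census transform in plain Python. It is an illustration only, not the authors' implementation: the function names, the 3x3 window, and the border clamping are assumptions made for this example, and the differentiable variant used for training may differ. The sketch shows that a uniform brightness change leaves the census signature unchanged, whereas a raw intensity difference does not.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of intensities


def census_transform(img: Image, radius: int = 1) -> List[List[Tuple[int, ...]]]:
    """Per-pixel bit signature: 1 where a neighbour is darker than the centre.

    Neighbours outside the image are clamped to the nearest border pixel
    (an assumption made for this sketch).
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            bits = []
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue  # skip the centre pixel itself
                    ny = min(max(y + dy, 0), h - 1)
                    nx = min(max(x + dx, 0), w - 1)
                    bits.append(1 if img[ny][nx] < img[y][x] else 0)
            row.append(tuple(bits))
        out.append(row)
    return out


def hamming(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Number of positions at which two census signatures differ."""
    return sum(p != q for p, q in zip(a, b))


def census_distance(img_a: Image, img_b: Image) -> int:
    """Sum of per-pixel Hamming distances between two census-transformed images."""
    ca, cb = census_transform(img_a), census_transform(img_b)
    return sum(hamming(p, q)
               for row_a, row_b in zip(ca, cb)
               for p, q in zip(row_a, row_b))


def intensity_distance(img_a: Image, img_b: Image) -> float:
    """Sum of absolute intensity differences, for comparison."""
    return sum(abs(p - q)
               for row_a, row_b in zip(img_a, img_b)
               for p, q in zip(row_a, row_b))


# A tiny image and a copy with every pixel brightened by the same amount.
img = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
brighter = [[v + 50 for v in row] for row in img]
```

Because the census signature only records the ordering of intensities within a patch, `census_distance(img, brighter)` is zero while `intensity_distance(img, brighter)` is not. This is the kind of invariance that makes the comparison less sensitive to illumination changes between frames.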

Methodology

The approach rests on two main components:

Unsupervised Photometric Reconstruction Loss: This loss removes the need for ground truth flow by comparing one frame with the other frame warped by the estimated flow. Flow is computed in both directions (forward and backward), the comparison uses the census transform for robustness, and occlusions are modeled explicitly.
Iterative Refinement via Stacking: The architecture refines the flow estimate iteratively using a stack of FlowNet networks, which improves accuracy.
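
One common way to make a bidirectional loss occlusion-aware is a forward-backward consistency check: for a visible pixel, the forward flow and the backward flow at the pixel's target location should roughly cancel. The sketch below illustrates that idea in plain Python. The function name, the nearest-neighbour lookup (used here in place of bilinear sampling), the rule that out-of-image targets count as occluded, and the threshold values `alpha1` and `alpha2` are all assumptions made for this example, not details confirmed by this summary.

```python
from typing import List, Tuple

Flow = List[List[Tuple[float, float]]]  # per-pixel (u, v) displacement


def occlusion_mask(flow_fw: Flow, flow_bw: Flow,
                   alpha1: float = 0.01, alpha2: float = 0.5) -> List[List[bool]]:
    """Mark a pixel as occluded when forward and backward flow disagree.

    A pixel at (x, y) is followed along the forward flow to its target.
    If the backward flow at the target does not approximately undo the
    forward flow, the pixel is flagged. alpha1 and alpha2 are illustrative
    threshold values (assumptions for this sketch).
    """
    h, w = len(flow_fw), len(flow_fw[0])
    mask = []
    for y in range(h):
        row = []
        for x in range(w):
            u, v = flow_fw[y][x]
            # Nearest-neighbour lookup of the target pixel (a simplification).
            tx, ty = int(round(x + u)), int(round(y + v))
            if not (0 <= tx < w and 0 <= ty < h):
                row.append(True)  # target leaves the image: treat as occluded
                continue
            ub, vb = flow_bw[ty][tx]
            # Squared magnitude of the round-trip displacement.
            mismatch = (u + ub) ** 2 + (v + vb) ** 2
            # Tolerance grows with the flow magnitudes.
            tolerance = alpha1 * (u * u + v * v + ub * ub + vb * vb) + alpha2
            row.append(mismatch > tolerance)
        mask.append(row)
    return mask


# One image row moving one pixel to the right; the backward flow undoes it.
fw = [[(1.0, 0.0)] * 4]
bw_consistent = [[(-1.0, 0.0)] * 4]
# Same forward flow, but the backward flow at x = 2 points the wrong way.
bw_inconsistent = [[(-1.0, 0.0), (-1.0, 0.0), (1.0, 0.0), (-1.0, 0.0)]]
```

With the consistent backward flow, only the last pixel is flagged, because its target falls outside the image. With the inconsistent one, the pixel at x = 1 (whose target is x = 2) is flagged as well. In a training loss, flagged pixels would typically be excluded from the photometric comparison, since they have no valid correspondence in the other frame.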

Evaluation and Results

The experiments show that the unsupervised model, trained on KITTI without ground truth flow, achieves a markedly lower average endpoint error than prior unsupervised methods. It is even more accurate than similar supervised methods trained on synthetic datasets alone, and with optional fine-tuning on the KITTI training data it reaches competitive accuracy on the KITTI 2012 and 2015 benchmarks.

KITTI Benchmarks: The proposed model outperforms previous unsupervised models by a considerable margin, with lower endpoint errors on both KITTI 2012 and KITTI 2015.
Generalization: Additional experiments on datasets such as Middlebury and MPI Sintel examine how well the approach transfers beyond the original training domain.
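
For reference, the average endpoint error used in these comparisons is the mean Euclidean distance between predicted and ground truth flow vectors. The following is a minimal plain-Python sketch. The function name and the optional `valid` mask are assumptions for this example; the mask is included because KITTI ground truth is sparse, so only annotated pixels can be scored.

```python
import math
from typing import List, Optional, Tuple

Flow = List[List[Tuple[float, float]]]  # per-pixel (u, v) displacement


def average_endpoint_error(flow_pred: Flow, flow_gt: Flow,
                           valid: Optional[List[List[bool]]] = None) -> float:
    """Mean Euclidean distance between predicted and ground truth flow.

    If `valid` is given, only pixels marked True contribute to the mean.
    Returns NaN when no pixel is valid.
    """
    total, count = 0.0, 0
    for y, (row_p, row_g) in enumerate(zip(flow_pred, flow_gt)):
        for x, ((up, vp), (ug, vg)) in enumerate(zip(row_p, row_g)):
            if valid is not None and not valid[y][x]:
                continue  # skip pixels without ground truth
            total += math.hypot(up - ug, vp - vg)
            count += 1
    return total / count if count else float("nan")


# Two pixels: the first prediction is off by (3, 4), the second is exact.
pred = [[(3.0, 4.0), (0.0, 0.0)]]
gt = [[(0.0, 0.0), (0.0, 0.0)]]
```

Here `average_endpoint_error(pred, gt)` averages errors of 5 and 0 over both pixels, while passing `valid=[[True, False]]` scores only the first pixel.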

Implications and Future Work

The findings suggest that unsupervised learning with a well-designed loss function can reduce the dependence on synthetic datasets, which widens the applicability of CNN-based optical flow estimation in real-world scenarios. The abstract also notes that the unsupervised model can serve as generic pre-training for supervised networks on datasets with limited ground truth. More broadly, the research highlights the importance of the loss formulation, which may drive future improvements in unsupervised methods.

Looking ahead, further exploration into more sophisticated unsupervised losses could propel advancements in the field, potentially narrowing the gap with fully supervised paradigms while maintaining the flexibility of training on diverse data sources without ground truth constraints.

In summary, the paper is a significant step toward practical unsupervised optical flow estimation and provides a framework that subsequent research in computer vision can build on.
