- The paper introduces SMURF, a novel unsupervised method that adapts the RAFT model architecture for accurate optical flow estimation without synthetic data.
- SMURF achieves a 36% to 40% reduction in error across Sintel Clean, Sintel Final, and KITTI 2015 benchmarks, outperforming previous unsupervised methods.
- Key techniques enabling SMURF include unsupervised RAFT integration, refined self-supervision, and a novel full-image warping method that addresses occlusions and improves generalization.
Essay: SMURF - Unsupervised Learning of Optical Flow
The paper introduces SMURF, a method for unsupervised learning of optical flow that demonstrates notable accuracy improvements across established benchmarks. It combines architectural adaptations of an existing supervised model with advances in self-supervision, enhancing the learning capability of deep neural networks for optical flow estimation without requiring synthetically annotated training data.
Optical flow describes the apparent per-pixel motion between consecutive images in a sequence. Estimating it is critical for numerous computer vision tasks, including visual odometry, depth estimation, and object tracking. Classical approaches framed estimation as an optimization problem constrained by pixel similarity (brightness constancy) and smoothness priors. Deep learning models such as PWC-Net and FlowNet2 later improved accuracy through supervised learning, but ground-truth flow is difficult to obtain, especially for real-world scenes, so supervised models typically depend on synthetic data.
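Unsupervised objectives of this kind typically combine a photometric data term with a smoothness prior. The following NumPy sketch illustrates the general shape of such losses; the Charbonnier penalty and first-order smoothness shown here are common choices in the literature, not SMURF's exact formulation:

```python
import numpy as np

def photometric_loss(frame1, frame2_warped, mask=None):
    """Charbonnier-style penalty between frame 1 and frame 2 warped toward
    frame 1; an optional mask excludes pixels (e.g. occluded regions)."""
    eps = 1e-3
    diff = np.sqrt((frame1 - frame2_warped) ** 2 + eps ** 2)
    if mask is not None:
        return float((diff * mask).sum() / (mask.sum() + eps))
    return float(diff.mean())

def smoothness_loss(flow):
    """First-order smoothness prior: mean absolute spatial gradient of an
    (H, W, 2) flow field; zero for a perfectly constant flow."""
    dx = np.abs(flow[:, 1:] - flow[:, :-1])
    dy = np.abs(flow[1:] - flow[:-1])
    return float(dx.mean() + dy.mean())
```

A training objective would then be a weighted sum of these two terms, with the photometric term driving the flow toward pixel correspondence and the smoothness term regularizing textureless regions.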
SMURF integrates the RAFT model, which was originally designed for supervised learning, into an unsupervised framework by making substantial architectural adjustments and applying innovative learning methods such as sequence-aware loss functions, full-image warping, and multi-frame self-supervision. Some key contributions of SMURF include:
- Unsupervised RAFT Integration: The RAFT model's architecture is modified to accommodate unsupervised learning using instance normalization and a sequence of loss functions applied to all predictions, not solely the final output.
- Self-Supervision Refinement: SMURF applies self-supervision without border masking, providing learning signals even for pixels near image edges. Flow predicted on full training images supervises the model's predictions on augmented (e.g. cropped) versions of the same images, helping it learn to estimate flow in regions that become occluded or leave the frame.
- Full-Image Warping Technique: This technique addresses out-of-frame motion by computing warps against the full uncropped image rather than only the training crop, so the photometric loss remains informative for pixels whose true targets lie outside the cropped view.
- Multi-Frame Self-Supervision: Using flow predictions across multiple frames, the model self-generates labels to inpaint occluded regions, improving prediction accuracy in complex visual contexts. Despite the multi-frame training signal, inference still requires only two images, keeping runtime cost low.
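Applying the loss to every intermediate prediction follows the weighting scheme of supervised RAFT, where later (more refined) iterations count more. A minimal sketch of that exponential weighting (gamma = 0.8 is the value used in the original RAFT paper; the loss function itself is left abstract here):

```python
def sequence_loss(flow_predictions, loss_fn, gamma=0.8):
    """Sum per-iteration losses with weight gamma ** (N - 1 - i), so the
    final refinement iteration receives weight 1.0 and earlier, coarser
    predictions are progressively down-weighted."""
    n = len(flow_predictions)
    total = 0.0
    for i, flow in enumerate(flow_predictions):
        total += gamma ** (n - 1 - i) * loss_fn(flow)
    return total
```

With three iterations and a unit per-iteration loss, the weights are 0.64, 0.8, and 1.0, which biases training toward the final output while still supervising every refinement step.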
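The full-image warping idea can be sketched in a few lines: warp toward the cropped first frame, but sample from the full second frame, so flow vectors pointing outside the crop still find valid pixels. This nearest-neighbor NumPy version is illustrative only; the actual method uses differentiable bilinear sampling, and the function name and crop convention here are assumptions:

```python
import numpy as np

def full_image_warp(full_image, flow, crop_y, crop_x):
    """Backward-warp toward a crop of frame 1 while sampling from the FULL
    frame 2. flow has shape (h, w, 2) with (dx, dy) per pixel of the crop;
    (crop_y, crop_x) is the crop's top-left corner in the full image."""
    h, w = flow.shape[:2]
    H, W = full_image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Target coordinates expressed in the full-image frame, clamped in-bounds.
    ty = np.clip(np.round(ys + crop_y + flow[..., 1]).astype(int), 0, H - 1)
    tx = np.clip(np.round(xs + crop_x + flow[..., 0]).astype(int), 0, W - 1)
    return full_image[ty, tx]
```

Because sampling coordinates may legitimately exceed the crop boundaries, pixels near the crop edge receive a real photometric target instead of being masked out, which is the generalization benefit the paper attributes to this technique.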
Quantitative evaluations show SMURF reducing error by 36% to 40% across key benchmarks – Sintel Clean, Sintel Final, and KITTI 2015 – outperforming not only previous unsupervised methods such as UFlow but also some supervised approaches. These gains reflect the method's improved handling of occlusions and generalization, two notable weaknesses of prior unsupervised methods.
Theoretical implications of this work underline the benefits of unsupervised learning, especially in domains where acquiring labeled datasets is impractical. Though SMURF effectively bridges the gap between supervised and unsupervised modalities, unsupervised learning's tendency to capitalize on apparent visual motion rather than discerning accurate physical object movement remains a pervasive challenge. Future research directions could explore the integration of scene flow and semantic reasoning, evolving unsupervised methodologies towards conceptual understanding required for physical predictions.
SMURF represents an important advance in unsupervised optical flow estimation: it provides a framework that substantially reduces dependency on synthetic data, opens practical applications in domains where labels are scarce, and improves the performance of computer vision systems that must interpret dynamic scenes. The methodologies presented may catalyze further exploration of similar techniques in adjacent computer vision fields.