Models Matter, So Does Training: An Empirical Study of CNNs for Optical Flow Estimation (1809.05571v1)

Published 14 Sep 2018 in cs.CV

Abstract: We investigate two crucial and closely related aspects of CNNs for optical flow estimation: models and training. First, we design a compact but effective CNN model, called PWC-Net, according to simple and well-established principles: pyramidal processing, warping, and cost volume processing. PWC-Net is 17 times smaller in size, 2 times faster in inference, and 11% more accurate on Sintel final than the recent FlowNet2 model. It is the winning entry in the optical flow competition of the robust vision challenge. Next, we experimentally analyze the sources of our performance gains. In particular, we use the same training procedure of PWC-Net to retrain FlowNetC, a sub-network of FlowNet2. The retrained FlowNetC is 56% more accurate on Sintel final than the previously trained one and even 5% more accurate than the FlowNet2 model. We further improve the training procedure and increase the accuracy of PWC-Net on Sintel by 10% and on KITTI 2012 and 2015 by 20%. Our newly trained model parameters and training protocols will be available on https://github.com/NVlabs/PWC-Net

Summary

The paper demonstrates that both model design and training procedures are crucial for CNN-based optical flow estimation, with carefully tailored training protocols substantially boosting accuracy.
It introduces PWC-Net, which uses pyramidal feature extraction, warping, and cost volume processing to be 17 times smaller than FlowNet2 and 11% more accurate on Sintel final.
Retraining FlowNetC with the PWC-Net training procedure improved its accuracy on Sintel final by 56%, and a further improved procedure raised PWC-Net's accuracy by 10% on Sintel and 20% on KITTI 2012 and 2015, underscoring the impact of training techniques.

An Empirical Study on CNNs for Optical Flow Estimation

The paper, "Models Matter, So Does Training: An Empirical Study of CNNs for Optical Flow Estimation," presents an in-depth empirical study of convolutional neural networks (CNNs) for optical flow estimation. The authors introduce a compact model, PWC-Net, built on simple, well-established principles: pyramidal processing, warping, and cost volume processing. The resulting model is 17 times smaller than the established FlowNet2 model while also being faster and more accurate.

Architectural Design of PWC-Net

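PWC-Net first extracts multi-resolution features from both input images with a learnable feature pyramid. At each pyramid level, it warps the second image's features toward the first image using the upsampled flow estimated at the coarser level, builds a cost volume by correlating the first image's features with the warped features over a limited search range, and passes the result to a CNN that estimates the flow at that level. Because warping compensates for the motion already estimated at coarser levels, a small search range suffices even for large displacements. Relative to FlowNet2, this design makes PWC-Net 11% more accurate on the Sintel final pass, twice as fast at inference, and 17 times smaller in model size.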
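The per-level warping and cost volume steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the PWC-Net implementation: feature maps are nested lists (a list of channels, each a row-major grid) instead of GPU tensors, `warp`, `cost_volume`, and `bilinear_sample` are hypothetical names, positions falling outside the map are treated as zero, and the correlation is averaged over channels, following the normalized correlation commonly used for PWC-Net-style cost volumes.

```python
import math

# Assumed layout: a feature map is a list of channels; each channel is a 2D grid [row][col].
Channel = list[list[float]]
Features = list[Channel]


def bilinear_sample(ch: Channel, x: float, y: float) -> float:
    """Bilinearly sample one channel at continuous (x, y); zero outside the map."""
    h, w = len(ch), len(ch[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Weight shrinks linearly with distance to the integer neighbor.
                weight = (1.0 - abs(x - xx)) * (1.0 - abs(y - yy))
                value += weight * ch[yy][xx]
    return value


def warp(feat2: Features, u: Channel, v: Channel) -> Features:
    """Backward warp: out[c][y][x] = feat2[c] sampled at (x + u[y][x], y + v[y][x])."""
    h, w = len(u), len(u[0])
    return [
        [[bilinear_sample(ch, x + u[y][x], y + v[y][x]) for x in range(w)] for y in range(h)]
        for ch in feat2
    ]


def cost_volume(feat1: Features, feat2w: Features, max_disp: int) -> dict:
    """Channel-averaged correlation for every displacement in [-max_disp, max_disp]^2.

    Returns {(dx, dy): grid}, where grid[y][x] is the mean over channels of
    feat1[c][y][x] * feat2w[c][y + dy][x + dx], or 0.0 if that position is outside the map.
    """
    n_ch = len(feat1)
    h, w = len(feat1[0]), len(feat1[0][0])
    volume = {}
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            grid = [[0.0] * w for _ in range(h)]
            for y in range(h):
                for x in range(w):
                    y2, x2 = y + dy, x + dx
                    if 0 <= y2 < h and 0 <= x2 < w:
                        grid[y][x] = sum(
                            feat1[c][y][x] * feat2w[c][y2][x2] for c in range(n_ch)
                        ) / n_ch
            volume[(dx, dy)] = grid
    return volume
```

As a usage example, take a one-channel, one-row map `f1 = [[[0.0, 1.0, 0.0, 0.0]]]` and its copy shifted one pixel right, `f2 = [[[0.0, 0.0, 1.0, 0.0]]]`. The raw `cost_volume(f1, f2, 1)` peaks at displacement `(1, 0)`; after `warp(f2, u, v)` with `u = 1` and `v = 0` everywhere, the warped map equals `f1` and the peak moves to `(0, 0)`. This illustrates why a coarse flow estimate lets each finer level search only a small neighborhood.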

The authors stress that both the architecture and the training procedure determine how well an optical flow model performs. They show that the same model trained with different procedures can produce markedly different results, a factor often overlooked when architectures are compared in the existing literature.

Training Methodology

A key contribution of this work is showing how strongly the training protocol affects performance. When FlowNetC, a sub-network of FlowNet2, is retrained with the PWC-Net training procedure, it becomes 56% more accurate on Sintel final than the previously trained FlowNetC, and even 5% more accurate than the full FlowNet2 model. The authors then refine the PWC-Net training procedure itself, obtaining substantial gains on both the Sintel and KITTI benchmarks.

Specifically, the improved training procedure reduces PWC-Net's average end-point error (EPE) on Sintel by 10% and its error on KITTI 2012 and 2015 by 20%. The paper thus highlights the need for thoughtful, tailored training: choices such as dataset scheduling, learning-rate schedules, and the loss function can be as critical as the model architecture itself.
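Average end-point error (EPE) is the mean Euclidean distance between predicted and ground-truth flow vectors over all evaluated pixels. The minimal Python sketch below computes EPE and the relative error reduction behind statements like "10% more accurate", assuming such statements mean a 10% lower error. The function names are hypothetical, flows are given as flat lists of `(u, v)` tuples rather than images, and the official benchmarks apply their own masks and metrics (KITTI 2015, for instance, ranks methods by the percentage of outlier pixels).

```python
import math


def average_epe(pred: list[tuple[float, float]], gt: list[tuple[float, float]]) -> float:
    """Mean Euclidean distance between predicted and ground-truth flow vectors."""
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and have the same length")
    total = sum(math.hypot(pu - gu, pv - gv) for (pu, pv), (gu, gv) in zip(pred, gt))
    return total / len(pred)


def relative_improvement(old_error: float, new_error: float) -> float:
    """Fractional error reduction; 0.10 means the new error is 10% lower."""
    return (old_error - new_error) / old_error
```

For example, `average_epe([(1.0, 0.0), (0.0, 3.0)], [(1.0, 0.0), (4.0, 0.0)])` averages per-pixel errors of 0 and 5 to give 2.5, and `relative_improvement(5.0, 4.5)` gives 0.1, that is, a 10% reduction in error.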

Implications and Future Directions

The findings argue for combining classical domain knowledge, such as pyramids, warping, and cost volumes, with modern deep learning. This combination yields models that are both more accurate and far more compact, which is promising for applications that demand real-time performance or deployment on mobile and embedded systems. The authors release their trained models and training protocols publicly, laying the groundwork for future research and comparative analysis.

Looking forward, PWC-Net could serve as a building block for larger models, for example by stacking several networks as FlowNet2 does with its FlowNet sub-networks. More broadly, the paper suggests that pairing architectural innovation with carefully designed training protocols is a promising route toward efficient models that are robust enough for real-time applications.

Overall, this empirical paper not only advances optical flow estimation but also prompts a reevaluation of how CNN architectures are compared, since differences in training can confound such comparisons. It sets a precedent for analyzing models and training procedures together in future investigations of neural architectures.

GitHub

GitHub - NVlabs/PWC-Net: PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume, CVPR 2018 (Oral)