Skin the sheep not only once: Reusing Various Depth Datasets to Drive the Learning of Optical Flow (2310.01833v1)
Abstract: Optical flow estimation is crucial for various applications in vision and robotics. Due to the difficulty of collecting ground-truth optical flow in real-world scenarios, most existing methods for learning optical flow either adopt synthetic datasets for supervised training or rely on photometric consistency across temporally adjacent video frames to drive unsupervised learning, where the former typically suffers from limited generalizability while the latter usually performs worse than supervised approaches. To tackle these challenges, we propose to leverage the geometric connection between optical flow estimation and stereo matching (both amount to finding pixel correspondences across images) to unify various real-world depth estimation datasets for generating supervised training data for optical flow. Specifically, we turn monocular depth datasets into stereo ones by synthesizing virtual disparity, thus yielding flows along the horizontal direction; moreover, we introduce virtual camera motion into the stereo data to produce additional flows along the vertical direction. Furthermore, we apply geometric augmentations to one image of each optical flow pair, encouraging the optical flow estimator to learn from more challenging cases. Lastly, since flow maps produced under different geometric augmentations exhibit distinct characteristics, an auxiliary classifier trained to identify the type of augmentation from the appearance of the flow map is used to further enhance the learning of the optical flow estimator. Our proposed method is general and not tied to any particular flow estimator; extensive experiments on various datasets and optical flow estimation models verify its efficacy and superiority.
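The data-generation step described in the abstract can be illustrated with a short sketch. The snippet below is a minimal NumPy sketch, not the authors' implementation: it converts a depth map into a virtual stereo disparity d = f·B/Z, reads that disparity off as horizontal flow, adds a vertical component from a virtual vertical camera translation, and forward-warps the source image to synthesize the second frame of a training pair. The parameters `focal`, `baseline`, and `dy_shift` are illustrative placeholders, and the naive nearest-neighbor splat ignores the occlusion and hole handling a real pipeline would need.

```python
import numpy as np

def depth_to_flow(depth, focal=720.0, baseline=0.54, dy_shift=0.0):
    """Turn a depth map into a dense optical-flow field.

    Horizontal flow comes from the virtual stereo disparity d = f*B/Z;
    an optional vertical camera translation dy_shift adds vertical flow
    f*ty/Z. All camera parameters here are placeholders, not values
    from the paper.
    """
    depth = np.maximum(depth, 1e-6)               # guard against division by zero
    disparity = focal * baseline / depth          # virtual stereo disparity
    flow = np.zeros((*depth.shape, 2), dtype=np.float32)
    flow[..., 0] = -disparity                     # horizontal component (u)
    flow[..., 1] = focal * dy_shift / depth       # vertical component (v)
    return flow

def forward_warp(img, flow):
    """Naively splat each pixel of img to its flow target (nearest neighbor).
    A real pipeline would resolve occlusions and inpaint holes."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    yt = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    out[yt, xt] = img[ys, xs]
    return out

# Toy usage: a fronto-parallel scene with one closer object, which
# therefore receives larger flow than the background.
depth = np.full((240, 320), 20.0, dtype=np.float32)
depth[80:160, 120:200] = 5.0
flow = depth_to_flow(depth, dy_shift=0.02)
img = np.random.rand(240, 320, 3).astype(np.float32)
img2 = forward_warp(img, flow)   # second frame of the synthetic pair
```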
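The auxiliary augmentation classifier can likewise be sketched. The abstract only states that a classifier is trained to identify the augmentation type from the flow map's appearance; the small PyTorch module below is one plausible form of such a head, with a hypothetical label set and architecture that are not the authors' design.

```python
import torch.nn as nn

class AugTypeClassifier(nn.Module):
    """Toy auxiliary head predicting which geometric augmentation
    (e.g. none / rotation / scaling / shear -- a hypothetical label set)
    produced a given 2-channel flow map."""
    def __init__(self, num_aug_types=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, num_aug_types)

    def forward(self, flow):                      # flow: (B, 2, H, W)
        x = self.features(flow).flatten(1)
        return self.head(x)                       # logits over augmentation types

# A cross-entropy loss on these logits would be added to the flow
# estimator's training objective as an auxiliary term.
```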