
Monocular Road Planar Parallax Estimation (2111.11089v2)

Published 22 Nov 2021 in cs.CV and cs.AI

Abstract: Estimating the 3D structure of the drivable surface and surrounding environment is a crucial task for assisted and autonomous driving. It is commonly solved either with 3D sensors such as LiDAR or by directly predicting per-point depth via deep learning. However, the former is expensive, and the latter does not exploit the geometric structure of the scene. In this paper, instead of following existing methodologies, we propose the Road Planar Parallax Attention Network (RPANet), a new deep neural network for 3D sensing from monocular image sequences based on planar parallax, which takes full advantage of the road plane geometry that is omnipresent in driving scenes. RPANet takes as input a pair of images aligned by the homography of the road plane and outputs a $\gamma$ map (the ratio of height to depth) for 3D reconstruction. The $\gamma$ map can be used to construct a two-dimensional transformation between two consecutive frames: it encodes the planar parallax and, with the road plane serving as a reference, can be used to estimate the 3D structure by warping the consecutive frames. Furthermore, we introduce a novel cross-attention module that helps the network perceive the displacements caused by planar parallax. To verify the effectiveness of our method, we sample data from the Waymo Open Dataset and construct annotations related to planar parallax. Comprehensive experiments on the sampled dataset demonstrate the 3D reconstruction accuracy of our approach in challenging scenarios.
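
Given the road plane $n^\top X = d$ in camera coordinates and the intrinsics $K$, a $\gamma$ map of this kind determines per-pixel depth and height: for a pixel with back-projected ray $r = K^{-1}(u, v, 1)^\top$, the height above the plane is $h = d - Z\, n^\top r$, and substituting $\gamma = h/Z$ gives $Z = d / (\gamma + n^\top r)$ and $h = \gamma Z$. The sketch below illustrates this reconstruction step; it is a minimal NumPy example under the stated plane convention, and the function name and argument layout are assumptions rather than the paper's implementation.

```python
import numpy as np

def gamma_to_depth_and_height(gamma, K, n, d):
    """Recover per-pixel depth Z and height h above the road plane from a
    gamma map (gamma = h / Z), given camera intrinsics K and a road plane
    n^T X = d in camera coordinates (n: unit normal pointing from the road
    toward the camera, d: camera-to-plane distance). Illustrative sketch,
    not the paper's code.

    gamma: (H, W) array of predicted height-to-depth ratios.
    Returns (Z, h), each of shape (H, W).
    """
    H, W = gamma.shape
    # Pixel grid in homogeneous coordinates, shape (3, H*W).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T

    # Back-projected rays r = K^{-1} [u, v, 1]^T, so that X = Z * r.
    rays = np.linalg.inv(K) @ pix

    # Height above the plane: h = d - n^T X = d - Z * (n^T r).
    # Combined with gamma = h / Z this gives Z = d / (gamma + n^T r).
    n_dot_r = (n.reshape(1, 3) @ rays).reshape(H, W)
    Z = d / (gamma + n_dot_r)
    h = gamma * Z
    return Z, h
```

In practice, the plane parameters $(n, d)$ would come from calibration or from the road-plane homography used to pre-align the input frames; how RPANet obtains them is not specified in this abstract.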
