DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird's Eye View Segmentation with Occlusion Reasoning (2404.06352v1)

Published 9 Apr 2024 in cs.CV and cs.RO

Abstract: Semantic segmentation is an effective way to perform scene understanding. Recently, segmentation in 3D Bird's Eye View (BEV) space has become popular, as it is directly used by the driving policy. However, there is limited work on BEV segmentation for surround-view fisheye cameras, which are commonly used in commercial vehicles. As this task has no real-world public dataset and existing synthetic datasets do not handle amodal regions due to occlusion, we create a synthetic dataset using the Cognata simulator comprising diverse road types, weather, and lighting conditions. We generalize BEV segmentation to work with any camera model; this is useful for mixing diverse cameras. We implement a baseline by applying cylindrical rectification on the fisheye images and using a standard LSS-based BEV segmentation model. We demonstrate that we can achieve better performance without undistortion, which has the adverse effects of increased runtime due to pre-processing, reduced field of view, and resampling artifacts. Further, we introduce a distortion-aware learnable BEV pooling strategy that is more effective for fisheye cameras. We extend the model with an occlusion reasoning module, which is critical for estimation in BEV space. Qualitative performance of DaF-BEVSeg is showcased in the video at https://streamable.com/ge4v51.
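
To illustrate the idea behind a distortion-aware BEV pooling step, the sketch below shows one way lifted image features could be sum-pooled ("splatted") into a BEV grid with a learnable weight indexed by each point's angle from the optical axis, which grows large for fisheye lenses. This is not the authors' implementation; the module name, tensor shapes, angle-binning scheme, and the `distortion_weight` parameter are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn


class BEVSplat(nn.Module):
    """Sketch: sum-pool lifted 3D feature points into a BEV grid, weighting each
    point by a learnable factor indexed by its view angle (a stand-in for the
    distortion-aware pooling described in the abstract)."""

    def __init__(self, bev_size=(200, 200), bev_range=50.0, num_bins=16):
        super().__init__()
        self.bev_size = bev_size        # (H, W) BEV cells
        self.bev_range = bev_range      # metres covered from the ego vehicle in +/- x and y
        self.num_bins = num_bins
        # hypothetical learnable weight per view-angle bin
        self.distortion_weight = nn.Parameter(torch.ones(num_bins))

    def forward(self, points_xyz, feats, view_angle):
        # points_xyz: (N, 3) lifted points in the ego frame
        # feats:      (N, C) image features carried by each point
        # view_angle: (N,)   angle from the optical axis in radians
        H, W = self.bev_size
        # map metric x/y in [-range, +range] to integer BEV cell indices
        ix = ((points_xyz[:, 0] / self.bev_range + 1) * 0.5 * (W - 1)).long().clamp(0, W - 1)
        iy = ((points_xyz[:, 1] / self.bev_range + 1) * 0.5 * (H - 1)).long().clamp(0, H - 1)
        cell = iy * W + ix                                   # (N,) flat cell index
        # bin the view angle (0..pi/2 assumed for a fisheye) and look up its weight
        bins = (view_angle / (torch.pi / 2) * (self.num_bins - 1)).long().clamp(0, self.num_bins - 1)
        w = self.distortion_weight[bins].unsqueeze(1)        # (N, 1)
        bev = feats.new_zeros(H * W, feats.shape[1])
        bev.index_add_(0, cell, feats * w)                   # scatter-add = sum pooling
        return bev.view(H, W, -1).permute(2, 0, 1)           # (C, H, W) BEV feature map


# toy usage with random lifted points
if __name__ == "__main__":
    pts = (torch.rand(1000, 3) - 0.5) * 100.0                # points within +/-50 m
    feats = torch.randn(1000, 64)
    angles = torch.rand(1000) * (torch.pi / 2)
    print(BEVSplat()(pts, feats, angles).shape)              # torch.Size([64, 200, 200])
```

In an actual LSS-style pipeline the points would come from per-pixel depth distributions unprojected through the (fisheye) camera model rather than random samples, and the pooled BEV map would feed a segmentation head; the learnable per-angle weighting is shown only as one plausible realization of "distortion-aware" pooling.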

Authors (4)
  1. Senthil Yogamani (81 papers)
  2. David Unger (3 papers)
  3. Venkatraman Narayanan (8 papers)
  4. Varun Ravi Kumar (26 papers)
Citations (1)
