SNE-RoadSegV2: Advancing Heterogeneous Feature Fusion and Fallibility Awareness for Freespace Detection (2402.18918v2)

Published 29 Feb 2024 in cs.CV

Abstract: Feature-fusion networks with duplex encoders have proven to be an effective technique for solving the freespace detection problem. However, despite the compelling results achieved by previous research efforts, adequate and discriminative heterogeneous feature fusion, as well as fallibility-aware loss functions, remain relatively underexplored. This paper makes several significant contributions to address these limitations: (1) it presents a novel heterogeneous feature fusion block, comprising a holistic attention module, a heterogeneous feature contrast descriptor, and an affinity-weighted feature recalibrator, enabling deeper exploitation of the inherent characteristics of the extracted features; (2) it incorporates both inter-scale and intra-scale skip connections into the decoder architecture while eliminating redundant ones, improving both accuracy and computational efficiency; and (3) it introduces two fallibility-aware loss functions that separately focus on semantic-transition and depth-inconsistent regions, collectively providing stronger supervision during model training. The proposed heterogeneous feature fusion network (SNE-RoadSegV2), which incorporates all of these components, demonstrates superior performance compared with all other freespace detection algorithms across multiple public datasets. Notably, it ranks first on the official KITTI Road benchmark.
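
The abstract describes the fusion block's three stages only at a high level; the PyTorch module below is a hypothetical sketch of how such a duplex-encoder fusion step could look, not the authors' implementation. The class name HeterogeneousFusionBlock, the squeeze-and-excitation-style channel attention, and the use of a simple feature difference as the contrast descriptor are all assumptions chosen to mirror the three named components (holistic attention, heterogeneous feature contrast, affinity-weighted recalibration).

```python
# Hypothetical sketch of one duplex-encoder fusion step; module names and
# design details are assumptions for illustration, not the paper's code.
import torch
import torch.nn as nn


class HeterogeneousFusionBlock(nn.Module):
    """Fuses an RGB feature map with a surface-normal feature map.

    Three illustrative stages, loosely mirroring the abstract:
      1. holistic (channel-wise) attention over the concatenated features,
      2. a contrast descriptor built from the inter-modality difference,
      3. an affinity-weighted recalibration of the fused result.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Stage 1: squeeze-and-excitation-style channel gating.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Stage 2: embed the difference between the two modalities.
        self.contrast = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Stage 3: pixel-wise affinity map in [0, 1] for recalibration.
        self.affinity = nn.Sequential(
            nn.Conv2d(2 * channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb: torch.Tensor, normal: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb, normal], dim=1)      # (B, 2C, H, W)
        x = x * self.attn(x)                     # stage 1: channel attention
        contrast = self.contrast(rgb - normal)   # stage 2: modality contrast
        fused = self.project(x) + contrast       # merge the two cues
        return fused * self.affinity(x)          # stage 3: recalibration


if __name__ == "__main__":
    block = HeterogeneousFusionBlock(channels=64)
    rgb = torch.randn(2, 64, 32, 32)
    normal = torch.randn(2, 64, 32, 32)
    print(block(rgb, normal).shape)  # torch.Size([2, 64, 32, 32])
```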

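The two fallibility-aware losses are likewise described only in outline: one focuses on semantic-transition regions, the other on depth-inconsistent regions. As a rough illustration of the first idea, the sketch below up-weights binary cross-entropy near freespace boundaries; the max-pooling morphological gradient used to find transitions and the linear weighting scheme are assumptions, not the paper's formulation. An analogous weight map derived from a depth-inconsistency measure would play the role of the second loss.

```python
# Illustrative "fallibility-aware" loss: extra supervision on
# semantic-transition (boundary) pixels. The boundary heuristic and
# weighting scheme are assumptions, not the paper's actual losses.
import torch
import torch.nn.functional as F


def transition_weighted_bce(logits: torch.Tensor,
                            target: torch.Tensor,
                            boundary_weight: float = 2.0,
                            kernel: int = 5) -> torch.Tensor:
    """Binary freespace loss that up-weights pixels near label transitions.

    logits, target: (B, 1, H, W); target is a {0, 1} freespace mask.
    """
    # Morphological gradient via max-pooling: dilation minus erosion
    # marks pixels adjacent to a freespace/non-freespace transition.
    pad = kernel // 2
    dilated = F.max_pool2d(target, kernel, stride=1, padding=pad)
    eroded = -F.max_pool2d(-target, kernel, stride=1, padding=pad)
    transition = (dilated - eroded).clamp(0, 1)

    # Per-pixel weights: 1 everywhere, boosted on transition regions.
    weights = 1.0 + (boundary_weight - 1.0) * transition
    return F.binary_cross_entropy_with_logits(logits, target, weight=weights)


if __name__ == "__main__":
    logits = torch.randn(2, 1, 64, 64)
    target = (torch.rand(2, 1, 64, 64) > 0.5).float()
    print(transition_weighted_bce(logits, target).item())
```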