Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes (2401.15261v2)

Published 27 Jan 2024 in cs.CV

Abstract: The estimation of implicit cross-frame correspondences and the high computational cost have long been major challenges in video semantic segmentation (VSS) for driving scenes. Prior works use keyframes, feature propagation, or cross-frame attention to address these issues. By contrast, we are the first to harness vanishing point (VP) priors for more effective segmentation. Intuitively, objects near VPs (i.e., far from the vehicle) are less discernible. Moreover, they tend to move radially away from the VP over time in the usual case of a forward-facing camera, a straight road, and linear forward motion of the vehicle. Our novel, efficient network for VSS, named VPSeg, incorporates two modules that exploit exactly this pair of static and dynamic VP priors: sparse-to-dense feature mining (DenseVP) and VP-guided motion fusion (MotionVP). MotionVP employs VP-guided motion estimation to establish explicit correspondences across frames and helps the network attend to the most relevant features from neighboring frames, while DenseVP enhances weak dynamic features in distant regions around VPs. These modules operate within a context-detail framework, which separates contextual features from high-resolution local features at different input resolutions to reduce computational costs. Contextual and local features are integrated through contextualized motion attention (CMA) for the final prediction. Extensive experiments on two popular driving segmentation benchmarks, Cityscapes and ACDC, demonstrate that VPSeg outperforms previous state-of-the-art (SOTA) methods with only modest computational overhead.
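The two VP priors described in the abstract can be made concrete with a small geometric sketch. This is not VPSeg's DenseVP or MotionVP code; it only illustrates the assumptions the abstract states. It assumes the VP is a known (x, y) pixel coordinate that coincides with the focus of expansion (forward-facing camera, straight road, linear forward motion). The function names `vp_priors` and `expand_from_vp`, the exponential proximity map, and its length scale `sigma` are all hypothetical choices made for illustration.

```python
import numpy as np


def vp_priors(height, width, vp, sigma=100.0):
    """Illustrative static and dynamic vanishing-point (VP) priors.

    Hypothetical helper, not the paper's method. `vp` is an (x, y) pixel
    coordinate; `sigma` is an assumed length scale in pixels.

    Returns:
      directions: (H, W, 2) unit vectors pointing radially away from the VP,
                  i.e. the expected apparent motion direction of static scene
                  points under straight-line forward ego-motion (zero at the VP).
      proximity:  (H, W) weights in (0, 1], largest at the VP, marking the
                  distant, hard-to-discern region (static prior).
    """
    vx, vy = vp
    # Row index is y, column index is x.
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    dx, dy = xs - vx, ys - vy
    dist = np.hypot(dx, dy)
    safe = np.where(dist == 0, 1.0, dist)  # avoid dividing by zero at the VP
    directions = np.stack([dx / safe, dy / safe], axis=-1)
    proximity = np.exp(-dist / sigma)
    return directions, proximity


def expand_from_vp(points, vp, scale):
    """Move (x, y) points radially away from the VP by a factor `scale` > 1.

    Under pure forward translation with the VP as focus of expansion, a point
    at depth Z maps to vp + (Z / (Z - dZ)) * (p - vp) after the camera moves
    forward by dZ. Here `scale` stands in for that depth-dependent factor and
    is supplied by the caller (an assumption for illustration).
    """
    points = np.asarray(points, dtype=np.float64)
    vp = np.asarray(vp, dtype=np.float64)
    return vp + scale * (points - vp)


directions, proximity = vp_priors(4, 6, vp=(3.0, 1.0), sigma=2.0)
print(directions[1, 5])  # pixel (x=5, y=1) -> [1. 0.], pointing away from the VP
print(expand_from_vp([[5.0, 3.0]], vp=(3.0, 1.0), scale=1.5))  # -> [[6. 4.]]
```

The radial field encodes the dynamic prior (where a point is expected to move next), while the proximity map encodes the static prior (where features are likely weak). In a real system the per-point expansion factor would depend on unknown depth, which is one reason a learned, VP-guided motion estimate is needed rather than this closed-form rule.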

