
Rethinking Dilated Convolution for Real-time Semantic Segmentation (2111.09957v3)

Published 18 Nov 2021 in cs.CV and eess.IV

Abstract: The field-of-view is an important metric when designing a model for semantic segmentation. To obtain a large field-of-view, previous approaches generally choose to rapidly downsample the resolution, usually with average pooling or stride-2 convolutions. We take a different approach by using dilated convolutions with large dilation rates throughout the backbone, allowing the backbone to easily tune its field-of-view by adjusting its dilation rates, and show that it is competitive with existing approaches. To use the dilated convolution effectively, we show a simple upper bound on the dilation rate so as not to leave gaps between the convolutional weights, and design an SE-ResNeXt-inspired block structure that uses two parallel $3\times 3$ convolutions with different dilation rates to preserve local details. Manually tuning the dilation rates for every block can be difficult, so we also introduce a differentiable neural architecture search method that uses gradient descent to optimize the dilation rates. In addition, we propose a lightweight decoder that restores local information better than common alternatives. To demonstrate the effectiveness of our approach, our model RegSeg achieves competitive results on the real-time Cityscapes and CamVid datasets. Using a T4 GPU with mixed precision, RegSeg achieves 78.3 mIOU on the Cityscapes test set at $37$ FPS, and 80.9 mIOU on the CamVid test set at $112$ FPS, both without ImageNet pretraining.
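The trade-off between downsampling and dilation described in the abstract can be illustrated with the standard receptive-field recurrence for a stack of convolutions. The sketch below is not code from the paper: the function name `receptive_field`, the four-layer stacks, and the dilation rates 1, 2, 4, 8 are assumptions chosen only for illustration, and they are not RegSeg's actual configuration.

```python
def receptive_field(layers):
    """Receptive field, in input pixels along one axis, of a conv stack.

    `layers` is a list of (kernel_size, stride, dilation) tuples applied
    in order. This is the usual recurrence, not a RegSeg-specific formula.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # spacing, in input pixels, between adjacent feature positions
    for kernel_size, stride, dilation in layers:
        # A dilated kernel spans (kernel_size - 1) * dilation feature steps.
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf


# Hypothetical four-layer stacks of 3x3 convolutions (illustration only).
plain = [(3, 1, 1)] * 4                        # no stride, no dilation
strided = [(3, 2, 1)] + [(3, 1, 1)] * 3        # one stride-2 downsampling
dilated = [(3, 1, d) for d in (1, 2, 4, 8)]    # full resolution, dilated

for name, stack in [("plain", plain), ("strided", strided), ("dilated", dilated)]:
    print(name, receptive_field(stack))
```

Under these assumptions the dilated stack reaches a larger field-of-view than the strided one while keeping the feature map at full resolution, which is the kind of adjustment by dilation rate alone that the abstract refers to.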
