Feature boosting with efficient attention for scene parsing (2402.19250v1)

Published 29 Feb 2024 in cs.CV

Abstract: The complexity of scene parsing grows with the number of object and scene classes, which is higher in unrestricted open scenes. The biggest challenge is to model the spatial relations between scene elements while still identifying objects at smaller scales. This paper presents a novel feature-boosting network that gathers spatial context from multiple levels of feature extraction and computes attention weights for each level of representation to generate the final class labels. A novel 'channel attention module' is designed to compute these attention weights, ensuring that features from the relevant extraction stages are boosted while the others are attenuated. The model also learns spatial context information at low resolution to preserve the abstract spatial relationships among scene elements and to reduce computation cost. The spatial attention features are subsequently concatenated into a final feature set before feature boosting is applied. The low-resolution spatial attention features are trained using an auxiliary task that helps the model learn a coarse global scene structure. The proposed model outperforms all state-of-the-art models on both the ADE20K and the Cityscapes datasets.
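The abstract does not give the equations of the channel attention module. Purely as an illustration, the sketch below assumes a squeeze-and-excitation style channel attention, which may differ from the authors' design: feature maps from several extraction stages (assumed to be already resized to a common resolution) are concatenated along the channel axis, each channel is summarized by global average pooling, a small two-layer bottleneck with hypothetical weights `w1` and `w2` produces one sigmoid weight per channel, and each channel is rescaled by that weight. All function names are hypothetical, and plain Python lists are used only so the sketch runs without dependencies.

```python
import math
from typing import List

# A feature map is stored as nested lists indexed [channel][row][column].
FeatureMap = List[List[List[float]]]
Matrix = List[List[float]]


def concat_levels(levels: List[FeatureMap]) -> FeatureMap:
    """Stack feature maps from several extraction stages along the channel axis.

    Assumption: every level was already resized to the same height and width.
    """
    stacked: FeatureMap = []
    for fmap in levels:
        stacked.extend(fmap)
    return stacked


def global_avg_pool(fmap: FeatureMap) -> List[float]:
    """Squeeze step: summarize each channel by its spatial mean."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap: FeatureMap, w1: Matrix, w2: Matrix) -> List[float]:
    """Excitation step: bottleneck MLP (ReLU, then sigmoid), one weight per channel.

    w1 has shape (hidden, C) and w2 has shape (C, hidden); both stand in for
    learned parameters and are hypothetical here.
    """
    pooled = global_avg_pool(fmap)
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]


def boost(fmap: FeatureMap, weights: List[float]) -> FeatureMap:
    """Rescale every channel by its attention weight in (0, 1)."""
    return [[[v * a for v in row] for row in ch] for ch, a in zip(fmap, weights)]


# Two hypothetical single-channel levels of size 2 x 2.
levels = [[[[2.0, 2.0], [2.0, 2.0]]], [[[0.0, 1.0], [-1.0, 0.0]]]]
fmap = concat_levels(levels)
weights = channel_attention(fmap, w1=[[1.0, 0.0]], w2=[[1.0], [-1.0]])
boosted = boost(fmap, weights)
```

In this toy case the two channels pool to means of 2.0 and 0.0, and the chosen weights give attention values sigmoid(2) ≈ 0.88 and sigmoid(-2) ≈ 0.12. The first level's channel is therefore kept nearly intact while the second is attenuated. Because sigmoid weights lie in (0, 1), "boosting" here is relative to the attenuated channels.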

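The abstract motivates learning spatial context at low resolution partly by computation cost, but it does not specify the form of the spatial attention. Purely as an assumption for illustration, consider a dense spatial attention that scores every position of an H x W feature map against every other position, as in non-local or self-attention blocks. Its affinity matrix has (HW)^2 entries, so pooling the map by a stride s before attending shrinks that count by a factor of s^4 when H and W are divisible by s. The 64 x 64 map size and the stride of 4 below are hypothetical.

```python
def affinity_entries(height: int, width: int) -> int:
    """Entries in a dense spatial attention map: every position vs. every position."""
    positions = height * width
    return positions * positions


def low_res_savings(height: int, width: int, stride: int) -> float:
    """Factor by which pooling with `stride` shrinks the affinity matrix."""
    full = affinity_entries(height, width)
    low = affinity_entries(height // stride, width // stride)
    return full / low


# Hypothetical 64 x 64 feature map attended at stride 4 (i.e. 16 x 16).
full_size = affinity_entries(64, 64)  # 4096 positions -> 16,777,216 entries
low_size = affinity_entries(16, 16)   # 256 positions -> 65,536 entries
ratio = low_res_savings(64, 64, 4)    # 256.0, i.e. 4 ** 4
```

Because the affinity count grows quadratically in the number of positions, a modest spatial stride yields a large saving. The cost is a coarser spatial map, which fits the abstract's description of preserving only abstract spatial relationships at low resolution.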