Multi-view Aggregation Network for Dichotomous Image Segmentation (2404.07445v1)
Abstract: Dichotomous Image Segmentation (DIS) has recently emerged as a task aimed at high-precision object segmentation from high-resolution natural images. When designing an effective DIS model, the main challenge is balancing the semantic dispersion of high-resolution targets under a small receptive field against the loss of high-precision details under a large receptive field. Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete global localization and local refinement. The human visual system captures regions of interest by observing them from multiple views. Inspired by this, we model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet), which unifies the feature fusion of the distant view and close-up view into a single stream with one encoder-decoder structure. With the help of the proposed multi-view complementary localization and refinement modules, our approach establishes long-range, profound visual interactions across multiple views, allowing the features of the detailed close-up view to focus on highly slender structures. Experiments on the popular DIS-5K dataset show that MVANet significantly outperforms state-of-the-art methods in both accuracy and speed. The source code and datasets will be publicly available at https://github.com/qianyu-dlut/MVANet.
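To make the single-stream multi-view idea concrete, here is a minimal sketch of how a distant view and several close-up views could be packed into one batch for a shared encoder. The abstract does not say how the views are constructed, so everything specific here is an assumption for illustration: the function name `make_views`, the `grid` parameter, the use of non-overlapping crops as close-up views, and average pooling as the downsampling step. It is not the authors' implementation.

```python
import numpy as np


def make_views(image: np.ndarray, grid: int = 2) -> np.ndarray:
    """Pack one distant view and grid*grid close-up views into a single batch.

    Hypothetical helper, not taken from the MVANet code base.
    `image` has shape (H, W, C); the result has shape
    (grid*grid + 1, H // grid, W // grid, C), with the distant view at index 0.
    """
    h, w, c = image.shape
    if h % grid or w % grid:
        raise ValueError("image height and width must be divisible by grid")
    ph, pw = h // grid, w // grid

    # Close-up views (assumption): non-overlapping full-resolution crops.
    # Reshape to (grid, ph, grid, pw, C), then bring the two grid axes forward.
    blocks = image.reshape(grid, ph, grid, pw, c).transpose(0, 2, 1, 3, 4)
    close_ups = blocks.reshape(grid * grid, ph, pw, c)

    # Distant view (assumption): the whole image average-pooled with a
    # grid x grid window, so it matches the spatial size of one crop.
    distant = image.reshape(ph, grid, pw, grid, c).mean(axis=(1, 3))

    # Same-sized views stacked along the batch axis can go through one
    # encoder in a single forward pass.
    return np.concatenate([distant[None], close_ups.astype(distant.dtype)], axis=0)


if __name__ == "__main__":
    img = np.random.rand(1024, 1024, 3)
    print(make_views(img, grid=2).shape)
```

Because every view ends up with the same spatial size, the stack can be treated as an ordinary batch, which is one plausible way to avoid separate encoder-decoder streams for global and local inputs.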