
Multi-view Aggregation Network for Dichotomous Image Segmentation (2404.07445v1)

Published 11 Apr 2024 in cs.CV

Abstract: Dichotomous Image Segmentation (DIS) has recently emerged as a task targeting high-precision object segmentation in high-resolution natural images. The main challenge in designing an effective DIS model is balancing two failure modes: high-resolution targets become semantically dispersed in a small receptive field, while high-precision details are lost in a large receptive field. Existing methods rely on cumbersome multi-stream, multi-stage encoder-decoder designs to progressively complete global localization and local refinement. The human visual system captures regions of interest by observing them from multiple views. Inspired by this, we model DIS as a multi-view object perception problem and propose a parsimonious multi-view aggregation network (MVANet), which unifies the feature fusion of the distant view and the close-up views into a single stream with one encoder-decoder structure. With the proposed multi-view complementary localization and refinement modules, our approach establishes long-range, deep visual interactions across multiple views, allowing the features of the detailed close-up views to focus on highly slender structures. Experiments on the popular DIS-5K dataset show that MVANet significantly outperforms state-of-the-art methods in both accuracy and speed. The source code and datasets will be publicly available at https://github.com/qianyu-dlut/MVANet.
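To make the single-stream, multi-view idea more concrete, here is a minimal NumPy sketch under stated assumptions. It assumes the input is split into a 2×2 grid of full-resolution close-up patches, plus one distant view made by average-pooling the whole image down to the patch size, so all five views can pass through one shared encoder as a single batch. The functions `make_views`, `merge_closeups`, and `cross_view_attention` are hypothetical illustrations, not MVANet's released code. In particular, `cross_view_attention` is plain single-head scaled dot-product attention with no learned projections. It stands in for, and does not reproduce, the paper's multi-view complementary localization and refinement modules.

```python
import numpy as np


def make_views(image: np.ndarray, grid: int = 2) -> np.ndarray:
    """Return one distant view followed by grid*grid close-up views.

    Assumption (illustrative only): the distant view is the whole image
    average-pooled by `grid`, so it matches the size of each close-up
    patch and every view can share one encoder as a single batch.
    """
    h, w, c = image.shape
    if h % grid or w % grid:
        raise ValueError("image height and width must be divisible by grid")
    ph, pw = h // grid, w // grid
    # Distant view: average-pool each grid x grid block (low resolution, global context).
    distant = image.reshape(ph, grid, pw, grid, c).mean(axis=(1, 3))
    # Close-up views: non-overlapping full-resolution patches in row-major order.
    closeups = [
        image[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
        for i in range(grid)
        for j in range(grid)
    ]
    return np.stack([distant] + closeups)  # shape (1 + grid*grid, ph, pw, c)


def merge_closeups(closeups: np.ndarray, grid: int = 2) -> np.ndarray:
    """Stitch row-major close-up patches back into one full-resolution map."""
    rows = [
        np.concatenate(list(closeups[i * grid:(i + 1) * grid]), axis=1)
        for i in range(grid)
    ]
    return np.concatenate(rows, axis=0)


def cross_view_attention(queries: np.ndarray, context: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention with a residual connection.

    Hypothetical stand-in for cross-view interaction; it has no learned weights.
    queries: (n, d) tokens to update; context: (m, d) tokens to attend to.
    """
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over context tokens
    return queries + weights @ context


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 8, 3))  # toy stand-in for a high-resolution image
    views = make_views(image)      # raw pixels stand in for encoder features here
    c = views.shape[-1]
    distant_tokens = views[0].reshape(-1, c)
    closeup_tokens = views[1:].reshape(-1, c)
    # Localization-style step: the distant view gathers detail from all close-up views.
    global_feat = cross_view_attention(distant_tokens, closeup_tokens)
    # Refinement-style step: each close-up view is updated with global context.
    refined = np.stack([
        cross_view_attention(v.reshape(-1, c), global_feat).reshape(v.shape)
        for v in views[1:]
    ])
    full_res = merge_closeups(refined)
    print(views.shape, global_feat.shape, full_res.shape)
```

Stacking the distant view with the close-up patches along one batch axis is what allows a single encoder to serve every view. The toy treats raw pixels as features only to keep the sketch small; a real model would apply a learned backbone and decoder before and after these interactions.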
