
Unsupervised Foreground Extraction via Deep Region Competition (2110.15497v4)

Published 29 Oct 2021 in cs.CV and cs.LG

Abstract: We present Deep Region Competition (DRC), an algorithm designed to extract foreground objects from images in a fully unsupervised manner. Foreground extraction can be viewed as a special case of generic image segmentation that focuses on identifying and disentangling objects from the background. In this work, we rethink foreground extraction by reconciling an energy-based prior with generative image modeling in the form of a Mixture of Experts (MoE), where we further introduce learned pixel re-assignment as the essential inductive bias to capture the regularities of background regions. With this modeling, the foreground-background partition can be naturally found through Expectation-Maximization (EM). We show that the proposed method effectively exploits the interaction between the mixture components during the partitioning process, which closely connects to region competition, a seminal approach for generic image segmentation. Experiments demonstrate that DRC achieves more competitive performance on complex real-world data and challenging multi-object scenes compared with prior methods. Moreover, we show empirically that DRC can potentially generalize to novel foreground objects, even from categories unseen during training.
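
To make the EM-based partition idea more concrete, here is a minimal toy sketch in Python. It is not the DRC method: the paper's experts are deep generative models with an energy-based prior and learned pixel re-assignment, none of which appear here. The sketch assumes each "expert" is a single 1-D Gaussian over pixel intensity in [0, 1], and that the brighter component plays the role of foreground. The function name `em_two_regions` and the parameters `init_var` and `var_floor` are hypothetical, chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_regions(pixels, iters=30, init_var=0.01, var_floor=1e-4):
    """Toy EM for a two-component mixture over pixel intensities.

    Component 0 starts at the darkest pixel and component 1 at the
    brightest (an arbitrary assumption; component 1 is treated as
    "foreground" below). Returns the component means and the per-pixel
    responsibilities.
    """
    means = [min(pixels), max(pixels)]
    variances = [init_var, init_var]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: the two components compete for each pixel.
        resp = []
        for x in pixels:
            scores = [weights[k] * gaussian_pdf(x, means[k], variances[k])
                      for k in range(2)]
            total = sum(scores)
            if total == 0.0:
                # Both densities underflowed; fall back to a uniform split.
                resp.append([0.5, 0.5])
            else:
                resp.append([s / total for s in scores])
        # M-step: refit each component to the pixels it currently owns.
        for k in range(2):
            mass = sum(r[k] for r in resp)
            if mass < 1e-12:
                continue  # component owns nothing; leave it unchanged
            means[k] = sum(r[k] * x for r, x in zip(resp, pixels)) / mass
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, pixels)) / mass
            variances[k] = max(var, var_floor)
            weights[k] = mass / len(pixels)
    return means, resp


# A flattened toy "image": five dark pixels and three bright ones.
pixels = [0.10, 0.12, 0.08, 0.11, 0.90, 0.88, 0.93, 0.09]
means, resp = em_two_regions(pixels)

# Hard foreground mask: 1 where the brighter component wins the pixel.
mask = [1 if r[1] > r[0] else 0 for r in resp]
print(mask)
```

The E-step is where the components compete for pixels, which is the loose analogy to region competition mentioned in the abstract. Everything specific to DRC is left out of this sketch.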
