Unsupervised Foreground Extraction via Deep Region Competition (2110.15497v4)
Abstract: We present Deep Region Competition (DRC), an algorithm designed to extract foreground objects from images in a fully unsupervised manner. Foreground extraction can be viewed as a special case of generic image segmentation that focuses on identifying and disentangling objects from the background. In this work, we rethink foreground extraction by reconciling an energy-based prior with generative image modeling in the form of a Mixture of Experts (MoE), and we further introduce learned pixel re-assignment as the essential inductive bias for capturing the regularities of background regions. With this modeling, the foreground-background partition can be found naturally through Expectation-Maximization (EM). We show that the proposed method effectively exploits the interaction between the mixture components during the partitioning process, which closely connects to region competition, a seminal approach to generic image segmentation. Experiments demonstrate that DRC performs more competitively than prior methods on complex real-world data and challenging multi-object scenes. Moreover, we show empirically that DRC can potentially generalize to novel foreground objects, even from categories unseen during training.
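To make the EM view of foreground-background partitioning concrete, here is a minimal sketch. It is not the DRC model: it swaps the deep generative experts and the energy-based prior for two plain 1-D Gaussians over pixel intensity, and it omits learned pixel re-assignment entirely. The function name `em_two_component` and the synthetic image are assumptions made for illustration. What it shows is how the E-step posterior over mixture components acts as a soft mask, with the two components competing for each pixel.

```python
import numpy as np

def em_two_component(pixels, n_iters=50, eps=1e-8):
    """Fit a two-component 1-D Gaussian mixture to pixel intensities with EM.

    Illustrative stand-in only: each "expert" is a single Gaussian,
    not a learned generator as in the paper.
    """
    x = np.asarray(pixels, dtype=np.float64).ravel()
    # Start the two means at the intensity extremes so the components differ.
    mu = np.array([x.min(), x.max()])
    var = np.full(2, x.var() + eps)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iters):
        # E-step: posterior responsibility of each component for each pixel.
        log_p = (-0.5 * (x[:, None] - mu) ** 2 / var
                 - 0.5 * np.log(2 * np.pi * var)
                 + np.log(pi))
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate each component from the pixels it has claimed.
        nk = resp.sum(axis=0) + eps
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + eps
        pi = nk / nk.sum()
    return resp, mu

# Synthetic 16x16 image: dark noisy background with a bright square "object".
rng = np.random.default_rng(0)
img = 0.1 + 0.02 * rng.standard_normal((16, 16))
truth = np.zeros((16, 16), dtype=bool)
truth[4:12, 4:12] = True
img[truth] += 0.8

resp, mu = em_two_component(img)
# Component 1 was initialised at the brightest intensity; treat it as foreground.
mask = resp[:, 1].reshape(img.shape) > 0.5
```

In this toy setting the partition falls out of intensity alone. The abstract's point is that real images need richer components and an inductive bias for the background before the same EM competition yields meaningful masks.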