ConstraintMatch for Semi-constrained Clustering

Published 26 Nov 2023 in cs.LG, cs.CV, and stat.ML (arXiv:2311.15395v1)

Abstract: Constrained clustering allows classification models to be trained using only pairwise constraints, which are weak and relatively easy to mine, while still reaching model performance on par with full supervision. Although constrained clustering models perform well without access to the true underlying class labels, they still require large numbers of binary constraint annotations for training. In this paper, we propose a semi-supervised setting in which a large amount of unconstrained data is available alongside a smaller set of constraints, and we propose ConstraintMatch to leverage such unconstrained data. While semi-supervised learning with full labels has made great progress, several challenges prevent a naive application of the resulting methods to the constraint-based label setting. We therefore reason about and analyze these challenges, specifically by 1) proposing a pseudo-constraining mechanism to overcome confirmation bias, a major weakness of pseudo-labeling; 2) developing new pseudo-labeling methods for selecting informative unconstrained samples; and 3) showing that this also allows pairwise loss functions to be used for the initial and auxiliary losses, which facilitates semi-constrained model training. In extensive experiments on five challenging benchmarks, we demonstrate the effectiveness of ConstraintMatch over relevant baselines in both the regular clustering and overclustering scenarios, and we provide analyses of its individual components.
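The abstract does not spell out ConstraintMatch's exact losses or its pseudo-constraining rule. As a minimal sketch under stated assumptions, the NumPy code below illustrates two ideas it refers to: (a) a pairwise constraint loss of a kind common in the constrained-clustering literature, binary cross-entropy on the inner product of two samples' cluster-assignment distributions (must-link = 1, cannot-link = 0), and (b) a hypothetical way to turn confident predictions on unconstrained samples into pairwise pseudo-constraints. The function names `pairwise_constraint_loss` and `pseudo_constraints`, and the 0.95 confidence threshold, are illustrative assumptions, not the paper's API or settings.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def pairwise_constraint_loss(probs, pairs, labels, eps=1e-7):
    """Binary cross-entropy on pairwise similarities (assumed loss form).

    probs:  (n, k) cluster-assignment probabilities
    pairs:  (m, 2) integer indices (i, j)
    labels: (m,) 1.0 for must-link, 0.0 for cannot-link
    The similarity p_i . p_j is the model's probability that samples
    i and j are assigned to the same cluster.
    """
    p_i = probs[pairs[:, 0]]
    p_j = probs[pairs[:, 1]]
    sim = np.clip((p_i * p_j).sum(axis=1), eps, 1 - eps)
    return float(-np.mean(labels * np.log(sim) + (1 - labels) * np.log(1 - sim)))


def pseudo_constraints(probs, threshold=0.95):
    """Hypothetical pseudo-constraining step: keep only samples whose top
    cluster probability reaches `threshold`, form all pairs among them, and
    label a pair must-link (1.0) if the predicted clusters agree, else
    cannot-link (0.0)."""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    idx = np.flatnonzero(conf >= threshold)
    i, j = np.triu_indices(len(idx), k=1)  # each unordered pair once
    pairs = np.stack([idx[i], idx[j]], axis=1)
    labels = (pred[pairs[:, 0]] == pred[pairs[:, 1]]).astype(float)
    return pairs, labels


# Usage on four unconstrained samples and 3 clusters (toy logits).
logits = np.array([[5.0, 0.0, 0.0],
                   [4.0, 0.0, 0.0],
                   [0.0, 6.0, 0.0],
                   [0.3, 0.2, 0.1]])
probs = softmax(logits)
pairs, labels = pseudo_constraints(probs, threshold=0.95)
loss = pairwise_constraint_loss(probs, pairs, labels)
print(pairs.tolist(), labels.tolist(), round(loss, 4))
```

Here the fourth sample is not confident enough to enter any pseudo-constraint. Filtering by confidence limits how many wrong pseudo-constraints reach training, which is one generic way to reduce confirmation bias; the mechanism proposed in the paper may differ.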
