Learning Representations for Clustering via Partial Information Discrimination and Cross-Level Interaction (2401.13503v1)

Published 24 Jan 2024 in cs.CV

Abstract: In this paper, we present a novel deep image clustering approach termed PICI, which enforces the partial information discrimination and the cross-level interaction in a joint learning framework. In particular, we leverage a Transformer encoder as the backbone, through which the masked image modeling with two parallel augmented views is formulated. After deriving the class tokens from the masked images by the Transformer encoder, three partial information learning modules are further incorporated, including the PISD module for training the auto-encoder via masked image reconstruction, the PICD module for employing two levels of contrastive learning, and the CLI module for mutual interaction between the instance-level and cluster-level subspaces. Extensive experiments have been conducted on six real-world image datasets, which demonstrate the superior clustering performance of the proposed PICI approach over the state-of-the-art deep clustering approaches. The source code is available at https://github.com/Regan-Zhang/PICI.


Summary

  • The paper introduces PICI, a method that leverages partial information discrimination and cross-level interaction to enhance clustering performance.
  • It employs a Transformer encoder over two parallel augmented views, each randomly masked, to perform masked image modeling and extract discriminative features.
  • Empirical results across six datasets demonstrate that PICI outperforms state-of-the-art methods under multiple clustering metrics.

Overview of the Proposed Method

The research paper introduces a new method for image clustering, referred to as Partial Information Discrimination and Cross-Level Interaction (PICI), which seeks to address limitations of previous deep clustering approaches. Notably, existing methods tend to rely on global distribution-based losses, operate mainly at the full-image scale, and make insufficient use of the interaction between multiple levels of learning. PICI instead leverages sample-wise relationships through partial information discrimination and fosters interaction across different representation levels.

Transformer Encoders and Augmentations

Central to PICI is the use of a Transformer encoder as the network backbone, chosen for its ability to capture global relationships via self-attention. By processing partially masked images with the Transformer, PICI encourages the recovery of semantic information and yields discriminative features for clustering. Two image augmentations are applied to generate parallel views, each of which undergoes random masking to simulate partial information loss, a crucial step in the learning process (a sketch of this step follows).
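The snippet below is a minimal sketch of this masking step, assuming MAE-style random patch masking applied independently to each augmented view. The helper names (random_mask_patches, augment_a, augment_b, patch_embed), the (B, N, D) shapes, and the 0.5 mask ratio are illustrative assumptions, not details taken from the paper.

```python
import torch

def random_mask_patches(patches, mask_ratio=0.5):
    """Keep a random subset of patch tokens, MAE-style (assumed recipe).
    patches: (B, N, D) patch embeddings. Returns the kept tokens and a
    (B, N) binary mask where 1 marks a dropped (masked) patch."""
    B, N, D = patches.shape
    n_keep = int(N * (1 - mask_ratio))
    scores = torch.rand(B, N, device=patches.device)     # random score per patch
    keep_idx = scores.argsort(dim=1)[:, :n_keep]         # lowest scores survive
    kept = torch.gather(patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N, device=patches.device)
    mask.scatter_(1, keep_idx, 0.0)                      # 0 = visible, 1 = masked
    return kept, mask

# Two parallel views: augment the same batch twice, then mask each independently.
# augment_a / augment_b stand in for whatever augmentation pipeline is used.
# view_a = patch_embed(augment_a(images)); tokens_a, mask_a = random_mask_patches(view_a)
# view_b = patch_embed(augment_b(images)); tokens_b, mask_b = random_mask_patches(view_b)
```

The masked token sequences (plus class tokens) are then fed to the shared Transformer encoder, so each view exposes a different random subset of the image.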

Learning Modules and Contribution

The paper outlines three learning modules utilized by PICI:

  1. The Partial Information Self-Discrimination (PISD) module emphasizes learning through the reconstruction of images with masked patches.
  2. The Partial Information Contrastive Discrimination (PICD) module utilizes class tokens to drive contrastive learning at both instance and cluster levels.
  3. The Cross-Level Interaction (CLI) module enforces consistency across different levels of learning by using pseudo labels to bridge the instance-wise and the cluster-wise subspaces.

These modules collectively constitute an unsupervised learning framework that merges masked image modeling with deep contrastive clustering; a minimal sketch of the corresponding loss terms is given below.
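The following sketch gives plausible PyTorch forms for the three loss terms. The reconstruction term follows the standard MAE recipe, the two PICD terms follow an NT-Xent contrastive loss (with cluster-level contrast over the columns of the soft-assignment matrices), and the CLI term is an assumed supervised-contrastive formulation driven by pseudo labels. Function names, temperatures, and the exact formulations are assumptions for illustration, not the paper's definitions.

```python
import torch
import torch.nn.functional as F

# --- PISD: masked-image reconstruction (MAE-style, assumed form) ---
def reconstruction_loss(pred, target, mask):
    """MSE on masked patches only. pred/target: (B, N, P) pixel patches,
    mask: (B, N) with 1 marking masked (to-be-reconstructed) patches."""
    per_patch = (pred - target).pow(2).mean(dim=-1)
    return (per_patch * mask).sum() / mask.sum().clamp(min=1)

# --- PICD: contrastive discrimination at two levels (NT-Xent sketch) ---
def nt_xent(z1, z2, tau=0.5):
    """Contrast each row of z1 against its counterpart in z2 and vice versa."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)        # (2B, D)
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float('-inf'))                         # exclude self-pairs
    b = z1.size(0)
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(b)]).to(sim.device)
    return F.cross_entropy(sim, targets)

def picd_loss(h1, h2, p1, p2):
    """Instance level on projected class tokens h (B, D); cluster level on the
    columns of the soft-assignment matrices p (B, K), i.e. cluster 'features'."""
    return nt_xent(h1, h2) + nt_xent(p1.t(), p2.t(), tau=1.0)

# --- CLI: cross-level interaction via pseudo labels (assumed form) ---
def cli_loss(h, p, tau=0.5):
    """Pseudo labels from the cluster-level assignments define positives among
    the instance-level features, supervised-contrastive style."""
    y = p.argmax(dim=1).detach()                              # (B,) pseudo labels
    h = F.normalize(h, dim=1)
    logits = (h @ h.t() / tau).masked_fill(
        torch.eye(len(y), dtype=torch.bool, device=h.device), -1e9)
    log_prob = logits - logits.logsumexp(dim=1, keepdim=True)
    pos = (y.unsqueeze(0) == y.unsqueeze(1)).float()
    pos = pos * (1.0 - torch.eye(len(y), device=h.device))    # drop self-pairs
    return -((log_prob * pos).sum(dim=1) / pos.sum(dim=1).clamp(min=1)).mean()
```

In practice these terms would be combined into a joint objective with weighting coefficients chosen by the authors; the weights are not reproduced here.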

Empirical Validation

The empirical results provide compelling evidence of PICI's effectiveness. Benchmarked across six diverse image datasets, PICI demonstrated considerable improvements over state-of-the-art methods under a variety of standard clustering metrics, with significant performance gains reported throughout.

Conclusion and Impact

The PICI approach tackles inherent shortcomings of earlier deep clustering methods. It pioneers the fusion of masked image modeling with contrastive clustering, resulting in an improved mechanism for representation learning. The released source code adds to the paper's value, inviting further exploration and extension by the research community. Overall, PICI sets a strong benchmark in deep clustering, marrying the strengths of the Transformer with a nuanced treatment of data relationships at multiple scales.
