Learning Representations for Clustering via Partial Information Discrimination and Cross-Level Interaction (2401.13503v1)

Published 24 Jan 2024 in cs.CV

Abstract: In this paper, we present a novel deep image clustering approach termed PICI, which enforces the partial information discrimination and the cross-level interaction in a joint learning framework. In particular, we leverage a Transformer encoder as the backbone, through which the masked image modeling with two parallel augmented views is formulated. After deriving the class tokens from the masked images by the Transformer encoder, three partial information learning modules are further incorporated, including the PISD module for training the auto-encoder via masked image reconstruction, the PICD module for employing two levels of contrastive learning, and the CLI module for mutual interaction between the instance-level and cluster-level subspaces. Extensive experiments have been conducted on six real-world image datasets, which demonstrate the superior clustering performance of the proposed PICI approach over the state-of-the-art deep clustering approaches. The source code is available at https://github.com/Regan-Zhang/PICI.


Summary

  • The paper introduces PICI, a method that leverages partial information discrimination and cross-level interaction to enhance clustering performance.
  • It employs a Transformer encoder over two parallel augmented views, each randomly masked, to perform masked image modeling and extract discriminative features.
  • Empirical results across six datasets demonstrate that PICI outperforms state-of-the-art methods under multiple clustering metrics.

Overview of the Proposed Method

The research paper introduces a new method for image clustering, referred to as Partial Information Discrimination and Cross-Level Interaction (PICI), which seeks to address limitations of previous deep clustering approaches. Notably, existing methods tend to rely on global distribution-based losses, operate mainly at the full-image scale, and make insufficient use of the interaction between multiple levels of learning. PICI instead leverages sample-wise relationships through partial information discrimination and fosters interaction across different representation levels.

Transformer Encoders and Augmentations

Central to PICI is the use of a Transformer encoder as the network backbone, chosen for its ability to capture global relationships via self-attention. By processing partially masked images with the Transformer, PICI encourages the recovery of semantic information and yields discriminative features for clustering. Two image augmentations are applied to generate parallel views, each of which undergoes random masking to simulate partial information loss, a crucial step in the learning process (a sketch of this step follows).
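The snippet below is a minimal sketch of this masking step, assuming MAE-style random patch masking applied independently to each augmented view. The helper names (random_mask_patches, augment_a, augment_b, patch_embed), the (B, N, D) shapes, and the 0.5 mask ratio are illustrative assumptions, not details taken from the paper.

```python
import torch

def random_mask_patches(patches, mask_ratio=0.5):
    """Keep a random subset of patch tokens, MAE-style (assumed recipe).
    patches: (B, N, D) patch embeddings. Returns the kept tokens and a
    (B, N) binary mask where 1 marks a dropped (masked) patch."""
    B, N, D = patches.shape
    n_keep = int(N * (1 - mask_ratio))
    scores = torch.rand(B, N, device=patches.device)     # random score per patch
    keep_idx = scores.argsort(dim=1)[:, :n_keep]         # lowest scores survive
    kept = torch.gather(patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N, device=patches.device)
    mask.scatter_(1, keep_idx, 0.0)                      # 0 = visible, 1 = masked
    return kept, mask

# Two parallel views: augment the same batch twice, then mask each independently.
# augment_a / augment_b stand in for whatever augmentation pipeline is used.
# view_a = patch_embed(augment_a(images)); tokens_a, mask_a = random_mask_patches(view_a)
# view_b = patch_embed(augment_b(images)); tokens_b, mask_b = random_mask_patches(view_b)
```

The masked token sequences (plus class tokens) are then fed to the shared Transformer encoder, so each view exposes a different random subset of the image.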

Learning Modules and Contribution

The paper outlines three learning modules utilized by PICI:

  1. The Partial Information Self-Discrimination (PISD) module emphasizes learning through the reconstruction of images with masked patches.
  2. The Partial Information Contrastive Discrimination (PICD) module utilizes class tokens to drive contrastive learning at both instance and cluster levels.
  3. The Cross-Level Interaction (CLI) module enforces consistency across different levels of learning by using pseudo labels to bridge the instance-wise and the cluster-wise subspaces.

These modules collectively constitute an unsupervised learning framework that merges masked image modeling with deep contrastive clustering; a minimal sketch of the corresponding loss terms is given below.
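The following sketch gives plausible PyTorch forms for the three loss terms. The reconstruction term follows the standard MAE recipe, the two PICD terms follow an NT-Xent contrastive loss (with cluster-level contrast over the columns of the soft-assignment matrices), and the CLI term is an assumed supervised-contrastive formulation driven by pseudo labels. Function names, temperatures, and the exact formulations are assumptions for illustration, not the paper's definitions.

```python
import torch
import torch.nn.functional as F

# --- PISD: masked-image reconstruction (MAE-style, assumed form) ---
def reconstruction_loss(pred, target, mask):
    """MSE on masked patches only. pred/target: (B, N, P) pixel patches,
    mask: (B, N) with 1 marking masked (to-be-reconstructed) patches."""
    per_patch = (pred - target).pow(2).mean(dim=-1)
    return (per_patch * mask).sum() / mask.sum().clamp(min=1)

# --- PICD: contrastive discrimination at two levels (NT-Xent sketch) ---
def nt_xent(z1, z2, tau=0.5):
    """Contrast each row of z1 against its counterpart in z2 and vice versa."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)        # (2B, D)
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float('-inf'))                         # exclude self-pairs
    b = z1.size(0)
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(b)]).to(sim.device)
    return F.cross_entropy(sim, targets)

def picd_loss(h1, h2, p1, p2):
    """Instance level on projected class tokens h (B, D); cluster level on the
    columns of the soft-assignment matrices p (B, K), i.e. cluster 'features'."""
    return nt_xent(h1, h2) + nt_xent(p1.t(), p2.t(), tau=1.0)

# --- CLI: cross-level interaction via pseudo labels (assumed form) ---
def cli_loss(h, p, tau=0.5):
    """Pseudo labels from the cluster-level assignments define positives among
    the instance-level features, supervised-contrastive style."""
    y = p.argmax(dim=1).detach()                              # (B,) pseudo labels
    h = F.normalize(h, dim=1)
    logits = (h @ h.t() / tau).masked_fill(
        torch.eye(len(y), dtype=torch.bool, device=h.device), -1e9)
    log_prob = logits - logits.logsumexp(dim=1, keepdim=True)
    pos = (y.unsqueeze(0) == y.unsqueeze(1)).float()
    pos = pos * (1.0 - torch.eye(len(y), device=h.device))    # drop self-pairs
    return -((log_prob * pos).sum(dim=1) / pos.sum(dim=1).clamp(min=1)).mean()
```

In practice these terms would be combined into a joint objective with weighting coefficients chosen by the authors; the weights are not reproduced here.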

Empirical Validation

The empirical results provide compelling evidence of PICI's effectiveness. Benchmarked across six diverse image datasets, PICI demonstrated considerable improvements over state-of-the-art methods under a variety of standard clustering metrics, with significant performance gains reported throughout.

Conclusion and Impact

The PICI approach tackles inherent shortcomings of earlier deep clustering methods. It pioneers the fusion of masked image modeling with contrastive clustering, resulting in an improved mechanism for representation learning. The released source code adds to the paper's value, inviting further exploration and extension by the research community. Overall, PICI sets a strong benchmark in deep clustering, marrying the strengths of the Transformer with a nuanced treatment of data relationships at multiple scales.
