Self-labelling via simultaneous clustering and representation learning (1911.05371v3)

Published 13 Nov 2019 in cs.CV and cs.NE

Abstract: Combining clustering and representation learning is one of the most promising approaches for unsupervised learning of deep neural networks. However, doing so naively leads to ill-posed learning problems with degenerate solutions. In this paper, we propose a novel and principled learning formulation that addresses these issues. The method is obtained by maximizing the information between labels and input data indices. We show that this criterion extends standard cross-entropy minimization to an optimal transport problem, which we solve efficiently for millions of input images and thousands of labels using a fast variant of the Sinkhorn-Knopp algorithm. The resulting method is able to self-label visual data so as to train highly competitive image representations without manual labels. Our method achieves state-of-the-art representation learning performance for AlexNet and ResNet-50 on SVHN, CIFAR-10, CIFAR-100 and ImageNet and yields the first self-supervised AlexNet that outperforms the supervised Pascal VOC detection baseline. Code and models are available.

Citations (730)

Summary

  • The paper presents a novel integration of clustering and representation learning via optimal transport to prevent degenerate solutions.
  • It employs a fast Sinkhorn-Knopp algorithm to iteratively optimize pseudo-label assignments, achieving state-of-the-art performance on benchmark datasets.
  • Extensive ablation studies demonstrate the method's robustness to class imbalance and its effectiveness in enhancing unsupervised visual representations.

Self-Labelling via Simultaneous Clustering and Representation Learning

The pervasive challenge of learning from unlabelled datasets remains a significant hurdle for unsupervised learning. The paper Self-Labelling via Simultaneous Clustering and Representation Learning by Yuki M. Asano, Christian Rupprecht, and Andrea Vedaldi offers a principled strategy in which clustering and representation learning are optimized jointly, avoiding the degenerate solutions that often arise in naive approaches.

Core Contribution

The paper delineates a principled approach that hinges on maximizing the information between labels and input data indices to derive meaningful, non-degenerate label assignments. This criterion extends standard cross-entropy minimization to an optimal transport problem, which is solved with an efficient variant of the Sinkhorn-Knopp algorithm. The resulting method not only avoids degenerate solutions but also self-labels visual data effectively, enabling the training of competitive image representations without manual labelling.

Methodological Framework

Problem Formulation:

The problem statement is clear: combine clustering with representation learning without falling into degenerate solutions. The authors frame this as an optimal transport problem:

$$\min_{Q \in U(r,c)} \langle Q, -\log P \rangle$$

where $P$ is the $K \times N$ matrix of model-predicted label probabilities and $Q$ is the matrix of pseudo-label assignments, constrained to the transportation polytope $U(r,c)$ that enforces an equipartition of the data across labels.
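Writing out the feasible set makes the equipartition constraint explicit (notation as in the inline formula, with $K$ labels and $N$ data points):

```latex
U(r, c) = \left\{ Q \in \mathbb{R}_{+}^{K \times N} \;:\; Q\,\mathbf{1}_N = r,\; Q^{\top}\mathbf{1}_K = c \right\},
\qquad r = \tfrac{1}{K}\,\mathbf{1}_K, \quad c = \tfrac{1}{N}\,\mathbf{1}_N.
```

Every label therefore receives the same total mass $1/K$, and every image contributes total mass $1/N$, which rules out collapsing all images into a single cluster.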

Sinkhorn-Knopp Solution:

The solution employs a fast version of the Sinkhorn-Knopp algorithm to solve the transport problem iteratively. The algorithm remains computationally efficient, costing $\mathcal{O}(NK)$ per iteration, because each step reduces to matrix-vector multiplications.
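As an illustration, here is a minimal NumPy sketch of that iteration (function and variable names are ours, not the authors', and the regularization strength is an illustrative value; the paper's implementation additionally runs on GPU, and for large $K$ a log-domain version is numerically safer):

```python
import numpy as np

def sinkhorn_knopp(log_p, n_iters=100, lam=25.0):
    """Equipartitioned pseudo-label assignment via Sinkhorn-Knopp.

    log_p: (K, N) array of log model probabilities log P[y, i].
    Returns Q: (K, N) soft assignments summing to 1, with approximately
    uniform row marginals (1/K) and column marginals (1/N).
    """
    K, N = log_p.shape
    # Entropic regularisation: Q is proportional to P raised to the power
    # lam. Shift each column by its max before exponentiating so that no
    # column underflows entirely to zero.
    Q = np.exp(lam * (log_p - log_p.max(axis=0, keepdims=True)))
    Q /= Q.sum()
    r = np.full(K, 1.0 / K)  # target row marginals (equipartition)
    c = np.full(N, 1.0 / N)  # target column marginals
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals;
        # each update is an O(NK) matrix-vector style operation.
        Q *= (r / Q.sum(axis=1))[:, None]
        Q *= (c / Q.sum(axis=0))[None, :]
    return Q

# Hard pseudo-labels for cross-entropy training: labels = Q.argmax(axis=0)
```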

Empirical Validation and Performance

The method's efficacy is validated across multiple datasets including SVHN, CIFAR-10, CIFAR-100, and ImageNet. Key highlights include:

  1. Performance on ImageNet:
    • The proposed method achieves state-of-the-art performance in unsupervised representation learning for AlexNet and ResNet-50. Notably, the self-supervised AlexNet outperforms the supervised baseline on the Pascal VOC detection task.
  2. Ablation Studies:
    • Rigorous ablation studies establish the importance of iterative self-labelling, the choice of clustering hyperparameters, and the multi-head clustering approach. In particular, using multiple clustering heads (ten in the paper) significantly boosts performance, indicating the advantage of learning from several clustering perspectives at once (a minimal sketch of such a multi-head setup follows this list).
  3. Robustness to Imbalance:
    • The method demonstrates robustness to class-imbalance scenarios, where the pseudo-label solutions retain competitive performance even in heavily imbalanced datasets.
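To make the multi-head idea concrete, here is an illustrative PyTorch-style sketch (hypothetical class name and sizes, not the authors' implementation): a shared trunk feeds several independent linear heads, each of which receives its own Sinkhorn-Knopp label assignment and cross-entropy loss.

```python
import torch.nn as nn

class MultiHeadClusterer(nn.Module):
    """Shared feature trunk with several independent clustering heads."""

    def __init__(self, trunk: nn.Module, feat_dim: int,
                 n_clusters: int = 3000, n_heads: int = 10):
        super().__init__()
        self.trunk = trunk
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, n_clusters) for _ in range(n_heads)]
        )

    def forward(self, x):
        z = self.trunk(x)
        # One logit matrix per head; the losses from all heads are
        # averaged, so the trunk benefits from several clustering views.
        return [head(z) for head in self.heads]
```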

Theoretical and Practical Implications

Theoretical Insights:

The paper provides a theoretical interpretation through an information-theoretic lens: maximizing the mutual information between data indices and labels prevents degenerate clustering solutions.
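Restating that argument symbolically, with $y$ the label and $i$ the (uniformly sampled) data index:

```latex
I(y; i) = H(y) - H(y \mid i).
```

The equipartition constraint fixes $H(y) = \log K$ at its maximum, so maximizing $I(y;i)$ amounts to minimizing $H(y \mid i)$: each image is driven toward a confident label, while the trivial solution that assigns every image to one cluster (where $H(y) = 0$) is excluded.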

Practical Deployments:

From a practical standpoint, the ability to self-label a dataset and then train on those labels with standard cross-entropy allows rapid adaptation across different architectures and tasks. This is immediately beneficial in domains where labelling is costly or infeasible.
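A toy end-to-end loop illustrates this workflow (purely illustrative data, dimensions, and learning rate; it reuses the `sinkhorn_knopp` sketch above and, unlike the paper, re-assigns labels at every step for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, D = 512, 8, 32                  # images, pseudo-classes, feature dim
X = rng.normal(size=(N, D))           # stand-in for extracted features
W = rng.normal(size=(D, K)) * 0.01    # linear classification head

for step in range(50):
    logits = X @ W                                              # (N, K)
    log_p = logits - np.log(np.exp(logits).sum(1, keepdims=True))
    Q = sinkhorn_knopp(log_p.T)       # (K, N) equipartitioned assignment
    y = Q.argmax(axis=0)              # hard pseudo-label per image
    # One SGD step on the cross-entropy against the pseudo-labels.
    grad = X.T @ (np.exp(log_p) - np.eye(K)[y]) / N
    W -= 0.5 * grad
```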

Future Directions

Future developments stemming from this research include:

  • Extension to Diverse Data Types:

The approach could be generalized beyond visual data to text or multimodal datasets, potentially employing architecture-specific modifications to the Sinkhorn-Knopp component.

  • Optimal Transport Alternatives:

Exploring alternatives to Sinkhorn-Knopp that further improve computational efficiency or scale to even larger datasets is a promising research direction.

  • Integration with Other Self-Supervised Techniques:

Integrating self-labelling with other forms of self-supervision, such as contrastive learning, could yield even more robust representations, tailoring the clustering to explicitly learned transformations.

The carefully designed combination of clustering and representation learning presented in this paper thus stands as a notable advancement among unsupervised learning strategies, showcasing substantial improvements in computational efficiency, model performance, and practical deployment.