No Representation Rules Them All in Category Discovery

Published 28 Nov 2023 in cs.CV, cs.AI, cs.IT, cs.LG, and math.IT (arXiv:2311.17055v1)

Abstract: In this paper we tackle the problem of Generalized Category Discovery (GCD). Specifically, given a dataset with labelled and unlabelled images, the task is to cluster all images in the unlabelled subset, whether or not they belong to the labelled categories. Our first contribution is to recognize that most existing GCD benchmarks only contain labels for a single clustering of the data, making it difficult to ascertain whether models are using the available labels to solve the GCD task, or simply solving an unsupervised clustering problem. As such, we present a synthetic dataset, named 'Clevr-4', for category discovery. Clevr-4 contains four equally valid partitions of the data, i.e., based on object shape, texture, color or count. To solve the task, models are required to extrapolate the taxonomy specified by the labelled set, rather than simply latching onto a single natural grouping of the data. We use this dataset to demonstrate the limitations of unsupervised clustering in the GCD setting, showing that even very strong unsupervised models fail on Clevr-4. We further use Clevr-4 to examine the weaknesses of existing GCD algorithms, and propose a new method which addresses these shortcomings, leveraging consistent findings from the representation learning literature to do so. Our simple solution, which is based on 'mean teachers' and termed $\mu$GCD, substantially outperforms implemented baselines on Clevr-4. Finally, when we transfer these findings to real data on the challenging Semantic Shift Benchmark (SSB), we find that $\mu$GCD outperforms all prior work, setting a new state-of-the-art. For the project webpage, see https://www.robots.ox.ac.uk/~vgg/data/clevr4/
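The GCD setup described in the abstract can be sketched on a toy, Clevr-4-style dataset. Everything here is a hypothetical illustration, not the real Clevr-4 data or the authors' splitting code: the attribute vocabularies, the `Image` class, `make_gcd_split`, and the 50% labelling rate are assumptions. The point it shows is that the same images define a different GCD task depending on which taxonomy supplies the labels.

```python
import itertools
import random
from dataclasses import dataclass

# Hypothetical attribute vocabularies for a toy, Clevr-4-style dataset
# (not the real Clevr-4 values).
SHAPES = ["cube", "sphere", "cylinder", "cone"]
TEXTURES = ["metal", "rubber", "stripes", "dots"]
COLORS = ["red", "green", "blue", "yellow"]
COUNTS = [1, 2, 3, 4]


@dataclass(frozen=True)
class Image:
    shape: str
    texture: str
    color: str
    count: int

    def label(self, taxonomy: str):
        # Every image carries one equally valid label per taxonomy.
        return getattr(self, taxonomy)


def make_gcd_split(images, taxonomy, known_classes, labelled_fraction=0.5, seed=0):
    """Hypothetical GCD split: the labelled set holds only known classes,
    while the unlabelled set mixes known and novel classes."""
    rng = random.Random(seed)
    labelled, unlabelled = [], []
    for img in images:
        y = img.label(taxonomy)
        if y in known_classes and rng.random() < labelled_fraction:
            labelled.append((img, y))
        else:
            unlabelled.append(img)  # label hidden: this image must be clustered
    return labelled, unlabelled


images = [Image(s, t, c, n) for s, t, c, n in itertools.product(SHAPES, TEXTURES, COLORS, COUNTS)]
labelled, unlabelled = make_gcd_split(images, "shape", known_classes={"cube", "sphere"})
```

Calling `make_gcd_split(images, "color", {"red", "green"})` on the same images defines a different task, which is the ambiguity Clevr-4 is built to expose.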

Summary

The paper introduces the Clevr-4 dataset to expose the limitations of pure unsupervised clustering in generalized category discovery.
The paper demonstrates that state-of-the-art unsupervised methods struggle across multiple valid partitions such as shape, texture, color, and count.
The paper proposes μGCD, a simple method based on mean teachers, which outperforms baselines on both Clevr-4 and the Semantic Shift Benchmark.

Overview of "No Representation Rules Them All in Category Discovery"

The paper addresses Generalized Category Discovery (GCD): given a dataset in which some images are labelled, cluster the unlabelled images, which may belong either to the labelled categories or to new ones. Solving this requires extrapolating the taxonomy implied by the labels to unseen categories, a process closely aligned with how humans form categories. The authors first point out that most existing GCD benchmarks label only a single clustering of the data, so it is hard to tell whether a model uses the labels or is simply solving an unsupervised clustering problem. They then introduce Clevr-4, a synthetic dataset whose images admit four equally valid partitions, by shape, texture, color, and count. Because any of the four could be the intended one, a model must infer the target taxonomy from the labelled set rather than fall back on whichever grouping its representation happens to favor.
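Why a single unsupervised grouping cannot serve every taxonomy can be made concrete with a toy example. The sketch scores a clustering by accuracy under the best one-to-one mapping of clusters to classes, a metric commonly used in category discovery; the summary does not state the paper's exact evaluation protocol, so treat this as an assumption. The 16 toy images, the `cluster_accuracy` helper, and the color-only "representation" are hypothetical.

```python
import itertools
from collections import Counter


def cluster_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping from predicted clusters to classes.

    Brute-force search over assignments: fine for a handful of clusters;
    real evaluations typically use the Hungarian algorithm instead.
    """
    classes = sorted(set(y_true), key=str)
    clusters = sorted(set(y_pred), key=str)
    counts = Counter(zip(y_pred, y_true))  # (cluster, class) -> co-occurrence count
    # Pad with dummy classes so every cluster can be assigned when clusters outnumber classes.
    padded = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for assignment in itertools.permutations(padded, len(clusters)):
        hits = sum(counts[(k, c)] for k, c in zip(clusters, assignment))
        best = max(best, hits)
    return best / len(y_true)


# Toy images: every (shape, color) combination appears exactly once.
SHAPES = ["cube", "sphere", "cylinder", "cone"]
COLORS = ["red", "green", "blue", "yellow"]
images = list(itertools.product(SHAPES, COLORS))

# A hypothetical representation that only encodes color groups images by color.
clusters = [color for _, color in images]

acc_color = cluster_accuracy([color for _, color in images], clusters)  # 1.0
acc_shape = cluster_accuracy([shape for shape, _ in images], clusters)  # 0.25, i.e. chance
```

The same clustering is perfect for the color taxonomy and at chance for shape; only the labelled set can tell a model which of the two is wanted.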

Key Contributions

Introduction of Clevr-4: This synthetic dataset exposes the limitations of unsupervised clustering for GCD. Because it offers four equally valid partitions of the same images, a model cannot succeed by latching onto a single natural grouping, which reveals biases and weaknesses in existing approaches to category discovery.
Limitations of Unsupervised Clustering: The paper shows that even very strong unsupervised models fail to perform well across all Clevr-4 taxonomies. In line with the paper's title, no single representation suits every taxonomy, so unsupervised clustering alone is inadequate for GCD, and Clevr-4 is sensitive enough to detect this failure.
Proposal of the $\mu$GCD Method: Building on their findings from Clevr-4, the authors propose $\mu$GCD, which draws on consistent findings from the representation learning literature, in particular mean teachers. The method is designed to be robust to pseudo-labelling noise and outperforms previous baselines on both Clevr-4 and real data, namely the challenging Semantic Shift Benchmark (SSB).
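The mean-teacher idea that $\mu$GCD builds on can be sketched in a few lines: the teacher's weights are an exponential moving average (EMA) of the student's, and the student is trained to match the teacher's sharpened predictions, which act as soft pseudo-labels. This is a generic mean-teacher sketch, not the authors' implementation: parameters are plain Python lists standing in for network weights, and the function names, momentum, and temperatures are hypothetical.

```python
import math


def ema_update(teacher, student, momentum=0.999):
    """Mean-teacher update: teacher <- m * teacher + (1 - m) * student, per parameter."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, student_temp=0.1, teacher_temp=0.05):
    """Cross-entropy from sharpened teacher soft pseudo-labels to student predictions.

    The teacher output is treated as a fixed target (no gradient would flow into it).
    Temperatures are illustrative, not the paper's values.
    """
    target = softmax(teacher_logits, teacher_temp)  # lower temperature = sharper target
    pred = softmax(student_logits, student_temp)
    return -sum(p_t * math.log(max(p_s, 1e-12)) for p_t, p_s in zip(target, pred))


# Usage: one EMA step, then the loss on a single example.
teacher = [0.0, 0.0]
student = [1.0, 2.0]
teacher = ema_update(teacher, student, momentum=0.9)  # teacher moves 10% toward the student
loss = distillation_loss(student_logits=[2.0, 0.5, -1.0], teacher_logits=[1.5, 0.8, -0.5])
```

Because the teacher changes slowly, its targets are less affected by any single noisy update than the student's own predictions, which is the usual motivation for using a mean teacher against pseudo-label noise.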

Implications

The study has both theoretical and practical implications. Theoretically, it challenges the assumption that unsupervised representations alone suffice to discover the intended taxonomy without guidance from labels. Practically, Clevr-4 provides a controlled benchmark on which GCD methods can be trained and tested against several taxonomies, making it possible to check whether a method actually uses the labelled data.

Numerical Results

The paper reports that $\mu$GCD substantially outperforms the implemented baselines on Clevr-4, with particularly notable gains in the more challenging texture and count taxonomies. On the SSB, $\mu$GCD outperforms all prior work, setting a new state of the art.

Future Directions

Future work in GCD could explore more complex datasets in the spirit of Clevr-4 and models whose categorization more closely resembles human categorization. Given the success of representation learning techniques in GCD, related areas such as semi-supervised learning and disentanglement may also benefit from adapting these approaches.

In summary, this paper is a significant step toward understanding and tackling generalized category discovery. With the Clevr-4 dataset and the $\mu$GCD method, the authors challenge current assumptions about unsupervised representations and offer a path toward more capable category discovery models.