- The paper presents CHiLS, a novel method that integrates hierarchical label sets into zero-shot image classification, leading to significant accuracy improvements.
- It systematically generates subclasses using both existing hierarchies and GPT-3 queries, then maps predictions back to parent classes.
- CHiLS reduces reliance on prompt engineering by automating subclass generation, supporting more robust open-vocabulary modeling.
CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets
The paper presents CHiLS (Classification with Hierarchical Label Sets), an approach to improving the zero-shot performance of open-vocabulary models, specifically CLIP. CHiLS incorporates hierarchical label sets into the classification pipeline, a departure from existing methods that focus largely on prompt engineering to improve accuracy.
Overview and Methodology
The paper identifies a gap in zero-shot classification: in datasets with implicit semantic hierarchies, the given class names often fail to capture the full richness of each category, and traditional methods overlook the semantic information embedded in the labels themselves. CHiLS addresses this through systematic subclass generation and hierarchical mapping.
The methodology comprises three main steps, with a code sketch after the list:
- Subclass Generation: For each class, a set of subclasses is drawn from an existing label hierarchy or generated via GPT-3 queries, exploiting semantic information latent in the class labels themselves.
- Zero-Shot Prediction: CHiLS runs the standard zero-shot CLIP procedure over the pooled set of subclasses, treating them as the label set for prediction.
- Hierarchy Mapping: The predicted subclass is mapped back to its parent class to produce the final classification.
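To make the pipeline concrete, below is a minimal sketch in Python using Hugging Face's CLIP implementation. The toy hierarchy, prompt template, and model checkpoint are illustrative assumptions for this summary, not the paper's exact configuration.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Step 1: subclass generation. A hand-written toy hierarchy here; in practice
# the subclass sets come from an existing taxonomy or from GPT-3 queries.
hierarchy = {
    "dog": ["golden retriever", "poodle", "beagle"],
    "cat": ["siamese cat", "tabby cat", "persian cat"],
}
subclasses = [s for subs in hierarchy.values() for s in subs]
parent_of = {s: parent for parent, subs in hierarchy.items() for s in subs}

def chils_predict(image: Image.Image) -> str:
    # Step 2: standard zero-shot CLIP, but over the pooled subclass label set.
    prompts = [f"a photo of a {label}" for label in subclasses]
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image[0]  # one score per subclass
    best_subclass = subclasses[logits.argmax().item()]
    # Step 3: map the winning subclass back to its parent class.
    return parent_of[best_subclass]
```

For example, `chils_predict(Image.open("photo.jpg"))` would return `"dog"` or `"cat"`: the prediction is made at the finer subclass level but reported at the superclass level, which is where CHiLS gains its accuracy.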
This approach is particularly advantageous for datasets with underlying hierarchical structure, since CHiLS improves accuracy without any additional training cost.
Empirical Evaluation
The paper evaluates CHiLS across a range of image classification benchmarks, including datasets both with and without accessible hierarchical information. When true hierarchical information is available, the approach yields substantial gains in predictive accuracy, up to 30% in some settings.
The findings also show that even when the hierarchy is synthetic, generated by GPT-3 rather than drawn from ground truth, CHiLS consistently improves upon baseline (superclass) predictions. The method can therefore operate without any pre-existing hierarchy, going beyond typical prompt-based solutions by exploiting richer class representations.
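As an illustration of how such a synthetic hierarchy might be produced, the sketch below queries a GPT-style model for subclass names. The paper used GPT-3; this example assumes the current OpenAI Python SDK, and the prompt wording and parsing are illustrative assumptions, not the paper's exact query.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_subclasses(superclass: str, n: int = 10) -> list[str]:
    # Ask the language model for n subtypes of the superclass, one per line.
    prompt = (
        f"Generate a list of {n} types of the following: {superclass}. "
        "Return one name per line, with no numbering."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    lines = resp.choices[0].message.content.splitlines()
    return [line.strip(" -*") for line in lines if line.strip()]
```

The returned names, e.g. breed names for `"dog"`, can then be used directly as the subclass sets in the pipeline sketched above, even when no ground-truth hierarchy exists.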
Implications and Future Directions
The practical implications of CHiLS are manifold, particularly for practitioners using CLIP as an out-of-the-box classifier. Because subclass generation can be automated with GPT-3, the method is especially useful when class labels are coarse or poorly defined, yielding more informed and reliable predictions.
Theoretically, CHiLS suggests a shift in zero-shot learning by integrating hierarchical semantics directly into the predictive pipeline. Future work may explore more principled ways to combine superclass and subclass predictions, along with a deeper understanding of why CHiLS boosts zero-shot accuracy. The method could also extend beyond image classification to broader zero-shot tasks in AI.
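One naive baseline for such a combination, included here only as a hedged sketch and not as the paper's procedure, is to trust whichever label set is more confident:

```python
import torch

def combine_predictions(super_logits: torch.Tensor,
                        sub_logits: torch.Tensor,
                        superclasses: list[str],
                        parent_index: list[int]) -> str:
    """Pick the superclass from whichever label set is more confident.

    parent_index[j] gives the superclass index for subclass j.
    """
    super_probs = super_logits.softmax(dim=-1)
    sub_probs = sub_logits.softmax(dim=-1)
    if super_probs.max() >= sub_probs.max():
        return superclasses[super_probs.argmax().item()]
    return superclasses[parent_index[sub_probs.argmax().item()]]
```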
In conclusion, CHiLS is a practical augmentation to zero-shot classification, leveraging hierarchical label sets to improve accuracy, and it invites further exploration of hierarchical semantics in AI.