Revealing the Proximate Long-Tail Distribution in Compositional Zero-Shot Learning (2312.15923v1)
Abstract: Compositional Zero-Shot Learning (CZSL) aims to transfer knowledge from seen state-object pairs to novel unseen pairs. In this process, visual bias caused by the diverse interrelationships of state-object combinations blurs their visual features, hindering the learning of distinguishable class prototypes. Prevailing methods concentrate on disentangling states and objects directly from visual features, disregarding potential enhancements from a data viewpoint. Experimentally, we show that the effects of this problem closely approximate a long-tailed distribution. As a solution, we transform CZSL into a proximate class-imbalance problem. We mathematically derive the role of the class prior within the long-tailed distribution in CZSL. Building on this insight, we incorporate the visual bias caused by compositions into the classifier's training and inference by estimating it as a proximate class prior. This encourages the classifier to acquire more discernible class prototypes for each composition, thereby achieving more balanced predictions. Experimental results demonstrate that our approach elevates the model's performance to the state-of-the-art level without introducing additional parameters. Our code is available at \url{https://github.com/LanchJL/ProLT-CZSL}.
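The abstract describes folding an estimated class prior into the classifier's training and inference, in the style of long-tail logit adjustment. Below is a minimal, hedged sketch of that general idea, not the paper's exact method: the prior here is estimated from toy label counts rather than from composition-induced visual bias, and the names (`estimate_prior`, `adjust_logits`, `tau`) are illustrative assumptions.

```python
import numpy as np

def estimate_prior(train_labels, num_classes, smoothing=1.0):
    """Estimate a class prior from (possibly imbalanced) label counts.

    The smoothing term keeps the prior nonzero for classes with no
    training examples (e.g. unseen compositions in CZSL).
    """
    counts = np.bincount(train_labels, minlength=num_classes).astype(float)
    counts += smoothing
    return counts / counts.sum()

def adjust_logits(logits, prior, tau=1.0):
    """Subtract tau * log(prior): frequent classes are penalized,
    rare classes are boosted, yielding more balanced predictions."""
    return logits - tau * np.log(prior)

# Toy usage: three classes with imbalanced counts and tied raw scores.
labels = np.array([0, 0, 0, 0, 1, 1, 2])
prior = estimate_prior(labels, num_classes=3)   # -> [0.5, 0.3, 0.2]
logits = np.array([2.0, 2.0, 2.0])
adjusted = adjust_logits(logits, prior)
print(adjusted.argmax())                        # rarest class (2) wins
```

In the paper's setting the prior would instead be estimated from the composition-induced visual bias, but the adjustment mechanics are analogous.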