OTTER: Effortless Label Distribution Adaptation of Zero-shot Models (2404.08461v2)
Abstract: Popular zero-shot models suffer due to artifacts inherited from pretraining. One particularly detrimental issue, caused by unbalanced web-scale pretraining data, is mismatched label distribution. Existing approaches that seek to repair the label distribution are not suitable in zero-shot settings, as they have mismatching requirements, such as needing access to labeled downstream task data or knowledge of the true label balance in the pretraining distribution. We sidestep these challenges and introduce a simple and lightweight approach to adjust pretrained model predictions via optimal transport. Our technique requires only an estimate of the label distribution of a downstream task. Theoretically, we characterize the improvement produced by our procedure under certain mild conditions and provide bounds on the error caused by misspecification. Empirically, we validate our method in a wide array of zero-shot image and text classification tasks, improving accuracy by 4.8% and 15.9% on average, respectively, and beating baselines like prior matching, often by significant margins, in 17 out of 21 datasets.
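To make the idea in the abstract concrete, the following is a minimal sketch of adjusting zero-shot predictions with optimal transport so that the predicted class frequencies match an estimated label distribution. Several choices here are assumptions rather than details stated in the abstract: the cost of assigning an example to a class is taken to be the negative log of the model's probability, every example carries equal mass, and the transport problem is solved approximately with entropic regularization (Sinkhorn iterations) instead of an exact solver. The function name `rebalance_predictions` and the parameters `eps` and `n_iters` are hypothetical.

```python
import math


def rebalance_predictions(probs, label_dist, eps=0.05, n_iters=500):
    """Reassign labels so predicted class frequencies follow label_dist.

    probs: n rows of K class probabilities from a zero-shot model.
    label_dist: K estimated class frequencies for the downstream task.
    Returns one class index per example.
    """
    n, k = len(probs), len(label_dist)
    # Assumed cost: negative log-probability of class j for example i.
    cost = [[-math.log(max(p, 1e-12)) for p in row] for row in probs]
    # Gibbs kernel of the entropy-regularized transport problem.
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    row_mass = 1.0 / n  # each example carries equal mass
    u = [1.0] * n
    v = [1.0] * k
    for _ in range(n_iters):
        # Alternately rescale rows and columns toward the two marginals.
        u = [row_mass / sum(kernel[i][j] * v[j] for j in range(k))
             for i in range(n)]
        v = [label_dist[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(k)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]
    # Each example takes the class that receives most of its mass.
    return [max(range(k), key=lambda j: plan[i][j]) for i in range(n)]


# Toy scores from a model that favors class 0 on every example.
probs = [[0.90, 0.10], [0.80, 0.20], [0.60, 0.40], [0.55, 0.45]]
label_dist = [0.5, 0.5]  # estimated downstream class balance

naive = [max(range(2), key=lambda j: row[j]) for row in probs]
adjusted = rebalance_predictions(probs, label_dist)
print(naive)
print(adjusted)
```

On this toy input, plain argmax assigns every example to class 0, while the transport-based assignment moves the two least confident examples to class 1 so that the predictions follow the balanced estimate. An exact linear-programming solver could replace the Sinkhorn loop; the entropic version is used here only to keep the sketch dependency-free.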
- Changho Shin
- Jitian Zhao
- Sonia Cromp
- Harit Vishwakarma
- Frederic Sala