
Geometry-Aware Adaptation for Pretrained Models (2307.12226v2)

Published 23 Jul 2023 in cs.LG, cs.AI, and stat.ML

Abstract: Machine learning models -- including prominent zero-shot models -- are often trained on datasets whose labels are only a small proportion of a larger label space. Such spaces are commonly equipped with a metric that relates the labels via distances between them. We propose a simple approach to exploit this information to adapt the trained model to reliably predict new classes -- or, in the case of zero-shot prediction, to improve its performance -- without any additional training. Our technique is a drop-in replacement of the standard prediction rule, swapping argmax with the Fréchet mean. We provide a comprehensive theoretical analysis for this approach, studying (i) learning-theoretic results trading off label space diameter, sample complexity, and model dimension, (ii) characterizations of the full range of scenarios in which it is possible to predict any unobserved class, and (iii) an optimal active learning-like next class selection procedure to obtain optimal training classes for when it is not possible to predict the entire range of unobserved classes. Empirically, using easily-available external metrics, our proposed approach, Loki, gains up to 29.7% relative improvement over SimCLR on ImageNet and scales to hundreds of thousands of classes. When no such metric is available, Loki can use self-derived metrics from class embeddings and obtains a 10.5% improvement on pretrained zero-shot models such as CLIP.
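
The prediction rule described in the abstract, replacing argmax with the Fréchet mean, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `frechet_mean_predict`, its arguments, and the toy line-graph metric are hypothetical, and the sketch assumes the usual weighted Fréchet mean over a finite label space, i.e. the label minimizing the probability-weighted sum of squared distances to the observed classes.

```python
import numpy as np

def frechet_mean_predict(probs, dist, observed):
    """Return the label that minimizes the expected squared distance.

    probs    : (K,) model probabilities over the K observed classes
    dist     : (N, N) pairwise distances over the full label space
    observed : (K,) indices of the observed classes within the label space
    All names here are illustrative assumptions, not the paper's API.
    """
    probs = np.asarray(probs, dtype=float)
    dist = np.asarray(dist, dtype=float)
    observed = np.asarray(observed, dtype=int)
    # cost[y] = sum_i probs[i] * dist[y, observed[i]] ** 2
    cost = (dist[:, observed] ** 2) @ probs
    return int(np.argmin(cost))

# Toy label space: five labels on a line, with distance |i - j|.
labels = np.arange(5)
dist = np.abs(labels[:, None] - labels[None, :])

# The model was trained only on labels 0 and 4.
observed = [0, 4]
probs = [0.55, 0.45]

argmax_label = observed[int(np.argmax(probs))]
frechet_label = frechet_mean_predict(probs, dist, observed)
print(argmax_label, frechet_label)
```

In this toy setting argmax can only return one of the two observed labels, while the Fréchet mean selects label 2, which was never seen during training. When the probability mass sits entirely on one observed class, the two rules agree.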
