Comparing Foundation Models using Data Kernels (2305.05126v3)

Published 9 May 2023 in cs.LG, cs.AI, and stat.ME

Abstract: Recent advances in self-supervised learning and neural network scaling have enabled the creation of large models, known as foundation models, which can be easily adapted to a wide range of downstream tasks. The current paradigm for comparing foundation models involves evaluating them with aggregate metrics on various benchmark datasets. This method of model comparison is heavily dependent on the chosen evaluation metric, which makes it unsuitable for situations where the ideal metric is either not obvious or unavailable. In this work, we present a methodology for directly comparing the embedding space geometry of foundation models, which facilitates model comparison without the need for an explicit evaluation metric. Our methodology is grounded in random graph theory and enables valid hypothesis testing of embedding similarity on a per-datum basis. Further, we demonstrate how our methodology can be extended to facilitate population-level model comparison. In particular, we show how our framework can induce a manifold of models equipped with a distance function that correlates strongly with several downstream metrics. We remark on the utility of this population-level model comparison as a first step towards a taxonomic science of foundation models.
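
The gist of the approach can be illustrated with a minimal sketch (not the authors' implementation). The choice of a cosine k-NN graph as the "data kernel," the omnibus-style joint spectral embedding, the specific rank and neighbor counts, and the synthetic embeddings are all illustrative assumptions; only standard numpy/scikit-learn calls are used.

```python
# Sketch: compare two models' embedding geometries over the same dataset by
# building a k-NN graph ("data kernel") per model, jointly embedding both
# graphs, and measuring per-datum displacement between the two embeddings.
import numpy as np
from sklearn.neighbors import kneighbors_graph


def data_kernel(embeddings, k=10):
    """Symmetrized k-NN adjacency over cosine distance (one plausible 'data kernel')."""
    A = kneighbors_graph(embeddings, n_neighbors=k, metric="cosine", mode="connectivity")
    A = A.toarray()
    return np.maximum(A, A.T)  # symmetrize


def ase(A, rank=8):
    """Adjacency spectral embedding: top-`rank` eigenvectors scaled by |eigenvalue|^0.5."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:rank]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))


def per_datum_distance(emb_a, emb_b, k=10, rank=8):
    """Joint (omnibus-style) embedding of the two data kernels, then per-datum distances."""
    A, B = data_kernel(emb_a, k), data_kernel(emb_b, k)
    n = A.shape[0]
    avg = (A + B) / 2.0
    M = np.block([[A, avg], [avg, B]])  # omnibus matrix with averaged off-diagonal blocks
    Z = ase(M, rank)
    Za, Zb = Z[:n], Z[n:]
    # Large values flag datapoints the two models represent very differently.
    return np.linalg.norm(Za - Zb, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb_a = rng.normal(size=(200, 32))                      # stand-in for model A's embeddings
    emb_b = emb_a + 0.1 * rng.normal(size=(200, 32))        # a slightly perturbed "model B"
    d = per_datum_distance(emb_a, emb_b)
    print(d.mean(), d.max())
```

Averaging model-model summaries of such per-datum distances over many models and applying multidimensional scaling is one way to obtain the "manifold of models" described in the abstract; the paper's actual test statistics and population-level construction should be taken from the full text.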

