The Vendi Score: A Diversity Evaluation Metric for Machine Learning (2210.02410v2)
Abstract: Diversity is an important criterion for many areas of machine learning (ML), including generative modeling and dataset curation. However, existing metrics for measuring diversity are often domain-specific and limited in flexibility. In this paper, we address the diversity evaluation problem by proposing the Vendi Score, which connects and extends ideas from ecology and quantum statistical mechanics to ML. The Vendi Score is defined as the exponential of the Shannon entropy of the eigenvalues of a similarity matrix. This matrix is induced by a user-defined similarity function applied to the sample to be evaluated for diversity. Because it takes a similarity function as input, the Vendi Score lets its user specify any desired form of diversity. Importantly, unlike many existing metrics in ML, the Vendi Score does not require a reference dataset or a distribution over samples or labels; it is therefore general and applicable to any generative model, decoding algorithm, or dataset from any domain in which similarity can be defined. We showcase the Vendi Score on molecular generative modeling, where it addresses shortcomings of the current diversity metric of choice in that domain. We also apply it to generative models of images and decoding algorithms of text, where it confirms known results about diversity in those domains. Furthermore, we use the Vendi Score to measure mode collapse, a known shortcoming of generative adversarial networks (GANs). In particular, the Vendi Score reveals that even GANs that capture all the modes of a labeled dataset can be less diverse than the original dataset. Finally, the interpretability of the Vendi Score allows us to diagnose several benchmark ML datasets for diversity, opening the door to diversity-informed data augmentation.
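The definition in the abstract can be sketched directly from the stated formula: build the similarity matrix, normalize it by the sample size so its eigenvalues sum to one (assuming the similarity function scores each item as maximally similar to itself), and exponentiate the Shannon entropy of those eigenvalues. This is a minimal illustration under those assumptions, not the authors' reference implementation:

```python
import numpy as np

def vendi_score(samples, similarity):
    """Exponential of the Shannon entropy of the eigenvalues of K/n.

    Assumes `similarity(x, x) == 1` for every sample, so the eigenvalues
    of K/n are nonnegative and sum to 1, like a probability distribution.
    """
    n = len(samples)
    # Pairwise similarity matrix induced by the user-defined function.
    K = np.array([[similarity(a, b) for b in samples] for a in samples])
    # Eigenvalues of the normalized matrix K/n (symmetric, so eigvalsh).
    lam = np.linalg.eigvalsh(K / n)
    lam = lam[lam > 1e-12]  # drop zero eigenvalues; 0*log(0) is taken as 0
    return float(np.exp(-np.sum(lam * np.log(lam))))

# A binary similarity function used purely for illustration.
exact_match = lambda a, b: 1.0 if a == b else 0.0

vendi_score([0, 0, 0], exact_match)  # three identical items -> 1.0
vendi_score([0, 1, 2], exact_match)  # three fully distinct items -> 3.0
```

The two calls illustrate the score's interpretation as an "effective number" of distinct items: a sample of n completely dissimilar items scores n, while a sample of identical items scores 1, regardless of n.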