Categorical Flow Matching on Statistical Manifolds (2405.16441v2)
Abstract: We introduce Statistical Flow Matching (SFM), a novel, mathematically rigorous flow-matching framework on the manifold of parameterized probability measures, inspired by results from information geometry. We demonstrate the effectiveness of our method on discrete generation by instantiating SFM on the manifold of categorical distributions, whose geometric properties remained unexplored in previous discrete generative models. Using the Fisher information metric, we equip the manifold with a Riemannian structure whose intrinsic geometry we exploit by following shortest-path geodesics. We develop an efficient training and sampling algorithm that overcomes numerical stability issues via a diffeomorphism between manifolds. This geometric perspective on statistical manifolds also lets us apply optimal transport during training and interpret SFM as following the steepest-descent direction of the natural gradient. Unlike previous models that rely on variational bounds for likelihood estimation, SFM admits exact likelihood computation for arbitrary probability measures. We show that SFM can learn more complex patterns on the statistical manifold where existing models often fail due to strong prior assumptions. Comprehensive experiments on real-world generative tasks spanning image, text, and biological domains further demonstrate that SFM achieves higher sampling quality and likelihood than other discrete diffusion or flow-based models.
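As a minimal sketch of the geometric idea in the abstract (illustrative code, not the authors' released implementation): under the Fisher information metric, the categorical simplex is isometric, up to a constant factor, to the positive orthant of the unit sphere via the square-root map, so Fisher–Rao geodesics become great-circle arcs that can be evaluated in closed form by spherical interpolation. The function name below is hypothetical.

```python
import numpy as np

def fisher_rao_geodesic(p, q, t):
    """Point at time t in [0, 1] on the Fisher-Rao geodesic between
    categorical distributions p and q.

    The diffeomorphism x = sqrt(p) maps the simplex onto the positive
    orthant of the unit sphere, where the Fisher metric is (a multiple
    of) the round metric, so the geodesic is a slerp between sqrt(p)
    and sqrt(q), mapped back by squaring.
    """
    x, y = np.sqrt(p), np.sqrt(q)
    # Angle between the sphere points; proportional to the Fisher-Rao distance.
    omega = np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))
    if omega < 1e-12:  # p == q: the geodesic is constant
        return np.asarray(p, dtype=float).copy()
    # Spherical linear interpolation stays on the unit sphere.
    z = (np.sin((1.0 - t) * omega) * x + np.sin(t * omega) * y) / np.sin(omega)
    return z ** 2  # map back to the probability simplex

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])
mid = fisher_rao_geodesic(p, q, 0.5)  # a valid distribution between p and q
```

Because the slerp point has unit norm, squaring it returns a vector that sums to one, so every point on the path is itself a categorical distribution; this is the numerically stable construction that a naive interpolation of the parameters would not give.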