Overcoming Order in Autoregressive Graph Generation (2402.03387v1)
Abstract: Graph generation is a fundamental problem in various domains, including chemistry and social networks. Recent work has shown that molecular graph generation using recurrent neural networks (RNNs) is advantageous compared to traditional generative approaches that require converting continuous latent representations into graphs. One issue that arises when treating graph generation as sequential generation is that the order of the sequence is arbitrary: it results from a particular choice of graph flattening method. In this work we propose using RNNs while taking into account the non-sequential nature of graphs, by adding an Orderless Regularization (OLR) term that encourages the hidden state of the recurrent model to be invariant to the different valid orderings present under the training distribution. We demonstrate that sequential graph generation models benefit from our proposed regularization scheme, especially when data is scarce. Our findings contribute to the growing body of research on graph generation and provide a valuable tool for applications requiring the synthesis of realistic and diverse graph structures.
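To make the idea of an order-invariance penalty concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not give the exact form of the OLR term, so the sketch assumes one simple choice: the squared distance between the final hidden states a recurrent model reaches on two valid orderings of the same graph. The tiny tanh RNN and the names `make_rnn`, `final_hidden_state`, and `orderless_penalty` are all hypothetical and exist only for illustration.

```python
import math
import random


def make_rnn(vocab_size, hidden_size, seed=0):
    """Build a toy tanh RNN with random weights (illustrative only)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"W_in": mat(hidden_size, vocab_size), "W_h": mat(hidden_size, hidden_size)}


def final_hidden_state(rnn, tokens):
    """Run the RNN over a token sequence and return the last hidden state."""
    hidden_size = len(rnn["W_h"])
    h = [0.0] * hidden_size
    for t in tokens:
        # One-hot input: column t of W_in is the input contribution.
        h = [
            math.tanh(
                rnn["W_in"][i][t]
                + sum(rnn["W_h"][i][j] * h[j] for j in range(hidden_size))
            )
            for i in range(hidden_size)
        ]
    return h


def orderless_penalty(rnn, ordering_a, ordering_b):
    """Assumed penalty: squared distance between the two final hidden states."""
    h_a = final_hidden_state(rnn, ordering_a)
    h_b = final_hidden_state(rnn, ordering_b)
    return sum((x - y) ** 2 for x, y in zip(h_a, h_b))


# Toy graph: a central carbon bonded to a nitrogen and an oxygen.
# Token ids (assumed): C=0, N=1, O=2. Both sequences below are valid
# breadth-first flattenings of the same graph starting from the carbon.
rnn = make_rnn(vocab_size=3, hidden_size=4)
ordering_a = [0, 1, 2]  # C, N, O
ordering_b = [0, 2, 1]  # C, O, N
penalty = orderless_penalty(rnn, ordering_a, ordering_b)
print(penalty)
```

Under this assumption, training would minimize the usual sequence likelihood loss plus a weighted copy of the penalty, with the weight (a hypothetical hyperparameter) controlling how strongly the hidden state is pushed toward the same value for every valid ordering. The penalty is zero when the two orderings are identical and non-negative otherwise.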