SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training (2310.02227v3)
Abstract: In an era where symbolic mathematical equations are indispensable for modeling complex natural phenomena, scientific inquiry often involves collecting observations and translating them into mathematical expressions. Recently, deep learning has emerged as a powerful tool for extracting insights from data. However, existing models typically specialize in either the numeric or the symbolic domain and are usually trained in a supervised manner tailored to specific tasks. This approach neglects the substantial benefits of a task-agnostic, multi-modal understanding between symbolic equations and their numeric counterparts. To bridge this gap, we introduce SNIP, a Symbolic-Numeric Integrated Pre-training model, which employs contrastive learning between the symbolic and numeric domains to enhance the similarity between their embeddings. Latent-space analysis shows that SNIP provides cross-domain insights into the representations: symbolic supervision enhances the embeddings of numeric data, and vice versa. We evaluate SNIP across diverse tasks, including symbolic-to-numeric mathematical property prediction and numeric-to-symbolic equation discovery, commonly known as symbolic regression. Results show that SNIP transfers effectively to various tasks, consistently outperforming fully supervised baselines and competing strongly with established task-specific methods, especially in low-data regimes. Code and model are available at: https://github.com/deep-symbolic-mathematics/Multimodal-Math-Pretraining
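The contrastive objective described above can be sketched as a CLIP-style symmetric InfoNCE loss over paired embeddings. This is a minimal illustration, not SNIP's actual implementation: the function names, batch shapes, and temperature value are assumptions, and it presumes each batch row pairs one symbolic encoding with the numeric encoding of the same equation.

```python
import numpy as np

def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symbolic_numeric_contrastive_loss(sym_emb, num_emb, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    sym_emb, num_emb: (batch, dim) arrays; row i of each is assumed to
    encode the same underlying equation in the two modalities.
    """
    # L2-normalize so dot products become cosine similarities
    s = sym_emb / np.linalg.norm(sym_emb, axis=1, keepdims=True)
    n = num_emb / np.linalg.norm(num_emb, axis=1, keepdims=True)
    logits = (s @ n.T) / temperature  # (batch, batch) similarity matrix

    # Matched pairs sit on the diagonal; score each against all mismatches,
    # once per direction (symbolic -> numeric and numeric -> symbolic).
    idx = np.arange(logits.shape[0])
    loss_s2n = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_n2s = -log_softmax(logits, axis=0)[idx, idx].mean()
    return (loss_s2n + loss_n2s) / 2
```

With well-aligned pairs the diagonal of the similarity matrix dominates and the loss approaches zero; mismatched pairings drive it up, which is what pushes the two encoders toward a shared latent space.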