SymbolNet: Neural Symbolic Regression with Adaptive Dynamic Pruning for Compression (2401.09949v3)
Abstract: Compact symbolic expressions have been shown to be more efficient than neural network models in terms of resource consumption and inference speed when implemented on custom hardware such as FPGAs, while maintaining comparable accuracy~\cite{tsoi2023symbolic}. These capabilities are highly valuable in environments with stringent computational resource constraints, such as high-energy physics experiments at the CERN Large Hadron Collider. However, finding compact expressions for high-dimensional datasets remains challenging due to the inherent limitations of genetic programming, the search algorithm underlying most symbolic regression methods. In contrast to genetic programming, the neural network approach to symbolic regression scales to high-dimensional inputs and leverages gradient methods for faster equation search. Common ways of constraining expression complexity, however, rely on multistage pruning with fine-tuning, which can result in significant performance loss. In this work, we propose $\tt{SymbolNet}$, a neural network approach to symbolic regression designed specifically as a model compression technique, aimed at enabling low-latency inference for high-dimensional inputs on custom hardware such as FPGAs. The framework allows dynamic pruning of model weights, input features, and mathematical operators in a single training process, where training loss and expression complexity are optimized simultaneously. We introduce a sparsity regularization term for each pruning type that adaptively adjusts its strength, leading to convergence at a target sparsity ratio. Unlike most existing symbolic regression methods, which struggle with datasets containing more than $\mathcal{O}(10)$ inputs, we demonstrate the effectiveness of our model on the LHC jet tagging task (16 inputs), MNIST (784 inputs), and SVHN (3072 inputs).
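To make the adaptive dynamic pruning idea concrete, below is a minimal TensorFlow sketch, not the paper's implementation, of a dense layer whose weights are pruned during training by trainable thresholds, with a sparsity penalty whose strength adapts to the gap between the current sparsity and a user-chosen target. Names such as `DynamicPrunedDense` and `target_sparsity`, and the specific penalty schedule, are illustrative assumptions rather than the paper's API.

```python
import tensorflow as tf


class DynamicPrunedDense(tf.keras.layers.Layer):
    """Dense layer with dynamic weight pruning and an adaptive sparsity penalty.

    Illustrative sketch only: each output unit owns a trainable pruning
    threshold; weights whose magnitude falls below it are masked out, and a
    regularization term pushes thresholds up until a target sparsity is met.
    """

    def __init__(self, units, target_sparsity=0.9, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.target_sparsity = target_sparsity

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer="glorot_uniform", trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer="zeros", trainable=True)
        # One trainable (non-negative after ReLU) pruning threshold per unit.
        self.thresh = self.add_weight(
            shape=(self.units,),
            initializer=tf.keras.initializers.Constant(1e-3), trainable=True)

    def call(self, x):
        t = tf.nn.relu(self.thresh)
        # Hard mask: a weight survives only if |w| exceeds its threshold.
        hard = tf.cast(tf.abs(self.w) > t, self.w.dtype)
        # Straight-through estimator: the forward pass uses the hard mask,
        # while gradients flow through a sigmoid surrogate to w and thresh.
        soft = tf.sigmoid(tf.abs(self.w) - t)
        mask = soft + tf.stop_gradient(hard - soft)
        sparsity = 1.0 - tf.reduce_mean(hard)
        # Adaptive penalty: active while sparsity is below target, zero once
        # the target is reached; minimizing exp(-t) pushes thresholds up.
        strength = tf.nn.relu(self.target_sparsity - sparsity)
        self.add_loss(strength * tf.reduce_mean(tf.exp(-t)))
        return tf.matmul(x, self.w * mask) + self.b


if __name__ == "__main__":
    # Toy usage: fit y = x0 * x1 on random data while pruning toward 80% sparsity.
    x = tf.random.normal((1024, 16))
    y = x[:, :1] * x[:, 1:2]
    model = tf.keras.Sequential([DynamicPrunedDense(8, target_sparsity=0.8),
                                 tf.keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    model.fit(x, y, epochs=5, verbose=0)
```

In the full framework described above, an analogous term would be attached to each pruning type (weights, input features, and operator nodes) so that accuracy and expression complexity are traded off within a single training run; the schedule shown here is only one plausible choice.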
- M. Planck, “On an improvement of Wien’s equation for the spectrum,” Verh. Dtsch. Phys. Ges., vol. 2, 1900.
- M. Virgolin and S. P. Pissis, “Symbolic regression is NP-hard,” arXiv:2207.01018, 2022.
- M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015, software available from tensorflow.org. [Online]. Available: https://www.tensorflow.org/
- J. Liu, Z. Xu, R. Shi, R. C. C. Cheung, and H. K. H. So, “Dynamic sparse training: Find efficient sparse network from scratch with trainable masked layers,” arXiv:2005.06870, 2020.
- J. Koza, “Genetic programming as a means for programming computers by natural selection,” Statistics and Computing, vol. 4, pp. 87–112, 1994.
- M. Schmidt and H. Lipson, “Distilling free-form natural laws from experimental data,” Science, vol. 324, pp. 81–85, 2009. [Online]. Available: https://doi.org/10.1126/science.1165893
- M. Cranmer, “Interpretable machine learning for science with PySR and SymbolicRegression.jl,” arXiv:2305.01582, 2023.
- D. Wadekar, F. Villaescusa-Navarro, S. Ho, and L. Perreault-Levasseur, “Modeling assembly bias with machine learning and symbolic regression,” arXiv:2012.00111, 2020.
- H. Shao, F. Villaescusa-Navarro, S. Genel, D. N. Spergel, D. Anglés-Alcázar, L. Hernquist, R. Davé, D. Narayanan, G. Contardo, and M. Vogelsberger, “Finding universal relations in subhalo properties with artificial intelligence,” The Astrophysical Journal, vol. 927, no. 1, p. 85, 2022. [Online]. Available: https://doi.org/10.3847/1538-4357/ac4d30
- A. M. Delgado, D. Wadekar, B. Hadzhiyska, S. Bose, L. Hernquist, and S. Ho, “Modelling the galaxy–halo connection with machine learning,” Monthly Notices of the Royal Astronomical Society, vol. 515, no. 2, pp. 2733–2746, 2022. [Online]. Available: https://doi.org/10.1093/mnras/stac1951
- D. Wadekar, L. Thiele, J. C. Hill, S. Pandey, F. Villaescusa-Navarro, D. N. Spergel, M. Cranmer, D. Nagai, D. Anglés-Alcázar, S. Ho, and L. Hernquist, “The SZ flux-mass ($Y$–$M$) relation at low-halo masses: improvements with symbolic regression and strong constraints on baryonic feedback,” Monthly Notices of the Royal Astronomical Society, vol. 522, no. 2, pp. 2628–2643, 2023. [Online]. Available: https://doi.org/10.1093/mnras/stad1128
- P. Lemos, N. Jeffrey, M. Cranmer, S. Ho, and P. Battaglia, “Rediscovering orbital mechanics with machine learning,” arXiv:2202.02306, 2022.
- D. Wadekar, L. Thiele, F. Villaescusa-Navarro, J. C. Hill, M. Cranmer, D. N. Spergel, N. Battaglia, D. Anglés-Alcázar, L. Hernquist, and S. Ho, “Augmenting astrophysical scaling relations with machine learning: Application to reducing the Sunyaev–Zeldovich flux–mass scatter,” Proceedings of the National Academy of Sciences, vol. 120, no. 12, 2023. [Online]. Available: https://doi.org/10.1073/pnas.2202074120
- A. Grundner, T. Beucler, P. Gentine, and V. Eyring, “Data-driven equation discovery of a cloud cover parameterization,” arXiv:2304.08063, 2023.
- T. Stephens, “Genetic programming in Python, with a scikit-learn inspired API: gplearn,” 2016. [Online]. Available: https://gplearn.readthedocs.io/en/stable/
- B. Burlacu, G. Kronberger, and M. Kommenda, “Operon C++: An efficient genetic programming framework for symbolic regression,” in Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion, ser. GECCO ’20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 1562–1570. [Online]. Available: https://doi.org/10.1145/3377929.3398099
- M. Virgolin, T. Alderliesten, C. Witteveen, and P. A. N. Bosman, “Improving model-based genetic programming for symbolic regression of small expressions,” Evolutionary Computation, vol. 29, no. 2, pp. 211–237, 2021. [Online]. Available: https://doi.org/10.1162/evco_a_00278
- G. Martius and C. H. Lampert, “Extrapolation and learning equations,” arXiv:1610.02995, 2016.
- S. S. Sahoo, C. H. Lampert, and G. Martius, “Learning equations for extrapolation and control,” 2018.
- M. Werner, A. Junginger, P. Hennig, and G. Martius, “Informed equation learning,” 2021.
- S. Kim, P. Y. Lu, S. Mukherjee, M. Gilbert, L. Jing, V. Ceperic, and M. Soljačić, “Integration of neural network-based symbolic regression in deep learning for scientific discovery,” IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 9, pp. 4166–4177, 2021. [Online]. Available: https://doi.org/10.1109/tnnls.2020.3017010
- I. A. Abdellaoui and S. Mehrkanoon, “Symbolic regression for scientific discovery: an application to wind speed forecasting,” arXiv:2102.10570, 2021.
- A. Costa, R. Dangovski, O. Dugan, S. Kim, P. Goyal, M. Soljačić, and J. Jacobson, “Fast neural models for symbolic regression at scale,” arXiv:2007.10784, 2021.
- B. K. Petersen, M. L. Larma, T. N. Mundhenk, C. P. Santiago, S. K. Kim, and J. T. Kim, “Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients,” in International Conference on Learning Representations, 2021. [Online]. Available: https://openreview.net/forum?id=m5Qsh0kBQG
- H. Zhou and W. Pan, “Bayesian learning to discover mathematical operations in governing equations of dynamic systems,” arXiv:2206.00669, 2022.
- J. Kubalík, E. Derner, and R. Babuška, “Toward physically plausible data-driven models: A novel neural network approach to symbolic regression,” IEEE Access, vol. 11, pp. 61481–61501, 2023. [Online]. Available: https://doi.org/10.1109/access.2023.3287397
- L. Biggio, T. Bendinelli, A. Neitz, A. Lucchi, and G. Parascandolo, “Neural symbolic regression that scales,” in Proceedings of the 38th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, M. Meila and T. Zhang, Eds., vol. 139. PMLR, 18–24 Jul 2021, pp. 936–945. [Online]. Available: https://proceedings.mlr.press/v139/biggio21a.html
- M. Valipour, B. You, M. Panju, and A. Ghodsi, “SymbolicGPT: A generative transformer model for symbolic regression,” 2021.
- P.-A. Kamienny, S. d’Ascoli, G. Lample, and F. Charton, “End-to-end symbolic regression with transformers,” in Advances in Neural Information Processing Systems, 2022.
- M. Vastl, J. Kulhánek, J. Kubalík, E. Derner, and R. Babuška, “Symformer: End-to-end symbolic regression using transformer-based architecture,” arXiv:2205.15764, 2022.
- A. Meurer, C. P. Smith, M. Paprocki, O. Čertík, S. B. Kirpichev, M. Rocklin, A. Kumar, S. Ivanov, J. K. Moore, S. Singh, T. Rathnayake, S. Vig, B. E. Granger, R. P. Muller, F. Bonazzi, H. Gupta, S. Vats, F. Johansson, F. Pedregosa, M. J. Curry, A. R. Terrel, Š. Roučka, A. Saboo, I. Fernando, S. Kulal, R. Cimrman, and A. Scopatz, “SymPy: symbolic computing in Python,” PeerJ Computer Science, vol. 3, p. e103, Jan. 2017. [Online]. Available: https://doi.org/10.7717/peerj-cs.103
- H. F. Tsoi, A. A. Pol, V. Loncar, E. Govorkova, M. Cranmer, S. Dasu, P. Elmer, P. Harris, I. Ojalvo, and M. Pierini, “Symbolic regression on FPGAs for fast machine learning inference,” arXiv:2305.04099, 2023.
- M. Pierini, J. M. Duarte, N. Tran, and M. Freytsis, “HLS4ML LHC Jet dataset (150 particles),” 2020. [Online]. Available: https://doi.org/10.5281/zenodo.3602260
- Y. LeCun and C. Cortes, “MNIST handwritten digit database,” 2010. [Online]. Available: http://yann.lecun.com/exdb/mnist/
- Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, “Reading digits in natural images with unsupervised feature learning,” in NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011, 2011. [Online]. Available: http://ufldl.stanford.edu/housenumbers/nips2011_housenumbers.pdf
- ATLAS Collaboration, “Technical Design Report for the Phase-II Upgrade of the ATLAS TDAQ System,” CERN-LHCC-2017-020, ATLAS-TDR-029, 2017.
- CMS Collaboration, “The Phase-2 Upgrade of the CMS Level-1 Trigger,” CERN-LHCC-2020-004, CMS-TDR-021, 2020.
- J. Duarte et al., “Fast inference of deep neural networks in FPGAs for particle physics,” JINST, vol. 13, no. 07, p. P07027, 2018.
- E. A. Moreno, O. Cerri, J. M. Duarte, H. B. Newman, T. Q. Nguyen, A. Periwal, M. Pierini, A. Serikova, M. Spiropulu, and J.-R. Vlimant, “JEDI-net: a jet identification algorithm based on interaction networks,” Eur. Phys. J. C, vol. 80, no. 1, p. 58, 2020.
- E. Coleman, M. Freytsis, A. Hinzmann, M. Narain, J. Thaler, N. Tran, and C. Vernieri, “The importance of calorimetry for highly-boosted jet substructure,” JINST, vol. 13, no. 01, p. T01003, 2018.
- T. Aarrestad et al., “Fast convolutional neural networks on FPGAs with hls4ml,” Mach. Learn. Sci. Tech., vol. 2, no. 4, p. 045015, 2021.
- C. N. Coelho, A. Kuusela, S. Li, H. Zhuang, T. Aarrestad, V. Loncar, J. Ngadiuba, M. Pierini, A. A. Pol, and S. Summers, “Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors,” Nature Mach. Intell., vol. 3, pp. 675–686, 2021.
- M. Zhu and S. Gupta, “To prune, or not to prune: exploring the efficacy of pruning for model compression,” arXiv:1710.01878, 2017.
- FastML Team, “fastmachinelearning/hls4ml,” 2021. [Online]. Available: https://github.com/fastmachinelearning/hls4ml
- Xilinx, “Vivado Design Suite User Guide: High-Level Synthesis,” https://www.xilinx.com/support/documentation/sw_manuals/xilinx2020_1/ug902-vivado-high-level-synthesis.pdf, 2020.
- D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980, 2017.