SimbaML: Connecting Mechanistic Models and Machine Learning with Augmented Data
Abstract: Training sophisticated ML models requires large datasets, which are difficult or expensive to collect for many applications. If prior knowledge about system dynamics is available, mechanistic representations can be used to supplement real-world data. We present SimbaML (Simulation-Based ML), an open-source tool that unifies the generation of realistic synthetic datasets from ordinary differential equation-based models with their direct analysis and inclusion in ML pipelines. SimbaML conveniently enables investigating transfer learning from synthetic to real-world data, augmenting data, identifying needs for data collection, and benchmarking physics-informed ML approaches. SimbaML is available from https://pypi.org/project/simba-ml/.
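To make the idea of ODE-based synthetic data concrete, the following is a minimal sketch of the general technique and does not use the SimbaML API. It assumes an SIR epidemic model as the mechanistic model, integrates it with SciPy's `odeint`, samples kinetic parameters and initial conditions from hypothetical ranges, and applies multiplicative Gaussian noise as a stand-in for measurement error. The function names, parameter ranges, and noise model are all illustrative assumptions.

```python
import numpy as np
from scipy.integrate import odeint


def sir_rhs(y, t, beta, gamma):
    """Right-hand side of the SIR model, with states as population fractions."""
    s, i, r = y
    infection = beta * s * i
    recovery = gamma * i
    return [-infection, infection - recovery, recovery]


def generate_sir_dataset(n_series=5, n_steps=50, noise_sd=0.05, seed=0):
    """Simulate several noisy SIR time series (hypothetical helper).

    Returns the time grid, the clean trajectories, and the noisy
    trajectories, the latter two with shape (n_series, n_steps, 3).
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 100.0, n_steps)
    clean = np.empty((n_series, n_steps, 3))
    for k in range(n_series):
        # Assumed sampling ranges; a real study would take these from
        # prior knowledge about the system.
        beta = rng.uniform(0.2, 0.5)
        gamma = rng.uniform(0.05, 0.15)
        i0 = rng.uniform(0.001, 0.01)
        y0 = [1.0 - i0, i0, 0.0]
        clean[k] = odeint(sir_rhs, y0, t, args=(beta, gamma))
    # Multiplicative Gaussian noise as a simple measurement-error model,
    # clipped so that no compartment becomes negative.
    noisy = clean * (1.0 + rng.normal(0.0, noise_sd, size=clean.shape))
    return t, clean, np.clip(noisy, 0.0, None)


if __name__ == "__main__":
    t, clean, noisy = generate_sir_dataset()
    print(noisy.shape)
```

The noisy array can be split into input and target windows and passed to any time-series model, either on its own or mixed with real measurements for augmentation.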