Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior (2306.00258v1)

Published 1 Jun 2023 in cs.LG, cs.NA, and math.NA

Abstract: Pre-trained ML models have shown great performance for a wide range of applications, in particular in NLP and computer vision (CV). Here, we study how pre-training could be used for scientific machine learning (SciML) applications, specifically in the context of transfer learning. We study the transfer behavior of these models as (i) the pre-trained model size is scaled, (ii) the downstream training dataset size is scaled, (iii) the physics parameters are systematically pushed out of distribution, and (iv) a single model pre-trained on a mixture of different physics problems is adapted to various downstream applications. We find that, when fine-tuned appropriately, transfer learning can help reach desired accuracy levels with orders of magnitude fewer downstream examples (across different tasks that can even be out-of-distribution) than training from scratch, with consistent behavior across a wide range of downstream examples. We also find that fine-tuning these models yields more performance gains as model size increases, compared to training from scratch on new downstream tasks. These results hold for a broad range of PDE learning tasks. All in all, our results demonstrate the potential of the "pre-train and fine-tune" paradigm for SciML problems and point to a path towards building SciML foundation models. We open-source our code for reproducibility.

Citations (44)
