Data-Efficient Operator Learning via Unsupervised Pretraining and In-Context Learning (2402.15734v3)
Abstract: Recent years have witnessed the promise of coupling machine learning methods with physical, domain-specific insights to solve scientific problems based on partial differential equations (PDEs). However, these methods remain data-intensive, requiring large amounts of simulated PDE data and thereby reintroducing the expensive numerical solvers they were meant to avoid. In this work, seeking data efficiency, we design unsupervised pretraining for PDE operator learning. To cut the simulation cost of producing training data, we mine unlabeled PDE data that contain no simulated solutions and pretrain neural operators on physics-inspired, reconstruction-based proxy tasks. To improve out-of-distribution performance, we further equip neural operators with a similarity-based in-context learning method that retrieves relevant examples at inference time, incurring no extra training cost or architectural changes. Extensive empirical evaluations on a diverse set of PDEs demonstrate that our method is highly data-efficient, generalizes better, and even outperforms conventional vision-pretrained models. Our code is available at https://github.com/delta-lab-ai/data_efficient_nopt.
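To make the two ingredients concrete, here is a minimal PyTorch sketch of a reconstruction-based proxy task on unlabeled 2D PDE snapshots. The convolutional backbone, the 8x8 patch masking, and the 50% mask ratio are illustrative assumptions, not the paper's exact architecture or training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyOperatorBackbone(nn.Module):
    """Small conv encoder-decoder standing in for a neural operator backbone."""
    def __init__(self, channels=1, width=32):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.GELU(),
            nn.Conv2d(width, width, 3, padding=1), nn.GELU(),
        )
        self.decode = nn.Conv2d(width, channels, 3, padding=1)

    def forward(self, x):
        return self.decode(self.encode(x))

def masked_reconstruction_loss(model, fields, mask_ratio=0.5, patch=8):
    """One proxy-task objective: hide random patches of an unlabeled PDE
    snapshot and score reconstruction of the hidden region only. No
    simulated solutions are needed; the snapshot itself is the target."""
    b, c, h, w = fields.shape
    keep = (torch.rand(b, 1, h // patch, w // patch) > mask_ratio).float()
    mask = F.interpolate(keep, size=(h, w), mode="nearest")  # 1 = visible
    pred = model(fields * mask)
    hidden = 1.0 - mask
    return ((pred - fields) ** 2 * hidden).sum() / hidden.sum().clamp(min=1.0)

# Usage on random stand-in data (real inputs would be unlabeled PDE snapshots).
model = TinyOperatorBackbone()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
snapshots = torch.randn(4, 1, 64, 64)
loss = masked_reconstruction_loss(model, snapshots)
loss.backward()
opt.step()
print(f"proxy-task loss: {loss.item():.4f}")
```

The similarity-based in-context step can be sketched analogously: at inference time, retrieve the labeled pairs whose inputs are closest to the query and use them as in-context examples for the pretrained operator. The L2 distance and top-k retrieval below are assumptions for illustration; the paper's similarity measure and conditioning mechanism may differ.

```python
import torch

def retrieve_in_context_examples(query, train_inputs, train_outputs, k=3):
    """Pick the k training inputs nearest to the query (L2 distance) and
    return those input/solution pairs as in-context examples. This is a
    pure lookup at inference time; nothing here is trained."""
    dists = ((train_inputs - query.unsqueeze(0)) ** 2).flatten(1).sum(dim=1)
    idx = torch.topk(dists, k, largest=False).indices
    return train_inputs[idx], train_outputs[idx]

# Usage: fetch demonstrations for a new query field from a small labeled pool.
pool_in, pool_out = torch.randn(100, 1, 64, 64), torch.randn(100, 1, 64, 64)
query = torch.randn(1, 64, 64)
ctx_in, ctx_out = retrieve_in_context_examples(query, pool_in, pool_out)
print(ctx_in.shape, ctx_out.shape)  # torch.Size([3, 1, 64, 64]) each
```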