GNOT: A General Neural Operator Transformer for Operator Learning (2302.14376v3)
Abstract: Learning the solution operators of partial differential equations (PDEs) is an essential problem in machine learning. However, operator learning in practical applications faces several challenges, such as irregular meshes, multiple input functions, and the complexity of the PDE solutions. To address these challenges, we propose the General Neural Operator Transformer (GNOT), a scalable and effective transformer-based framework for learning operators. By designing a novel heterogeneous normalized attention layer, our model flexibly handles multiple input functions and irregular meshes. In addition, we introduce a geometric gating mechanism, which can be viewed as a soft domain decomposition, to tackle multi-scale problems. The large model capacity of the transformer architecture allows our model to scale to large datasets and practical problems. We conduct extensive experiments on multiple challenging datasets from different domains and achieve remarkable improvements over alternative methods. Our code and data are publicly available at \url{https://github.com/thu-ml/GNOT}.
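To make the two named ingredients more concrete, below is a minimal PyTorch sketch of (1) a normalized linear attention layer and (2) a coordinate-conditioned soft mixture-of-experts gate in the spirit of the abstract's "geometric gating." The shapes, module names, and hyperparameters (e.g. `n_experts`, `coord_dim`) are illustrative assumptions, not the authors' exact implementation; see the released code at the URL above for the real layers.

```python
# Hedged sketch: normalized linear attention + geometric gating (soft MoE).
# All names and shapes are illustrative, not taken from the GNOT codebase.
import torch
import torch.nn as nn
import torch.nn.functional as F


def normalized_linear_attention(q, k, v, eps=1e-8):
    """Linear-complexity attention with softmax-normalized queries and keys.

    q, k, v: tensors of shape (batch, n_points, dim). The cost is
    O(n_points * dim^2) instead of O(n_points^2 * dim), which is what makes
    attention affordable on large irregular meshes.
    """
    q = F.softmax(q, dim=-1)                     # normalize each query over features
    k = F.softmax(k, dim=-1)                     # normalize each key over features
    kv = torch.einsum("bnd,bne->bde", k, v)      # sum_j k_j v_j^T
    z = torch.einsum("bnd,bde->bne", q, kv)      # q_i^T (sum_j k_j v_j^T)
    denom = torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps
    return z / denom.unsqueeze(-1)


class GeometricGateFFN(nn.Module):
    """Soft domain decomposition: expert FFNs mixed by gates that depend
    only on the query point's spatial coordinates."""

    def __init__(self, dim, n_experts=4, coord_dim=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(coord_dim, n_experts)

    def forward(self, x, coords):
        # x: (batch, n_points, dim); coords: (batch, n_points, coord_dim)
        w = F.softmax(self.gate(coords), dim=-1)                        # (b, n, E)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)  # (b, n, dim, E)
        return torch.einsum("bnde,bne->bnd", expert_out, w)
```

Because the gate is a smooth function of position, each expert softly specializes to a region of the domain, which is why the abstract describes the mechanism as a soft domain decomposition for multi-scale problems.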