HEAL-ViT: Vision Transformers on a spherical mesh for medium-range weather forecasting (2403.17016v1)
Abstract: In recent years, a variety of ML architectures and techniques have seen success in producing skillful medium-range weather forecasts. In particular, Vision Transformer (ViT)-based models (e.g. Pangu-Weather, FuXi) have shown strong performance, working nearly "out-of-the-box" by treating weather data as a multi-channel image on a rectilinear grid. While a rectilinear grid is appropriate for 2D images, weather data is inherently spherical and is therefore heavily distorted at the poles on a rectilinear grid, so a disproportionate share of compute is spent modeling data near the poles. Graph-based methods (e.g. GraphCast) do not suffer from this problem, as they map the longitude-latitude grid to a spherical mesh, but they are generally more memory-intensive and tend to need more compute resources for training and inference. Although the spherical mesh is spatially homogeneous, it does not lend itself readily to ViT-based models, which implicitly rely on the rectilinear grid structure. We present HEAL-ViT, a novel architecture that uses ViT models on a spherical mesh, thus benefiting from both the spatial homogeneity enjoyed by graph-based models and the efficient attention-based mechanisms exploited by transformers. HEAL-ViT produces weather forecasts that outperform the ECMWF IFS on key metrics, and it exhibits less bias accumulation and blurring than other ML weather prediction models. Further, the lower compute footprint of HEAL-ViT makes it attractive for operational use, where models beyond a 6-hourly prediction model may be needed to produce the full set of operational forecasts required.
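The polar distortion described in the abstract can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the paper: it assumes an equiangular 0.25° longitude-latitude grid (a common choice for reanalysis data, not stated in the abstract) and compares the share of grid points poleward of a given latitude with the share of the sphere's surface area in that region. It also assumes that the spherical mesh is an equal-area pixelization in the style of HEALPix, as the model's name suggests, for which the pixel count is 12 × nside². All function names are hypothetical.

```python
import math


def latlon_point_fraction_poleward(lat_cutoff_deg, dlat_deg=0.25):
    """Fraction of points on an equiangular lat-lon grid with |lat| >= cutoff.

    Assumes (for illustration) that rows run from -90 to +90 inclusive and
    that every row holds the same number of longitude points, so the point
    fraction equals the row fraction.
    """
    n_rows = int(round(180 / dlat_deg)) + 1
    lats = [-90 + i * dlat_deg for i in range(n_rows)]
    n_polar_rows = sum(1 for lat in lats if abs(lat) >= lat_cutoff_deg)
    return n_polar_rows / n_rows


def sphere_area_fraction_poleward(lat_cutoff_deg):
    """Fraction of a sphere's area with |lat| >= cutoff (both polar caps)."""
    return 1.0 - math.sin(math.radians(lat_cutoff_deg))


def healpix_npix(nside):
    """Number of equal-area pixels in a HEALPix map of resolution nside."""
    return 12 * nside * nside


if __name__ == "__main__":
    cutoff = 60.0
    grid_share = latlon_point_fraction_poleward(cutoff)
    area_share = sphere_area_fraction_poleward(cutoff)
    print(f"grid points poleward of {cutoff} deg: {grid_share:.3f}")
    print(f"surface area poleward of {cutoff} deg: {area_share:.3f}")
    print(f"HEALPix pixels at nside=64: {healpix_npix(64)}")
```

Under these assumptions, roughly a third of the grid points lie poleward of 60° while that region covers only about 13% of the sphere. An equal-area mesh removes this mismatch by construction, since every pixel covers the same surface area.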