Inference from Real-World Sparse Measurements (2210.11269v7)
Abstract: Real-world problems often involve complex and unstructured sets of measurements, which arise when sensors are sparsely placed in either space or time. Modeling such irregular spatiotemporal data and extracting meaningful forecasts from it is crucial. Designing deep learning architectures that can process sets of measurements whose positions vary from set to set, and that can produce read-outs at arbitrary positions, is methodologically difficult. Current state-of-the-art models are graph neural networks, which require domain-specific knowledge for proper setup. We propose an attention-based model focused on robustness and practical applicability, with two key design contributions. First, we adopt a ViT-like transformer that takes both context points and read-out positions as inputs, eliminating the need for an encoder-decoder structure. Second, we use a unified method for encoding both context and read-out positions. This approach is intentionally straightforward and integrates well with other systems. Compared to existing approaches, our model is simpler, requires less specialized knowledge, and does not suffer from a problematic bottleneck effect, all of which contribute to superior performance. We conduct in-depth ablation studies showing that this bottleneck in the latent representations of alternative models inhibits information utilization and impedes training efficiency. We also run experiments across several problem domains, including high-altitude wind nowcasting, two-day weather forecasting, fluid dynamics, and heat diffusion. Our attention-based model consistently outperforms state-of-the-art models in handling irregularly sampled data. Notably, it reduces the root mean square error (RMSE) from 9.24 to 7.98 on wind nowcasting and from 0.126 to 0.084 on heat diffusion.
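To make the two design contributions concrete, below is a minimal PyTorch sketch of the joint-encoding idea the abstract describes: context measurements and read-out positions are embedded with a shared positional encoder, read-out tokens receive a learned placeholder in place of the unknown value, and the concatenated sequence passes through a single self-attention stack, with predictions read off directly at the query tokens. The class name `SparseMeasurementTransformer`, the placeholder-token scheme, and all layer sizes are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of a joint context/read-out transformer for sparse
# measurements. Assumes PyTorch; names and hyperparameters are illustrative.
import torch
import torch.nn as nn

class SparseMeasurementTransformer(nn.Module):
    """Context tokens carry (position, value); query tokens carry
    (position, learned placeholder). Both token types share one positional
    encoder and one self-attention stack -- no encoder-decoder split."""

    def __init__(self, pos_dim=2, val_dim=1, d_model=128, n_layers=4, n_heads=4):
        super().__init__()
        # Unified encoding: the same positional map is applied to
        # context positions and read-out positions.
        self.pos_encoder = nn.Linear(pos_dim, d_model)
        self.val_encoder = nn.Linear(val_dim, d_model)
        # Learned placeholder standing in for the unknown value at read-out points.
        self.placeholder = nn.Parameter(torch.zeros(d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, val_dim)

    def forward(self, ctx_pos, ctx_val, qry_pos):
        # ctx_pos: (B, Nc, pos_dim), ctx_val: (B, Nc, val_dim), qry_pos: (B, Nq, pos_dim)
        ctx = self.pos_encoder(ctx_pos) + self.val_encoder(ctx_val)
        qry = self.pos_encoder(qry_pos) + self.placeholder
        tokens = torch.cat([ctx, qry], dim=1)     # one joint sequence
        out = self.encoder(tokens)
        return self.head(out[:, ctx.shape[1]:])   # predictions at the query tokens

model = SparseMeasurementTransformer()
ctx_pos, ctx_val = torch.rand(8, 50, 2), torch.rand(8, 50, 1)  # 50 sparse sensors
qry_pos = torch.rand(8, 200, 2)                                # 200 read-out points
pred = model(ctx_pos, ctx_val, qry_pos)                        # shape (8, 200, 1)
```

Because every query token attends directly to every context token (and to the other queries), no fixed-size latent summary sits between encoding and decoding; under this reading, that fixed-size summary is the bottleneck the paper's ablations characterize in alternative models.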