Estimating Probability Densities with Transformer and Denoising Diffusion (2407.15703v1)
Abstract: Transformers are often the go-to architecture for building foundation models that ingest large amounts of training data. However, when trained on regression problems, these models do not estimate the probability density of the output, yet full probabilistic outputs are crucial in many fields of science, where the probability distribution of the answer can be non-Gaussian and multimodal. In this work, we demonstrate that training a probabilistic model with a denoising diffusion head on top of the Transformer provides reasonable probability density estimation, even for high-dimensional inputs. The combined Transformer+Denoising Diffusion model allows the output probability density to be conditioned on arbitrary combinations of inputs, and it is thus a highly flexible density function emulator for all possible input/output combinations. We illustrate our Transformer+Denoising Diffusion model by training it on a large dataset of astronomical observations and measured labels of stars within our Galaxy, and we apply it to a variety of inference tasks to show that the model can infer labels accurately with reasonable distributions.
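The core mechanism in the abstract is a denoising diffusion head that turns a conditioning vector into samples from a possibly multimodal output density. The standard DDPM machinery behind such a head can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the Transformer is omitted entirely, and the trained noise predictor ε_θ(x_t, t, c) is replaced by a hypothetical analytic oracle that is exact for a known two-component Gaussian mixture, so the "context" c here is just the mixture parameters. The linear β schedule (1e-4 to 0.02, T = 1000) is an assumption taken from the original DDPM defaults of Ho et al. (2020). The names `oracle_eps`, `ddpm_loss`, `ddpm_sample` and `sample_mixture` are illustrative only.

```python
import numpy as np

# Assumed linear DDPM schedule (Ho et al. 2020 defaults): T steps, beta from 1e-4 to 0.02.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def sample_mixture(context, n, rng):
    """Draw n samples x_0 from the 1-D Gaussian mixture described by context."""
    w, mu, s = context
    k = rng.choice(len(w), size=n, p=w)
    return mu[k] + s[k] * rng.standard_normal(n)


def oracle_eps(x_t, t, context):
    """Hypothetical stand-in for a trained head eps_theta(x_t, t, c).

    Returns the exact E[eps | x_t] for a Gaussian-mixture target; in the paper's
    setting c would instead be a Transformer embedding of the observed inputs.
    """
    w, mu, s = context
    ab = np.atleast_1d(alpha_bars[t])[:, None]   # (1, 1) for scalar t, (N, 1) for per-sample t
    m = np.sqrt(ab) * mu                         # component means of the noised marginal p_t
    v = ab * s**2 + (1.0 - ab)                   # component variances of p_t
    diff = x_t[:, None] - m                      # (N, K)
    logp = np.log(w) - 0.5 * np.log(2 * np.pi * v) - 0.5 * diff**2 / v
    r = np.exp(logp - logp.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)            # posterior component responsibilities
    score = np.sum(r * (-diff / v), axis=1)      # d/dx log p_t(x)
    return -np.sqrt(1.0 - ab[:, 0]) * score      # Tweedie: E[eps | x_t] = -sqrt(1 - ab) * score


def ddpm_loss(eps_fn, x0, context, rng):
    """Monte Carlo estimate of the eps-prediction loss E_{t, eps} ||eps - eps_fn(x_t, t, c)||^2."""
    t = rng.integers(0, T, size=x0.shape[0])
    eps = rng.standard_normal(x0.shape[0])
    ab = alpha_bars[t]
    x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps   # forward process q(x_t | x_0)
    return np.mean((eps - eps_fn(x_t, t, context)) ** 2)


def ddpm_sample(eps_fn, context, n, rng):
    """Ancestral DDPM sampling with reverse-step variance sigma_t^2 = beta_t."""
    x = rng.standard_normal(n)                     # x_T ~ N(0, 1)
    for t in range(T - 1, -1, -1):
        eps_hat = eps_fn(x, t, context)
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:                                  # no noise is added on the final step
            x = x + np.sqrt(betas[t]) * rng.standard_normal(n)
    return x


rng = np.random.default_rng(0)
# Bimodal target: equal-weight modes at -2 and +2, each with standard deviation 0.5.
context = (np.array([0.5, 0.5]), np.array([-2.0, 2.0]), np.array([0.5, 0.5]))
samples = ddpm_sample(oracle_eps, context, 20_000, rng)

x0 = sample_mixture(context, 20_000, rng)
oracle_loss = ddpm_loss(oracle_eps, x0, context, rng)
zero_loss = ddpm_loss(lambda x, t, c: np.zeros_like(x), x0, context, rng)
```

Because the oracle is exact, its loss equals E[Var(ε | x_t)], which is strictly below the value of about 1 obtained by a predictor that always returns zero; any trained head can only approach this bound from above. The sampler recovers both modes rather than collapsing to their average, which is the non-Gaussian, multimodal behaviour the abstract contrasts with a point-estimate regression output.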