AstroPT: Scaling Large Observation Models for Astronomy (2405.14930v1)
Abstract: This work presents AstroPT, an autoregressive pretrained transformer developed with astronomical use-cases in mind. The AstroPT models presented here have been pretrained on 8.6 million $512 \times 512$ pixel $grz$-band galaxy postage stamp observations from the DESI Legacy Survey DR8. We train a selection of foundation models of increasing size, from 1 million to 2.1 billion parameters, and find that AstroPT follows a similar saturating log-log scaling law to textual models. We also find that the models' performance on downstream tasks, as measured by linear probing, improves with model size up to the model parameter saturation point. We believe that collaborative community development paves the best route towards realising an open source 'Large Observation Model' -- a model trained on data taken from the observational sciences at the scale seen in natural language processing. To this end, we release the source code, weights, and dataset for AstroPT under the MIT license, and invite potential collaborators to join us in collectively building and researching these models.
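For readers unfamiliar with the functional form, a "saturating" scaling law of the kind described in the abstract is commonly modelled in the language-model literature (e.g. Kaplan et al., 2020) as a power law that decays towards an irreducible loss floor, $L(N) = L_\infty + (N_c / N)^\alpha$: loss falls linearly on a log-log plot before flattening near $L_\infty$. The sketch below fits this form with SciPy as a minimal illustration; the parameter counts echo the 1M-2.1B range quoted above, but the loss values are hypothetical placeholders, not measurements from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def saturating_power_law(log_n, loss_floor, log_nc, alpha):
    """Validation loss vs. model size N, parameterised in log10(N):
    L(N) = L_inf + (N_c / N)**alpha, which saturates at L_inf."""
    return loss_floor + 10.0 ** (alpha * (log_nc - log_n))

# Hypothetical (parameter count, validation loss) pairs for illustration
# only; the actual AstroPT measurements are reported in the paper.
n_params = np.array([1e6, 1e7, 1e8, 1e9, 2.1e9])
val_loss = np.array([1.80, 1.35, 1.12, 1.05, 1.04])

# Fit in log10(N) so each decade of model size is weighted evenly.
popt, _ = curve_fit(
    saturating_power_law,
    np.log10(n_params),
    val_loss,
    p0=(1.0, 6.0, 0.5),  # guesses: floor ~1.0, N_c ~1e6 params, alpha ~0.5
)
loss_floor, log_nc, alpha = popt
print(f"irreducible loss L_inf ~ {loss_floor:.2f}, "
      f"saturation scale N_c ~ 10^{log_nc:.1f} params, alpha ~ {alpha:.2f}")
```

Under this parameterisation, the fitted $L_\infty$ marks the saturation point past which extra parameters yield diminishing returns, consistent with the abstract's observation that linear-probe performance improves with model size only up to the parameter saturation point.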