
AstroPT: Scaling Large Observation Models for Astronomy (2405.14930v1)

Published 23 May 2024 in astro-ph.IM, astro-ph.GA, and cs.LG

Abstract: This work presents AstroPT, an autoregressive pretrained transformer developed with astronomical use-cases in mind. The AstroPT models presented here have been pretrained on 8.6 million $512 \times 512$ pixel $grz$-band galaxy postage stamp observations from the DESI Legacy Survey DR8. We train a selection of foundation models of increasing size from 1 million to 2.1 billion parameters, and find that AstroPT follows a similar saturating log-log scaling law to textual models. We also find that the models' performances on downstream tasks as measured by linear probing improves with model size up to the model parameter saturation point. We believe that collaborative community development paves the best route towards realising an open source `Large Observation Model' -- a model trained on data taken from the observational sciences at the scale seen in natural language processing. To this end, we release the source code, weights, and dataset for AstroPT under the MIT license, and invite potential collaborators to join us in collectively building and researching these models.


Summary

  • The paper demonstrates that increasing model size enhances performance in astronomical tasks, with saturation observed near 89M parameters.
  • The paper employs causal autoregressive training using 16 × 16 pixel patches and a Huber loss to generate semantically rich embeddings from 8.6M galaxy images.
  • The paper emphasizes open-source collaboration and highlights future research directions to integrate multimodal data for advanced astronomical analysis.

An Overview of 'AstroPT: Scaling Large Observation Models for Astronomy'

The paper "AstroPT: Scaling Large Observation Models for Astronomy" details the development and findings of AstroPT, an autoregressive transformer designed specifically for astronomical use cases. AstroPT is pretrained on galaxy observations from the DESI Legacy Survey DR8, preparing it for a range of downstream tasks in astronomical data analysis. The work mirrors trends observed in natural language processing, emphasizing the scalability of neural networks and the utility of autoregressive pretrained transformers in this setting.

Development and Methodology

AstroPT was built with a specific focus on the characteristics and challenges inherent in astronomical data. The paper describes the training of foundation models with parameter counts ranging from 1 million to 2.1 billion, empirically demonstrating that performance improves with size until a saturation point is reached. This finding is in line with the saturating log-log scaling laws established for neural language models.
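The fitted form of the law is not reproduced in this summary; as an illustration only, saturating behaviour of this kind is commonly parameterized as a power-law decay toward an irreducible loss floor, for example:

```latex
% Illustrative saturating scaling law -- not necessarily the exact form fitted
% in the paper. L(N) is the validation loss at parameter count N; L_inf is the
% irreducible loss floor, N_c a characteristic scale, alpha the decay exponent.
L(N) = L_{\infty} + \left( \frac{N_c}{N} \right)^{\alpha}
```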

The training dataset comprises 8.6 million galaxy images, processed with a causal autoregressive training protocol. A noteworthy aspect of AstroPT's design is its extensibility to multimodal data sources, which is pertinent given the diverse data types present in the observational sciences. The use of 16 × 16 pixel patches as tokens and the application of a Huber loss function are distinctive methodological choices that underline the model's suitability for large-scale astronomical datasets.
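To make the tokenization concrete, the minimal PyTorch sketch below shows one way a 512 × 512 grz postage stamp could be split into 16 × 16 patch tokens and scored with a next-patch Huber objective. This is an illustrative sketch rather than the paper's implementation: the `model` callable, the raster-scan patch ordering, and the absence of any normalization are assumptions.

```python
import torch
import torch.nn.functional as F

def patchify(image: torch.Tensor, patch_size: int = 16) -> torch.Tensor:
    """Split a (C, H, W) image into a sequence of flattened patch tokens.

    For a 3-channel (grz) 512x512 stamp this yields 32 * 32 = 1024 tokens,
    each of dimension 3 * 16 * 16 = 768. A simple raster-scan ordering is
    assumed here; the ordering used in AstroPT may differ.
    """
    c, _, _ = image.shape
    patches = image.unfold(1, patch_size, patch_size).unfold(2, patch_size, patch_size)
    # (C, H/ps, W/ps, ps, ps) -> spatial grid becomes the sequence axis,
    # each patch is flattened into a single token vector
    return patches.permute(1, 2, 0, 3, 4).reshape(-1, c * patch_size * patch_size)

def next_patch_huber_loss(model, tokens: torch.Tensor) -> torch.Tensor:
    """Causal objective: predict patch t+1 from patches <= t, scored with a
    Huber loss on pixel values. `model` stands in for a causal transformer
    mapping a (1, T, D) token sequence to (1, T, D) predictions."""
    inputs, targets = tokens[:-1], tokens[1:]
    preds = model(inputs.unsqueeze(0)).squeeze(0)
    return F.huber_loss(preds, targets)
```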

Results and Observations

The results indicate a strong relationship between model size and downstream task performance, with saturation occurring near 89 million parameters. The paper uses linear probing to assess the scientific value of the embeddings, predicting galaxy properties and morphologies and showing improved performance with larger models. This supports the claim that the pretraining routine yields semantically meaningful representations. Notably, emergent abilities were observed, suggesting that model capacity affects the complexity of the tasks that can be learned.
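A linear probe of this kind can be sketched in a few lines: embeddings from the frozen backbone are fed to a single linear model, and the held-out score measures how much of a galaxy property is linearly decodable from them. The file names and the choice of a Ridge regressor below are illustrative assumptions, not the paper's exact evaluation pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical file names: embeddings extracted from a frozen AstroPT checkpoint
# and a per-galaxy physical property (e.g. a redshift or morphology score).
embeddings = np.load("astropt_embeddings.npy")   # shape (n_galaxies, d)
targets = np.load("galaxy_property.npy")         # shape (n_galaxies,)

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, targets, test_size=0.2, random_state=0
)

# The probe is a single linear model; the backbone stays frozen, so the score
# reflects how much of the property is linearly decodable from the embedding.
probe = Ridge(alpha=1.0).fit(X_train, y_train)
print("linear-probe R^2:", r2_score(y_test, probe.predict(X_test)))
```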

The paper also highlights the practical benefits of causal transformers: their widespread adoption means mature, efficient pretraining tooling, and their formulation adapts naturally to autoregressive generative tasks. AstroPT's design therefore makes it a robust tool for diverse scientific tasks, extending its utility beyond the dataset it was trained on.

Implications and Future Directions

AstroPT's development and open-source release are intended to encourage collaboration on scaling and applying large observation models. By sharing the model weights, dataset, and code openly under the MIT license, the authors emphasize collective effort and progress in the domain. This accessibility aligns with the open-science movement, promoting further research on and adaptation of such models in the observational sciences.

In terms of implications, integrating multimodal data sources to overcome training-token shortages presents a promising frontier in model training. This approach could catalyze advances in cross-modal foundation models for scientific inquiry, enabling the fusion of textual and observational data for richer analysis.

The paper suggests future directions, including exploring more information-dense observational modalities and further dissecting scaling laws within this context. These pursuits could extend the applicability of autoregressive models in astronomy and other areas reliant on large, intricate datasets.

Conclusion

AstroPT stands as a technological testament to the adaptability and utility of autoregressive models in the astronomical domain. The model serves as a bridge, demonstrating how neural scaling principles from NLP can be effectively transferred and applied to observational sciences. By choosing a deliberately community-focused development strategy, the research encourages further exploration and refinement of large observation models, promising wider-reaching impacts in both theoretical and practical capacities within and beyond astronomy.
