
Scaling neural networks on astronomical observation data

Determine whether neural networks trained on astronomical observation data can be scaled in a manner analogous to established scaling behavior in the textual domain (i.e., whether increasing model parameters and data leads to predictable performance improvements as observed for large language models).


Background

The paper proposes AstroPT, a decoder-only autoregressive transformer pretrained on 8.6 million DESI-LS galaxy images, as a step toward general Large Observation Models (LOMs). In the language domain, scaling laws (e.g., Chinchilla) predict how model size and token count affect performance; whether similar scaling applies to observational modalities such as astronomical images is explicitly posed as an open question.
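For reference, the Chinchilla analysis mentioned above models pretraining loss with a parametric power law in model size and data volume. The form below is the standard one from Hoffmann et al. (2022), reproduced only to make the scaling-law question concrete; the AstroPT paper does not claim that these constants carry over to observational tokens:

```latex
% Chinchilla-style parametric loss (Hoffmann et al. 2022):
%   N                      -- number of model parameters
%   D                      -- number of training tokens
%   E, A, B, \alpha, \beta -- constants fitted empirically (on text data)
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

The open question is whether an analogous fit, with its own exponents, holds when the tokens are image patches drawn from astronomical surveys.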

This question is motivated by prior work (e.g., Leung 2023) that employs transformer-based architectures for stellar analysis and by the broader aim of confirming that the predictable log-log scaling seen in text also manifests when training on observational tokens from astronomy.
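A minimal sketch of how such log-log scaling could be tested empirically is given below. The parameter counts and losses are synthetic placeholders (not results from the paper), and the single-variable power-law form is an assumption borrowed from the language-model scaling literature:

```python
import numpy as np
from scipy.optimize import curve_fit

# Illustrative placeholder losses for models of increasing size; these are
# NOT results from the AstroPT paper, only synthetic numbers for the sketch.
params = np.array([1e6, 1e7, 1e8, 1e9])      # model parameter counts N
losses = np.array([1.20, 0.95, 0.78, 0.66])  # held-out validation loss

def scaling_law(n, a, alpha, floor):
    """Single-variable power law L(N) = floor + a * N^(-alpha)."""
    return floor + a * n ** (-alpha)

popt, _ = curve_fit(scaling_law, params, losses,
                    p0=(10.0, 0.2, 0.5),
                    bounds=([0.0, 0.0, 0.0], [np.inf, 2.0, 5.0]))
a, alpha, floor = popt
print(f"fitted exponent alpha = {alpha:.3f}, irreducible loss = {floor:.3f}")
```

If the fitted exponent remained stable as model size and the number of observational tokens grew, that would be the astronomical analogue of the predictable log-log scaling observed for large language models.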

References

While a robust and innovative approach, \citet{ref_leung2023} leave some open questions which we hope to complement with this work: that is, can we scale neural networks on astronomical observation data just as we have done in the textual domain, and do we need the computational and architectural overhead of pretraining a full encoder-decoder transformer architecture to teach our models scientifically useful information?

AstroPT: Scaling Large Observation Models for Astronomy (2405.14930 - Smith et al., 23 May 2024) in Section 1, On 'Large Observation Models' (Introduction)