Necessity of encoder–decoder transformer pretraining for astronomy
Ascertain whether pretraining a full encoder–decoder transformer architecture is necessary to learn scientifically useful information from astronomical observations, or whether simpler decoder-only causal transformer architectures suffice.
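To make the architectural distinction concrete, here is a minimal sketch of the simpler alternative in question: a decoder-only causal transformer pretrained autoregressively on sequences of flattened image patches. This is a hypothetical illustration assuming PyTorch; the dimensions and the toy mean-squared-error next-patch objective are illustrative, not the AstroPT configuration. An encoder–decoder model would add a separate encoder stack and cross-attention on top of this.

```python
import torch
import torch.nn as nn

class CausalPatchTransformer(nn.Module):
    """Decoder-only (GPT-style) transformer over image-patch tokens.

    Illustrative sketch only: patch_dim, d_model, depth, and the loss
    below are toy values, not taken from the AstroPT paper.
    """

    def __init__(self, patch_dim=256, d_model=128, n_heads=4,
                 n_layers=2, max_len=64):
        super().__init__()
        self.embed = nn.Linear(patch_dim, d_model)   # project flattened patches
        self.pos = nn.Embedding(max_len, d_model)    # learned positions
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        # A self-attention-only stack plus a causal mask *is* a
        # decoder-only transformer; no cross-attention, no encoder.
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, patch_dim)    # predict the next patch

    def forward(self, patches):                      # patches: (B, T, patch_dim)
        B, T, _ = patches.shape
        x = self.embed(patches) + self.pos(torch.arange(T, device=patches.device))
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(patches.device)
        x = self.blocks(x, mask=mask)                # causal self-attention only
        return self.head(x)

# Autoregressive pretraining: predict patch t+1 from patches 1..t.
model = CausalPatchTransformer()
seq = torch.randn(8, 16, 256)                        # toy batch of patch sequences
pred = model(seq[:, :-1])
loss = nn.functional.mse_loss(pred, seq[:, 1:])      # toy regression objective
loss.backward()
```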
References
While \citet{ref_leung2023} take a robust and innovative approach, they leave open questions that we hope to address with this work: can we scale neural networks on astronomical observation data just as we have done in the textual domain, and do we need the computational and architectural overhead of pretraining a full encoder–decoder transformer architecture to teach our models scientifically useful information?
— Smith et al., "AstroPT: Scaling Large Observation Models for Astronomy"
(arXiv:2405.14930, 23 May 2024), Section 1, "On 'Large Observation Models'" (Introduction)