
Necessity of encoder–decoder transformer pretraining for astronomy

Ascertain whether pretraining a full encoder–decoder transformer architecture is necessary to learn scientifically useful information from astronomical observations, or whether a simpler decoder-only causal transformer architecture suffices.


Background

The authors contrast their decoder-only causal transformer approach (AstroPT) with prior encoder–decoder transformer models used in astronomy (e.g., Leung 2023). They explicitly question whether the computational and architectural overhead of full encoder–decoder pretraining is required to capture scientifically meaningful representations from astronomical data.

This issue is central to designing scalable, efficient large observation models (LOMs) for the observational sciences, particularly given the computational cost of pretraining and the community-driven goals of open-source development and accessible model training.

References

While a robust and innovative approach, \citet{ref_leung2023} leave some open questions which we hope to complement with this work: that is, can we scale neural networks on astronomical observation data just as we have done in the textual domain, and do we need the computational and architectural overhead of pretraining a full encoder-decoder transformer architecture to teach our models scientifically useful information?

AstroPT: Scaling Large Observation Models for Astronomy (2405.14930 - Smith et al., 23 May 2024) in Section 1, On 'Large Observation Models' (Introduction)