Tokenization for unevenly-sampled light curves

Develop a tokenization scheme for unevenly-sampled stellar light curves that enables efficient representation learning for transformer-based autoregressive generative modeling, addressing strong local correlations between adjacent observations and reducing sequence length and training/inference cost.

Background

The paper notes that treating each individual flux measurement as a token is overly simplistic: adjacent observations are strongly correlated, so each token adds little new information, while the resulting long sequences increase computational cost.

Tokenization strategies based on patching have been successfully applied to images, audio, and evenly-sampled light curves, but the authors highlight that extending such tokenization to unevenly-sampled light curves remains unresolved.
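To make the difficulty concrete, the sketch below shows one conceivable way to patch an unevenly-sampled light curve: grouping observations into fixed-duration time windows and padding each window to a fixed size. This is only an illustration under stated assumptions, not the method of the paper, which leaves the question open. The function name `patch_by_time_window` and the parameters `window` and `max_points` are hypothetical.

```python
import numpy as np

def patch_by_time_window(times, flux, window, max_points):
    """Group observations into fixed-duration windows (hypothetical scheme).

    Returns one padded patch per non-empty window, a validity mask,
    and the start time of each window.
    """
    times = np.asarray(times, dtype=float)
    flux = np.asarray(flux, dtype=float)
    order = np.argsort(times)
    times, flux = times[order], flux[order]

    # Integer window index of every observation, counted from the first one.
    idx = np.floor((times - times[0]) / window).astype(int)

    patches, masks, starts = [], [], []
    for w in np.unique(idx):
        sel = idx == w
        start = times[0] + w * window
        # Assumption: observations beyond max_points in a window are dropped.
        t_w = times[sel][:max_points]
        f_w = flux[sel][:max_points]
        n = len(f_w)

        patch = np.zeros((max_points, 2))
        patch[:n, 0] = t_w - start   # time offset within the window
        patch[:n, 1] = f_w           # flux value
        mask = np.zeros(max_points, dtype=bool)
        mask[:n] = True              # True marks real (non-padded) entries

        patches.append(patch)
        masks.append(mask)
        starts.append(start)

    return np.stack(patches), np.stack(masks), np.array(starts)

# Toy light curve with a gap between t = 1.35 and t = 4.0.
t = [0.0, 0.1, 0.25, 1.3, 1.35, 4.0]
f = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8]
patches, masks, starts = patch_by_time_window(t, f, window=1.0, max_points=4)
print(patches.shape)  # (3, 4, 2)
```

Here six observations collapse into three patches, which shortens the sequence, but the scheme also shows why the problem is not trivial: the number of real points per patch varies, dense windows are truncated, and empty windows produce no token at all, so the gaps must be encoded some other way (here, through the returned start times).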

References

The tokenization of unevenly-sampled light curves, however, remains an open question.

The Scaling Law in Stellar Light Curves (2405.17156 - Pan et al., 27 May 2024) in Section Discussion