Essential reason for long-sequence training instability

Determine the essential reason for the correlation between long biological sequence lengths and training instability in foundation models for biological sequences, so that such models can be trained stably without discarding information through truncation or other ad hoc procedures.

Background

The paper highlights that biological sequences can be extremely long (e.g., billions of nucleotides in humans), and training foundation models on such lengths leads to extreme gradient variance, instability, and reduced efficiency. While practical heuristics like sequence length warmup have been explored, these approaches often rely on truncation and may discard informative content.
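To make the truncation concern concrete, here is a minimal sketch of a sequence length warmup schedule. The linear schedule, the function names (`warmup_length`, `truncate_batch`), and all parameter values are illustrative assumptions; the paper does not specify a particular schedule or implementation.

```python
def warmup_length(step, warmup_steps, min_len, max_len):
    """Allowed sequence length at a training step.

    Assumption: the length grows linearly from min_len to max_len
    over warmup_steps, then stays at max_len.
    """
    if warmup_steps <= 0 or step >= warmup_steps:
        return max_len
    frac = step / warmup_steps
    return min_len + int(frac * (max_len - min_len))


def truncate_batch(batch, step, warmup_steps, min_len, max_len):
    """Truncate each sequence to the current limit.

    Returns the truncated sequences and the number of symbols discarded,
    which is the information lost to the heuristic at this step.
    """
    limit = warmup_length(step, warmup_steps, min_len, max_len)
    kept = [seq[:limit] for seq in batch]
    dropped = sum(len(seq) - len(k) for seq, k in zip(batch, kept))
    return kept, dropped


if __name__ == "__main__":
    # Toy nucleotide strings; real genomic inputs are far longer.
    batch = ["ACGT" * 4, "ACG"]
    for step in (0, 50, 100):
        kept, dropped = truncate_batch(batch, step, 100, 8, 64)
        print(step, [len(k) for k in kept], dropped)
```

Early in training the limit is short, so long sequences lose their tails; the discarded count returned by `truncate_batch` is the quantity that a principled explanation of the instability would ideally make unnecessary to sacrifice.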

The authors explicitly note that the fundamental cause of the observed instability in training with long sequences remains unresolved, underscoring a need for a principled understanding that could guide the development of more robust and efficient training methodologies for models such as transformers and related architectures used in genomics.

References

The essential reason for the correlation between long sequences and training instability has not been completely deciphered.

Progress and Opportunities of Foundation Models in Bioinformatics (2402.04286 - Li et al., 6 Feb 2024) in Challenges, subsection “Long sequence length”