Underlying reason for RedLLM long-context behavior
Characterize the underlying reason for the long-context behavior of RedLLM, an encoder-decoder large language model that applies rotary positional embeddings with continuous positions to its encoder self-attention, decoder self-attention, and cross-attention and is pretrained with a prefix language modeling objective, when it extrapolates to sequences substantially longer than the pretraining context length.
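For concreteness, the sketch below illustrates one plausible reading of the setup in question: rotary embeddings applied with a single continuous position index shared by the encoder prefix and the decoder continuation, so that cross-attention sees the true relative distance from a decoder query back into the prefix. The function and variable names, and the exact position-assignment convention, are assumptions for illustration rather than the paper's reference implementation.

```python
# Minimal NumPy sketch (assumed convention): rotary position embeddings with
# continuous positions shared by encoder self-attention, decoder self-attention,
# and cross-attention in a prefix-LM style encoder-decoder model.
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply rotary embedding to x of shape (seq, d) at the given integer positions."""
    d = x.shape[-1]
    inv_freq = 1.0 / base ** (np.arange(0, d, 2) / d)      # (d/2,)
    angles = positions[:, None] * inv_freq[None, :]         # (seq, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

m, n, d = 8, 4, 16                  # prefix (encoder) length, target length, head dim
enc_pos = np.arange(m)              # encoder tokens occupy positions 0..m-1
dec_pos = np.arange(m, m + n)       # decoder tokens continue at m..m+n-1 ("continuous")

q_enc = k_enc = np.random.randn(m, d)
q_dec = k_dec = np.random.randn(n, d)

# Encoder self-attention: queries and keys both use the prefix positions.
scores_enc = rope(q_enc, enc_pos) @ rope(k_enc, enc_pos).T
# Decoder self-attention: queries and keys both use the continued positions.
scores_dec = rope(q_dec, dec_pos) @ rope(k_dec, dec_pos).T
# Cross-attention: decoder queries at m..m+n-1 attend to encoder keys at 0..m-1,
# so the rotary phase difference encodes the distance back into the prefix.
scores_cross = rope(q_dec, dec_pos) @ rope(k_enc, enc_pos).T
```

Under this assumed convention, extrapolating beyond the pretraining context length changes the range of relative offsets seen by all three attention types at once, which is the regime whose underlying mechanism the question asks to characterize.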
References
Still, the underlying reason behind RedLLM's long context behavior remains unclear, which we leave to the future.
— Encoder-Decoder or Decoder-Only? Revisiting Encoder-Decoder Large Language Model
(arXiv:2510.26622, Zhang et al., 30 Oct 2025), Section 5, paragraph "The decoder self- and cross-attention in RedLLM show intriguing patterns under long context"